🗂️
Parse multiple document types: PDF, DOCX, PPTX, XLSX, HTML, audio, and images.
📑
Understand PDFs deeply: layout, tables, reading order, code, and formulas.
🧬
Unified DoclingDocument format for structured output.
↪️
Export to Markdown, HTML, DocTags, or lossless JSON.
🔒
Run locally for sensitive or air-gapped environments.
🤖
Integrates easily with LangChain, LlamaIndex, Haystack, Langflow, and more.
🔍
OCR support for scanned PDFs and images.
👓
Works with visual language models (SmolDocling).
🎙️
Supports audio via automatic speech recognition (ASR).
💻
Fast and easy to use with a simple CLI.
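
The parse-then-export flow described in the list above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive recipe: it assumes the `docling` package is installed and uses its `DocumentConverter` class together with the `export_to_markdown()` method on the resulting `DoclingDocument`. The helper name `convert_to_markdown`, its optional `converter` parameter, and the file name `report.pdf` are hypothetical and exist only for illustration.

```python
def convert_to_markdown(source, converter=None):
    """Convert one document (local path or URL) and return it as Markdown.

    `converter` is a hypothetical injection point: any object with a
    `convert(source)` method whose result exposes `.document` works.
    When omitted, a Docling DocumentConverter is created.
    """
    if converter is None:
        # Imported lazily so the helper can be defined without docling installed.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    result = converter.convert(source)
    # `result.document` is the unified DoclingDocument representation.
    return result.document.export_to_markdown()


# Example call (assumes a local file named report.pdf exists):
# markdown_text = convert_to_markdown("report.pdf")
```

The same conversion should also be available from the shell by passing the source to the `docling` command, for example `docling report.pdf`. Passing a pre-built converter into the helper lets several documents share one converter instance instead of constructing a new one per call.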