Docling Preps Your Files for GenAI, RAG, and Beyond

Transform Your Documents
Docling turns messy PDFs, DOCX, and slides into clean, structured data, ready for RAG, GenAI apps, or anything downstream. Complex layouts? Tables? Formulas? It handles them, so you don’t have to.
Advanced Document Parsing

Extracts clean structure from messy PDFs, DOCX, HTML, and more.

GenAI-Ready Integration

Plugs into LangChain, LlamaIndex, and other popular AI frameworks.

Structured Output

Delivers chunked, labeled data optimized for LLM pipelines.

Features
🗂️

Parse multiple document types: PDF, DOCX, PPTX, XLSX, HTML, audio, and images.

📑

Understand PDFs deeply: layout, tables, reading order, code, and formulas.

🧬

Unified DoclingDocument format for structured output.

↪

Export to Markdown, HTML, DocTags, or lossless JSON.

🔒

Run locally for sensitive or air-gapped environments.

🤖

Integrates easily with LangChain, LlamaIndex, Haystack, Langflow, and more.

🔍

OCR support for scanned PDFs and images.

👓

Supports visual language models (VLMs) such as SmolDocling.

🎙

Supports audio via automatic speech recognition (ASR).

💻

Fast and easy to use with a simple CLI.

Live Assistant

Want AI-powered live support for Docling? Try Chat with Dosu, an assistant from our friends at Dosu.
