The results were clear:
Tensorlake: 86.8% TEDS, 91.7% F1
AWS Textract: 80.7% TEDS, 88.4% F1
Azure: 78.1% TEDS, 88.1% F1
Docling: 63.8% TEDS, 68.9% F1
The gap? 670 fewer manual reviews per 10k documents.
November 5, 2025 at 5:05 PM
I've run my first trials with larger batches of PDFs. Unfortunately, Docling fairly often returns the error message "Failed to convert" without giving any information about the cause (I asked on Reddit: https://www.reddit.com/r/Rag/comments/1omh3wc/docling_failed_to_convert/) […]
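One way to get more detail than a bare "Failed to convert" is to convert files one at a time and keep the full traceback for each failure. The sketch below is only an illustration under assumptions: `convert_batch` and `make_docling_converter` are hypothetical helper names, and the Docling calls (`DocumentConverter().convert(...)` and `document.export_to_markdown()`) assume a recent Docling release. Whether the traceback reveals the root cause depends on the file and the Docling version.

```python
import traceback
from pathlib import Path
from typing import Callable, Iterable


def convert_batch(
    paths: Iterable[Path], convert: Callable[[Path], str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Convert files one by one; return (converted, failed) keyed by path."""
    converted: dict[str, str] = {}
    failed: dict[str, str] = {}
    for path in paths:
        try:
            converted[str(path)] = convert(path)
        except Exception:
            # Store the whole traceback, not just the top-level message,
            # so each failure can be inspected afterwards.
            failed[str(path)] = traceback.format_exc()
    return converted, failed


def make_docling_converter() -> Callable[[Path], str]:
    """Hypothetical factory: build one Docling converter and reuse it."""
    # Imported lazily so convert_batch works without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()

    def to_markdown(path: Path) -> str:
        return converter.convert(path).document.export_to_markdown()

    return to_markdown
```

A possible call would be `ok, bad = convert_batch(Path("pdfs").glob("*.pdf"), make_docling_converter())`, where the `pdfs` folder is a placeholder; printing the entries in `bad` then shows which files failed and where.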
Original post on colearn.social
November 2, 2025 at 1:55 PM
FREE Online Workshop Alert!
Register Now: t.ly/FRlwZ
Learn how to automate PDF extraction & OCR using PyMuPDF & Docling — the same AI tools used in real-world data pipelines!
Trainer: Mr. Satish Gupta
Starting From: 1st November | 11:15 AM (IST)
#NareshIT #PythonWorkshop
October 30, 2025 at 11:53 AM
FREE Online Workshop Alert!
Intelligent Document Processing with Python
Register Now: t.ly/FRlwZ
Learn how to automate PDF extraction & OCR using PyMuPDF & Docling — the same AI tools used in real-world data pipelines!
Trainer: Mr. Satish Gupta
Starting From: 1st November | 11:15 AM
#AIWorkshop
October 30, 2025 at 10:23 AM
How might you manage a massive global community that grows to 40k stars in one year with a small maintainer team? Docling figured out a way! Spoiler: they MAY have used @dosu-ai.bsky.social to help 😉
You can read about the whole story here! dosu.dev/blog/docling...
How IBM Research's Docling Manages Open Source Growth with Dosu
IBM's Docling scaled from 0 to 40K GitHub stars (as of October 2025) in one year with just 4 maintainers. Check out how Dosu reduced response time by 99% and automated 70% of issues.
dosu.dev
October 28, 2025 at 5:24 PM
1 repost
1 like
"How IBM Research's Docling Manages Open Source Growth with Dosu" dosu.dev/blog/docling...
How IBM Research's Docling Manages Open Source Growth with Dosu
IBM's Docling scaled from 0 to 40K GitHub stars (as of October 2025) in one year with just 4 maintainers. Check out how Dosu reduced response time by 99% and automated 70% of issues.
dosu.dev
October 28, 2025 at 3:23 PM
i’ve never gotten GPT5 to think this long before
October 27, 2025 at 1:26 PM
13 likes
Data Extraction (Web + Docs)
Whether you're crawling the web (Crawl4AI, FireCrawl) or parsing PDFs (LlamaParse, Docling), raw data access is non-negotiable. No context means no quality answers.
October 27, 2025 at 10:36 AM
Input raw contracts, reports, and research papers to your agents, and use Docling to turn it into structured data.
Try it out yourself, directly from Claude Desktop or via the MCP server or Python SDK.
→ github.com/docling-proj...
GitHub - docling-project/docling: Get your documents ready for gen AI
Get your documents ready for gen AI. Contribute to docling-project/docling development by creating an account on GitHub.
github.com
October 25, 2025 at 9:52 AM
1 like
That’s what Docling does:
📄 Converts PDFs, Word, and HTML into clean Markdown or structured JSON.
⚡️ Runs as an MCP server so agents can parse documents in real time.
🔓 100% open-source, and already ⭐️ 42K+ GitHub stars.
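As a rough illustration of the conversion step described above, the sketch below exports a converted document as Markdown or JSON. The helper names `export_document` and `convert_file` are hypothetical; the Docling calls (`DocumentConverter().convert(...)`, `document.export_to_markdown()`, `document.export_to_dict()`) assume a recent release of the Python SDK, so treat this as a starting point, not a reference implementation.

```python
import json
from typing import Any


def export_document(document: Any, fmt: str = "markdown") -> str:
    """Serialize a converted document as Markdown text or a JSON string."""
    if fmt == "markdown":
        return document.export_to_markdown()
    if fmt == "json":
        return json.dumps(document.export_to_dict(), indent=2)
    raise ValueError(f"unsupported format: {fmt!r}")


def convert_file(source: str, fmt: str = "markdown") -> str:
    """Hypothetical wrapper: convert one file and export it in one call."""
    # Lazy import: Docling is only needed when a real file is converted.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return export_document(result.document, fmt)
```

For example, `convert_file("report.pdf", "json")` would return the structured JSON for a local file, where `report.pdf` is a placeholder path.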
October 25, 2025 at 9:52 AM
1 like
I made all of this (it's harder to make a shippable embeddable widget than I would have guessed) and it's pretty good. I'm pretty proud.
Auto crawling websites for knowledge via firecrawl.
Upload any file type you can imagine that's compatible with docling
agno as the AI framework.
SAID
October 23, 2025 at 10:01 PM
1 like
#IBM has launched #Granite 4.0, the next generation of open-source, small but efficient, IBM language models, together with Granite-#Docling, the next gen document format converter.
On IProgrammer: cutt.ly/xr4u6ZQs
#llm #gpt #chatgpt
IBM Launches Granite Version 4.0 and Granite-Docling
cutt.ly
October 23, 2025 at 6:02 PM
2 likes
Building Intelligent Document Processing with Apache Camel: Docling meets LangChain4j
https://camel.apache.org/blog/2025/10/camel-docling/
#ai #llm #genai
Summary: Building intelligent document processing pipelines with Docling, LangChain4j, and Camel YAML DSL
camel.apache.org
October 20, 2025 at 9:00 PM
1 quote
1 like
Docling is also fantastic for #DigitalHumanities. It plugs into spaCy, which is great for natural language processing (NLP): spacy.io/universe/pro...
spacy-layout · spaCy Universe
Process PDFs, Word documents and more with spaCy
spacy.io
October 20, 2025 at 10:55 AM
Previously I built a custom data processing pipeline using optical character recognition (OCR) with Tesseract to extract data from thousands of scanned PDFs... the most closed open data. Details here: datastories.maynoothuniversity.ie?p=674 Docling now saves so much time and code in data preparation.
CATU Eviction Nation report launched | Data Stories
datastories.maynoothuniversity.ie
October 20, 2025 at 10:55 AM
There tends to be a knee-jerk response to posts mentioning GenAI on Bluesky. That's disappointing because Docling is a double-edged sword... perfect for digital #counter-practice. I use it for liberating data that government agencies have inconveniently hidden in open sight within thousands of PDFs.
October 20, 2025 at 10:55 AM
If you are working on structured data extraction and OCR with tricky documents like PDFs, I strongly recommend checking out Docling. Good presentation via the Chaos Computer Club:
media.ccc.de/v/sps25-5645...
Docling: Get your documents ready for generative AI
Docling is an open-source Python package that simplifies document processing by parsing diverse formats — including advanced PDF understa...
media.ccc.de
October 20, 2025 at 9:08 AM
October 19, 2025 at 1:39 PM
1 like
GitHub - docling-project/docling: Get your documents ready for gen AI
github.com
October 15, 2025 at 11:13 AM
@SnoopJ yeah im also trying to grok the differences- i just pip installed docling and it installed a buncha cuda libraries which spooked me into thinking that its gonna require too much of my precious system resources.
October 15, 2025 at 5:43 AM
There are lots of Python packages for PDF extraction - pymupdf4llm, docling, markitdown - and more.
I've mostly used pymupdf, but am curious to hear from folks who have used the other options. How'd you like em?
October 15, 2025 at 5:36 AM
1 like
1 repost
How to convert PDF documents into structured data for artificial intelligence using Docling
https://kripta.biz/posts/C70DEC7F-DA39-4AEB-9682-5C7B330256DB
October 14, 2025 at 9:21 AM
October 14, 2025 at 9:20 AM