We tested every major parser on real enterprise documents.
The results will change how you think about OCR accuracy 🧵
Not that 1926 Norwegian statistical tables are a generally useful benchmark…
In the free Qdrant Essentials Course, learn how to:
- Architect vector-powered data lakes
- Optimize ETL pipelines
- Create knowledge graphs
- Integrate @langchain.bsky.social agents for natural language queries
t.co/OoPZswrL7z
We're using VLMs for:
- Page classification in large documents
- Table/figure summarization
- Fast structured extraction (skip_ocr mode)
Here's what this means for document processing 🧵
You might be a great fit if you like working with Rust, Python, K8s, me?, and you enjoy building products for developers.
That means:
❌ Lost audit trails
❌ Manual review of revision history
❌ No programmatic access to reviewer comments
❌ Workflows that can't route based on specific edits
Section 2.2 becomes a top-level header (##) instead of nested (###).
We just shipped automatic header correction.
🧵 How it works:
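A minimal sketch of one way numbering-based header correction could work, assuming headers start with dotted section numbers such as "2.2". The function name `correct_header_levels` and the depth rule are illustrative assumptions, not Tensorlake's implementation.

```python
import re

# Matches a markdown header that starts with a dotted section number,
# e.g. "## 2.2 Data collection".
HEADER_RE = re.compile(r"^(#+)\s+(\d+(?:\.\d+)*)\.?\s+(.*)$")

def correct_header_levels(markdown: str, base_level: int = 2) -> str:
    """Re-derive header depth from the section number (hypothetical rule):
    "2" maps to base_level, "2.2" to base_level + 1, and so on."""
    fixed = []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            _, number, title = match.groups()
            depth = number.count(".")  # "2.2" has one dot, so one level deeper
            line = "#" * (base_level + depth) + f" {number} {title}"
        fixed.append(line)
    return "\n".join(fixed)

doc = "## 2 Methods\n## 2.2 Data collection\nBody text."
corrected = correct_header_levels(doc)
print(corrected)
```

Here the mis-leveled "## 2.2" line is rewritten as "### 2.2". A header that begins with a plain number (a year, for example) would also match, so a real system would need more context than this sketch uses.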
When users ask "where did this come from?" your system should point to the exact page fragment...not just "file_name.pdf".
We built citation-aware RAG with spatial metadata:
→ Parse docs with bounding boxes
→ Embed citation anchors in chunks
→ Return page numbers + coordinates
A 🧵
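The three steps above can be sketched roughly as follows. This is an illustration under assumptions: the `Fragment` fields, the `<c0>` anchor format, and the sample text and coordinates are all made up here and do not come from the Tensorlake SDK.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    """A parsed page fragment; field names are illustrative assumptions."""
    text: str
    page: int
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2) in page coordinates

def build_chunk(fragments: list[Fragment]) -> tuple[str, dict[str, Fragment]]:
    """Join fragments into one chunk, prefixing each with an anchor like <c0>."""
    anchors: dict[str, Fragment] = {}
    parts = []
    for i, frag in enumerate(fragments):
        anchor = f"c{i}"
        anchors[anchor] = frag
        parts.append(f"<{anchor}>{frag.text}")
    return " ".join(parts), anchors

def resolve_citation(anchor: str, anchors: dict[str, Fragment]) -> dict:
    """Map an anchor cited in a model answer back to page number and coordinates."""
    frag = anchors[anchor]
    return {"page": frag.page, "bbox": frag.bbox}

# Made-up sample fragments, standing in for parser output with bounding boxes.
fragments = [
    Fragment("Revenue grew 12%.", page=3, bbox=(72.0, 140.0, 540.0, 162.0)),
    Fragment("Margins were flat.", page=4, bbox=(72.0, 90.0, 540.0, 112.0)),
]
chunk_text, anchors = build_chunk(fragments)
citation = resolve_citation("c1", anchors)
```

The anchored `chunk_text` is what gets embedded and passed to the model. When an answer cites `c1`, the lookup returns the page and box to highlight instead of just a file name.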
We have a few open positions if you’d like to work with us: www.linkedin.com/jobs/search/...
But flatten that data like most parsers do and trust is lost.
Tensorlake restores trust by preserving structure, generating summaries for effective embeddings, and attaching evidence via b-boxes.
Every answer should come with receipts (citations + context).
Learn how to make your AI correct and verifiable in this month’s Document Digest newsletter 👇
This is the beginning of better integration with Microsoft Azure and Tensorlake.
If you are using Azure and need better document ingestion and ETL for unstructured data, reach out to us!
Get citations for every field extracted with Tensorlake.
Read the blog and try our citations with the example notebooks: tlake.link/blog/citations
What’s dead is cosine‑N without a retrieval plan.
We ship advanced RAG...out of the box:
• Classify pages → target sections
• Extract structured fields → filter by form_type, fiscal_period
• Verify data; cite page/bbox
Want to know how? 🧵👇
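As a rough illustration of what a retrieval plan adds over plain top-N cosine search, the sketch below filters chunks on extracted fields before ranking them. The chunk layout, the `retrieve` helper, and the toy vectors are assumptions made for this example. Only the `form_type` and `fiscal_period` field names come from the post.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, chunks, filters, top_n=2):
    """Filter on structured fields first, then rank the survivors by similarity."""
    candidates = [
        c for c in chunks
        if all(c["fields"].get(k) == v for k, v in filters.items())
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:top_n]

# Toy data: two-dimensional vectors and made-up page numbers.
chunks = [
    {"id": "a", "vector": [1.0, 0.0], "page": 12,
     "fields": {"form_type": "10-K", "fiscal_period": "FY2023"}},
    {"id": "b", "vector": [0.9, 0.1], "page": 4,
     "fields": {"form_type": "10-Q", "fiscal_period": "Q1"}},
    {"id": "c", "vector": [0.0, 1.0], "page": 30,
     "fields": {"form_type": "10-K", "fiscal_period": "FY2023"}},
]
hits = retrieve([1.0, 0.0], chunks, {"form_type": "10-K", "fiscal_period": "FY2023"})
```

Chunk "b" is close to the query vector but is dropped because it comes from the wrong form type, which similarity alone cannot catch.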
🧠 LangGraph (by @langchain.bsky.social)
+ 📝 Tensorlake Contextual Signature Detection =
✅ Knows who signed
✅ When they signed
✅ If it’s ready to close
Full tutorial + code linked below 👇
multiple columns, fragmented text blocks, mixed reading order
Tensorlake doesn't.
✅ Authors parsed as one clean chunk
✅ Abstract follows, exactly as it should
Unstructured ≠ unordered
Preserve reading order. Parse with Tensorlake.
They are finally live 🥳
More announcements around this are coming soon, but if you didn't see the announcement in our Slack, make sure you use the v2 API and SDK 0.2.20 🙌
Check out this quick demo or try it out in the Colab Notebook (linked in the comments)
A new @langchain.bsky.social tool to parse real-world documents (PDFs, scans, forms) with Tensorlake & feed structured data right into your agents.
Built for devs wrangling docs in legal, finance, healthcare & more.
Learn more: tlake.link/langchain-tool
It might delay a claim or void a contract
Now in Tensorlake: Contextual Signature Detection
→ Detect handwritten, typed, or image-based
→ Trigger routing, alerts, or human review
→ API + SDK + Playground
Read the full blog post
tlake.link/signature-de...
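To make the "trigger routing, alerts, or human review" idea concrete, here is a small sketch of routing on detection results. The result fields (`signer`, `is_signed`, `confidence`), the 0.8 threshold, and the route names are hypothetical and not taken from the Tensorlake API.

```python
def route_document(signatures: list[dict], required_signers: set[str]) -> str:
    """Return "close", "human_review", or "alert" from hypothetical detection results."""
    signed = {s["signer"] for s in signatures if s["is_signed"]}
    missing = required_signers - signed
    if not missing:
        return "close"
    # A low-confidence "not signed" result goes to a person instead of raising an alert.
    if any(s["confidence"] < 0.8 for s in signatures if s["signer"] in missing):
        return "human_review"
    return "alert"

# Made-up detection output for a two-party contract.
detections = [
    {"signer": "buyer", "is_signed": True, "confidence": 0.97},
    {"signer": "seller", "is_signed": False, "confidence": 0.55},
]
decision = route_document(detections, {"buyer", "seller"})
```

With these sample values the seller's signature is missing and the detector is unsure, so the document is sent to human review rather than closed.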
Huge thanks to everyone who supported, upvoted, and shared 💚
Tensorlake is just getting started. Stay tuned - there’s so much more to come.
P.S. There's still time to upvote our launch and let us know your thoughts 👇
#AI #RAG #LLM #devtools
Huge thanks to everyone supporting Tensorlake 🎉
From devs wrangling PDFs to teams automating high-stakes workflows.
If you haven’t yet, check us out 👇
If you’ve dealt with messy document workflows and tried to parse complex documents (insurance claims, financial docs, multi-page forms), this is for you.
Would love your support 💚
www.producthunt.com/products/te...