Daniel van Strien
@danielvanstrien.bsky.social
Machine Learning Librarian at @hf.co
Reposted by Daniel van Strien
Join us tomorrow for a demo of IIIF Illustration Detector!

Zoom link: iiif.io/community
Join us February 11 for a demo of @danielvanstrien.bsky.social's IIIF Illustration Detector.

Zoom on the IIIF Community Calendar: iiif.io/community
February 10, 2026 at 5:22 PM
Semantic search, confidence filtering, updated weekly using Hugging Face Jobs.

Powered by a fine-tuned ModernBERT classifier. Full dataset stored in Lance format on the Hub with vector embeddings.

huggingface.co/spaces/libra...
ArXiv New ML Datasets - a Hugging Face Space by librarian-bots
This tool lets you search arXiv computer‑science papers that are predicted to present new machine‑learning datasets. Enter a keyword or use semantic search, then narrow results by research category...
huggingface.co
February 9, 2026 at 10:13 AM
Datasets and benchmarks drive AI progress, but finding papers that introduce new ones means digging through thousands of arXiv abstracts.

Updated the Dataset Papers on ArXiv app to surface them: 52K+ papers classified as introducing new datasets from 212K CS papers.
February 9, 2026 at 10:13 AM
Reposted by Daniel van Strien
Join us February 11 for a demo of @danielvanstrien.bsky.social's IIIF Illustration Detector.

Zoom on the IIIF Community Calendar: iiif.io/community
February 3, 2026 at 7:45 PM
Built an object detector from zero-labelled data in one afternoon with help from Claude Code (it can do more than vibe code, TODO apps...)

SAM3 on HF Jobs → correct the errors → train YOLO → repeat.

Three rounds: 31% → 99% accuracy on historical index cards from @natlibscot.bsky.social
February 2, 2026 at 4:43 PM
Reposted by Daniel van Strien
We used to do real science
January 12, 2026 at 1:59 AM
Thanks! Could be cool to do a few more transformers.js + IIIF demos!
January 6, 2026 at 10:09 AM
cc @glenrobson.bsky.social! Finally got time to play with transformers.js and @iiif.bsky.social!
December 19, 2025 at 12:08 PM
Paste any IIIF manifest → model classifies every page locally → see where illustrations appear.

Part of small-models-for-glam: small, efficient models for cultural heritage work.

Not everything needs GPT-4!

Try it: huggingface.co/spaces/small-models-for-glam/iiif-illustration-detector
IIIF Illustration Detector - a Hugging Face Space by small-models-for-glam
Find illustrated pages in digitized historical books
huggingface.co
December 19, 2025 at 12:08 PM
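
A minimal sketch of the manifest → pages → classify flow described above. It assumes a IIIF Presentation API 3.0 manifest and is written in Python for illustration only; the actual Space runs in the browser with transformers.js. The sample manifest, its example.org URLs, and `classify_page` are hypothetical stand-ins, not the real model or data.

```python
from typing import Iterator


def iter_page_images(manifest: dict) -> Iterator[tuple[str, str]]:
    """Yield (canvas_id, image_url) for each painted image in a v3 manifest."""
    for canvas in manifest.get("items", []):          # one Canvas per page
        for page in canvas.get("items", []):          # AnnotationPage
            for anno in page.get("items", []):        # Annotation
                if anno.get("motivation") != "painting":
                    continue
                body = anno.get("body", {})
                if isinstance(body, dict) and body.get("type") == "Image":
                    yield canvas["id"], body["id"]


def classify_page(image_url: str) -> bool:
    """Hypothetical placeholder for the image classifier (assumption)."""
    return image_url.endswith("plate.jpg")


def make_canvas(canvas_id: str, image_url: str) -> dict:
    """Build a tiny v3-shaped Canvas for the made-up sample manifest."""
    return {
        "id": canvas_id,
        "type": "Canvas",
        "items": [{
            "type": "AnnotationPage",
            "items": [{
                "type": "Annotation",
                "motivation": "painting",
                "body": {"id": image_url, "type": "Image"},
            }],
        }],
    }


# Hypothetical two-page manifest; real manifests carry many more fields.
sample_manifest = {
    "type": "Manifest",
    "items": [
        make_canvas("https://example.org/canvas/1", "https://example.org/img/text.jpg"),
        make_canvas("https://example.org/canvas/2", "https://example.org/img/plate.jpg"),
    ],
}

pages = list(iter_page_images(sample_manifest))
illustrated = [canvas_id for canvas_id, url in pages if classify_page(url)]
print(illustrated)
```

Swapping `classify_page` for a real model call is the only change needed to use the same loop on an actual manifest.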
Built a 2.5MB image classifier that runs in the browser in an evening with Claude Code.

I used a dataset I labelled in 2022 and left on @hf.co for 3 years 😬.

It finds illustrated pages in historical books. No server. No GPU.
December 19, 2025 at 12:08 PM
Thanks! If you haven't read it yet, you might also find arxiv.org/abs/2302.04844 interesting!
The Gradient of Generative AI Release: Methods and Considerations
As increasingly powerful generative AI systems are developed, the release method greatly varies. We propose a framework to assess six levels of access to generative AI systems: fully closed; gradual o...
arxiv.org
December 9, 2025 at 1:22 PM
Just posted my slides from the AI4LAM #FF2025 workshop on open source AI for GLAMs.

Slides on their own probably aren't that useful, but they do feature one of my growing collection of libraries-and-AI memes, so there's that: danielvanstrien.xyz/slides.html
December 9, 2025 at 10:13 AM
At the AI4LAM Fantastic Futures conference this week

Happy to chat about @hf.co, open source AI for GLAMs, or why cultural heritage should bet on small, focused models over closed-source giants!

DM or find me at breaks! #AI4LAM #FF2025
December 1, 2025 at 11:13 AM
Building datasets to train smaller, task-focused models used to be incredibly time-consuming.

Very excited to see SAM3 massively lower that barrier. Describe the class you want to detect and get annotated datasets automatically!

Try it yourself: huggingface.co/datasets/uv-...!
November 21, 2025 at 1:30 PM
Very much looking forward to presenting at this tomorrow. I will be making my usual pitch that datasets are the foundational infrastructure for cultural heritage to benefit from and create useful AI models and tools.

Be warned, I did fire up the meme generator for my slides...
November 5, 2025 at 5:40 PM
Reposted by Daniel van Strien
Over the last 24 hours, I have finetuned three Qwen3-VL models (2B, 4B, and 8B) on the CATmuS dataset on @hf.co. The first versions of the models are now available on the Small Models for GLAM organization with @danielvanstrien.bsky.social (links below). Working on improving them further.
October 24, 2025 at 2:59 PM
huggingface.co/nanonets/Nan... might be worth a try for this. Can extract formulas into LaTeX
October 23, 2025 at 2:01 PM
The command (using @hf.co Jobs - serverless GPU compute)

Full script at huggingface.co/datasets/uv-...
October 22, 2025 at 7:20 PM
DeepSeek-OCR just got vLLM support 🚀

Currently processing @natlibscot.bsky.social's 27,915-page handbook collection with one command.

Processing at ~350 images/sec on A100

Using @hf.co Jobs + uv - zero setup batch OCR!

Will share final time + cost when done!
October 22, 2025 at 7:20 PM
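
A rough back-of-envelope from the two figures quoted in the post above. It assumes the ~350 images/sec rate holds for the whole run, which the post does not claim, and it ignores model loading, downloads, and other overhead, so it is a lower bound on pure inference time and not the final time the post promises to share.

```python
pages = 27_915          # page count quoted in the post
images_per_sec = 350    # approximate A100 throughput quoted in the post

# Assumption: constant throughput and no startup or I/O overhead.
est_seconds = pages / images_per_sec
print(f"~{est_seconds:.0f} s (~{est_seconds / 60:.1f} min) of pure inference")
```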