Joe Barrow
@jbarrow.bsky.social
100 followers 190 following 22 posts
NLP @ Pattern Data Prev: Adobe Research, PhD UMD
jbarrow.bsky.social
Now, some acknowledgments: this work was made possible thanks to a generous compute grant from Lambda!

And I've got a hosted version of the model that I'll be sharing in a couple days, running on @modal-labs.bsky.social, which makes it basically free for me to host and scale.
jbarrow.bsky.social
Now, just because we filtered for the cleanest forms doesn't mean we got _perfect_ forms. There are still a lot of inconsistencies in how people prepare forms! In future work I'll be looking at mitigating data quality issues like these.
jbarrow.bsky.social
(Note: this doesn't _just_ apply to Acrobat, it's also better than Apple Preview -- neither Acrobat nor Preview even makes an attempt at checkboxes, and they're often fooled by any straight, horizontal line. Left: Acrobat, Right: FFDNet)
jbarrow.bsky.social
If we train object detectors to find the form fields on these pages, we get a much cleaner set of forms than if you used Acrobat to automatically prepare your form. (Left: Acrobat, Right: FFDNet).
jbarrow.bsky.social
Step 1 is to filter for the cleanest forms possible. We start with 8MM PDFs from Common Crawl, and work our way down to ~60k of the cleanest forms we can find. The result is a ~500k page dataset, called CommonForms.
jbarrow.bsky.social
Paper thread of some work I’m *incredibly* proud of, my first single author paper!

Converting a PDF to a fillable form is a hard problem, and a lot of solutions don’t work very well! In CommonForms, I show that you can train models that outperform Adobe Acrobat for <$500! 🧵
jbarrow.bsky.social
Yeah I wonder if that statistic is flipped between the cities (though operated by the same provider — Lyft — I assume?)

No way that 99 out of every 100 riders in Boston have visited more than 27 stations?
jbarrow.bsky.social
Pretty sure you want that number to be lower. :p (my stats for DC ridership)
jbarrow.bsky.social
Would absolutely love that!
jbarrow.bsky.social
"AI TOPS our stock price"
- Nvidia, today
jbarrow.bsky.social
Ah, yes, that ol' familiar unit of measure "AI TOPS"
jbarrow.bsky.social
Agreed. Personally, my ideal would be if you could type into an old, cheap, refurb Kindle.

Here’s a video of a person typing into the Palma: www.reddit.com/r/Onyx_Boox/...

My experience (tablet) is that it’s maybe 10s from pickup to writing — wake up (3s), navigate to apps (2s), open app (3-5s)?
Palma as a FreeWrite
www.reddit.com
jbarrow.bsky.social
I’ve got one of the older, larger eInk tablets and use it for reading books/papers and taking notes. Battery after several years lasts about a week of average use, longer if I keep WiFi off.
jbarrow.bsky.social
Not necessarily hitting the price point but there are eInk mini tablets (e.g. Boox Palma at around $200) that have Android, no sim (so no phone distractions), and long battery life (thanks to the eInk and being generally underpowered). They accept keyboards, too.
Reposted by Joe Barrow
jbarrow.bsky.social
Holy moly that created an extra half page of space!
jbarrow.bsky.social
Gemini 2.0 Flash is pretty good at localization in images for an LMM (much better than GPT-4o in my experiments).
A picture of a teapot to the right of a teacup, both in a flat-bottomed basket. The teacup has tea and flowers in it. There are 2 blue bounding boxes on the image, one labeled "Teapot" and one labeled "Teacup", that are over the teapot and teacup respectively.
jbarrow.bsky.social
ML history question: is there an earlier reference to pixel-only in-context (i.e. no fine-tuning) DocVQA performance than the GPT-4 announcement from OpenAI?
jbarrow.bsky.social
Aged white tea, the kind that comes in a compressed disk or ball. My favorite kind of tea; imo it tastes naturally quite sweet. Yunnan Sourcing has a bunch!