apoorva lal
apoorvalal.com
@apoorvalal.com
causal inference, econometrics, ML, arsenal, loud music, unix, FOSS for scientific computing.

apoorvalal.github.io

(passively) maintains @paperposterbot.bsky.social
gist.github.com/apoorvalal/6... here are some test scripts and benchmarks comparing CPU w GPU for matrix multiplication, least squares, MLE, nnet operations. 50x+ speedups abound.
test_attention.py
November 15, 2025 at 8:54 PM
self-explanatory ones
November 15, 2025 at 7:51 PM
Yeah doing this by hand in 2025 is definitely vehemently ignoring the bitter lesson but it's vaguely artisanal and i think it's fun lol. Let me think about the labelling problem; agree in principle
November 15, 2025 at 7:40 PM
Tbd, really; i can already see some predictable ocr bleed (ad copy entering text) that can likely be postprocessed away. I'll text you once this run is done with some examples.

Still this surya library is miles ahead of layoutparser for this kind of text data.
November 15, 2025 at 7:29 PM
(pixar didn't do very much)
November 14, 2025 at 8:42 PM
here's a caveman linux searcher that works well. assuming your search service is on its last legs, you could write a little script that
0) pulls and extracts these zips into a folder
1) executes ./seeker.sh files/
2) regex search, fzf navigation, open in vim
gist.github.com/apoorvalal/2...
November 13, 2025 at 4:40 AM
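For anyone who wants the regex-search step without the shell tooling, here is a minimal Python sketch of the same idea. It is not the contents of seeker.sh (the gist is not reproduced here); the function name `search_files` and the sample files are made up for illustration, and the fzf/vim navigation is left out.

```python
import re
import tempfile
from pathlib import Path


def search_files(root, pattern):
    """Return (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="ignore" keeps the scan going past stray undecodable bytes
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), lineno, line))
    return hits


if __name__ == "__main__":
    # Hypothetical demo data standing in for the extracted zips.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.txt").write_text("alpha\nbeta gamma\n")
        (Path(tmp) / "b.txt").write_text("gamma ray\n")
        for path, lineno, line in search_files(tmp, r"gam+a"):
            print(f"{Path(path).name}:{lineno}:{line}")
```

The `file:line:text` output format mirrors what grep-style tools emit, so it could be piped into a picker if desired.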
how big is that google drive folder? it's been zipping for like 20 mins for me
November 13, 2025 at 3:47 AM
google scholar pdf reader's citation popups are sending me down interesting rabbit-holes. good job goog.
November 4, 2025 at 7:43 PM
yeah i suspect they'll be quite parsimonious actually, you could even try a few simple options in spacy and spot-check performance before going for a big model off huggingface. unclear if an LLM running on top will add value but reasonable people seem to disagree with me and put everything in chat
November 3, 2025 at 10:08 PM
was going to apologise for giving the gift horse a dental examination but since your other pastime is a genuinely masochistic game i think debugging cuda issues is preferable bsky.app/profile/paul...
F Bilewater
important PSA for everyone still struggling their way through silksong: you are missing a bench in bilewater

www.ign.com/articles/hol...
November 3, 2025 at 9:11 PM
assuming you went ahead with postgres, i think you could generate embeddings with some reasonable model (openai, or huggingface.co/nomic-ai/nom...), insert them into your db, and then use pgvector to do nearest neighbours instead of a string search?
github.com/pgvector/pgv...

would be cool to have
GitHub - pgvector/pgvector: Open-source vector similarity search for Postgres
November 3, 2025 at 9:07 PM
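To make the embeddings-plus-nearest-neighbours suggestion concrete, here is a rough pure-Python sketch of the ranking that a cosine-distance query performs. The toy 3-d vectors, the `docs` table and the column names are all hypothetical; real embeddings would come from whichever model is chosen, and in Postgres the sort would be done by pgvector rather than in Python.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, rows, k=2):
    """rows: list of (doc_id, embedding). Returns the k closest doc_ids."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy "embeddings" for illustration only.
rows = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.9, 0.1, 0.0]),
    ("doc_c", [0.0, 1.0, 0.0]),
]

# In Postgres the same ranking would look roughly like this
# (table and column names are assumptions):
#   SELECT id FROM docs ORDER BY embedding <=> %(query)s LIMIT 2;
print(nearest([1.0, 0.05, 0.0], rows))
```

Unlike a string search, the query and the documents need not share any words; closeness is decided entirely by the embedding model.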
applied econometrics basically restricts itself to this by sticking to stacking heaps of least squares and (with a few exceptions) has basically missed the boat on the last 40 years of computational advancements.
November 2, 2025 at 5:21 PM
Lol to stretch the metaphor beyond breaking point: twine is used to move the wheels? Cut a chunk off a wheel of cheese?
twine.readthedocs.io/en/stable/
November 1, 2025 at 4:38 PM
github.com/py-econometr... i've had to explain why the regression library that goes fast because of ducks has this logo
GitHub - py-econometrics/duckreg: Every big regression is a small regression with weights.
October 31, 2025 at 5:55 PM
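The tagline "every big regression is a small regression with weights" can be checked on a toy example. The sketch below is not duckreg's API; `ols` and `compress` are hypothetical helpers showing that least squares on the full rows and weighted least squares on one row per distinct regressor value (group mean of y, weighted by group count) give the same coefficients. It covers point estimates only; standard errors need extra sufficient statistics that are not tracked here.

```python
from collections import defaultdict


def ols(x, y, w=None):
    """(Weighted) least squares of y on x with an intercept; returns (intercept, slope)."""
    w = w if w is not None else [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def compress(x, y):
    """Collapse to one row per distinct x: (x values, mean of y, row counts)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    xs = sorted(groups)
    means = [sum(groups[v]) / len(groups[v]) for v in xs]
    counts = [float(len(groups[v])) for v in xs]
    return xs, means, counts


# Made-up data: 9 rows, but only 3 distinct values of x.
x = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y = [1.0, 2.0, 3.0, 2.0, 4.0, 5.0, 6.0, 4.0, 7.0]

full = ols(x, y)                # regression on all 9 rows
xs, ybar, n = compress(x, y)    # 3 rows of sufficient statistics
small = ols(xs, ybar, n)        # weighted regression on 3 rows
print(full, small)
```

The compression step is a group-by with a mean and a count, which is the kind of aggregation a database engine handles well.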
I use "scikit-learning" (complimentary)
October 31, 2025 at 1:49 PM
not sure why What We Do in the Shadows hasn't crowbarred in a set-piece here
October 30, 2025 at 6:10 PM