@anishathalye.bsky.social
If you're interested in semantic data processing, you can also check out related systems like DocETL from Shreya Shankar et al., LOTUS from Liana Patel et al., and Palimpzest from Chunwei Liu et al. (4/4)
September 11, 2025 at 3:38 PM
These operators are implemented in Semlib, a new library I built to help solve a class of semantic data processing problems that is underserved by current tools such as agents and conversational chatbots.

More on the story and use cases here: anishathalye.com/semlib/. (2/)
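A minimal sketch of what a pipeline built from these operators might look like. Caveat: Session, filter, and sort here are assumptions based on the post and the project description, not verified against Semlib's docs; see anishathalye.com/semlib/ for the actual API.

import asyncio

from semlib import Session  # assumed import path

async def main():
    papers = ["Attention Is All You Need", "Deep Residual Learning", "AlphaGo"]
    session = Session()  # assumed entry point; uses your configured LLM provider
    # Semantic filter: keep items matching a natural-language predicate.
    nlp = await session.filter(papers, by="is about natural language processing")
    # Semantic sort: order items by an LLM-judged criterion.
    ranked = await session.sort(papers, by="influence on modern LLMs")
    print(nlp, ranked)

asyncio.run(main())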
Semlib: LLM-powered Data Processing
Semlib is a Python library for building data processing and data analysis pipelines that leverage the power of large language models (LLMs).
anishathalye.com
September 11, 2025 at 3:36 PM
If you have suggestions for topics to cover in the next iteration of the course, please share them in this thread!
August 5, 2025 at 5:43 PM
Incidentally, this is how I first got interested in ML. github.com/anishathalye...
GitHub - anishathalye/neural-style: Neural style in TensorFlow! 🎨
github.com
June 21, 2025 at 3:19 PM
We did a workshop at AIUC that: (1) implements a RAG app on top of Cursor's docs, (2) reproduces the widely-publicized failure from last week, and (3) shows how to automatically catch this failure. All slides/code are open-sourced here: github.com/cleanlab/aiu... (5/5)
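For reference, step (1) boils down to something like this sketch using the OpenAI SDK (the model names and one-chunk-per-query retrieval are placeholders; the workshop repo below has the real code):

import numpy as np
from openai import OpenAI

client = OpenAI()
docs = ["<chunk of Cursor's docs>", "<another chunk>"]  # placeholder corpus

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)

def answer(question):
    q = embed([question])[0]
    # Cosine-similarity retrieval over the doc chunks.
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = docs[int(np.argmax(sims))]
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content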
GitHub - cleanlab/aiuc-workshop: AI User Conference 2025 - Developer Day workshop
github.com
April 24, 2025 at 6:21 PM
What’s the solution? I believe that one ingredient will be intelligent systems that evaluate the output of these LLMs in real time and keep them in check, building on and combining techniques like LLM-as-a-judge, per-token logprobs, and statistical methods. (4/5)
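As a sketch of the per-token-logprobs ingredient (the model and threshold are placeholders, and this signal alone is noisy): flag a response when its average token logprob dips below a threshold.

import math
from openai import OpenAI

client = OpenAI()

def answer_with_confidence(question, threshold=-0.3):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        logprobs=True,
    )
    choice = resp.choices[0]
    logprobs = [t.logprob for t in choice.logprobs.content]
    mean_lp = sum(logprobs) / len(logprobs)
    # Low average token probability is a (noisy) hallucination signal.
    return choice.message.content, math.exp(mean_lp), mean_lp > threshold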
April 24, 2025 at 6:21 PM
Why do such failures occur? These next-token-prediction models are nondeterministic and can be fragile. And they’re not getting consistently better over time: OpenAI’s latest models like o3 and o4-mini show higher hallucination rates than previous versions. (3/5)
April 24, 2025 at 6:21 PM
It’s been over a year since the well-publicized failures of Air Canada’s support bot and NYC’s MyCity bot. And these AIs are still failing spectacularly in production, with the most recent debacle being Cursor’s AI going rogue and triggering a wave of cancellations. (2/5)
April 24, 2025 at 6:21 PM
I wonder if there's anything special in the Cursor Tab completion model or system prompt that induces this behavior.
April 16, 2025 at 10:04 PM
2/2
It works surprisingly well in practice.

cleanlab.ai/blog/rag-eva...

Hoping to see more of these real-time reference-free evaluations to give end users more confidence in the outputs of AI applications.
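In the same spirit, a toy reference-free LLM-as-a-judge check (not one of the evaluation models benchmarked in the post): a second model grades whether the answer is actually supported by the retrieved context.

from openai import OpenAI

client = OpenAI()

def support_score(question, context, answer):
    prompt = (
        "Reply with only a number from 0 to 1: the probability that the "
        "answer is fully supported by the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic-ish grading
    )
    return float(resp.choices[0].message.content.strip())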
Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best?
A comprehensive benchmark of evaluation models to automatically catch incorrect responses across five RAG applications.
cleanlab.ai
April 7, 2025 at 11:06 PM
And some repos are even organically suggested by ChatGPT. (3/3)
February 17, 2025 at 6:03 PM
Some of this might be through web search / tool use, but in at least some cases, knowledge about the projects is actually part of the models’ weights. (2/3)
February 17, 2025 at 6:03 PM