youtu.be/AUAHkhOldx8
Some thoughts below:
- Interesting that they use such a deep architecture for such small models (64 layers for 56M and 80 layers for 321M parameters)
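For a sense of how narrow those layers must be, here's a back-of-the-envelope using the standard ~12·L·d² non-embedding parameter approximation for dense transformers (an estimate only; the actual architecture may differ):

```python
# Rough sanity check: a dense transformer block has ~12 * d^2
# non-embedding parameters (4*d^2 attention + 8*d^2 for a 4x MLP),
# so total ~= 12 * n_layers * d^2. Back out the implied hidden size d.
import math

for n_params, n_layers in [(56e6, 64), (321e6, 80)]:
    d = math.sqrt(n_params / (12 * n_layers))
    print(f"{n_params / 1e6:.0f}M params, {n_layers} layers -> d ~ {d:.0f}")
# -> d ~ 270 for the 56M model, d ~ 578 for the 321M one
```

Hidden sizes in the few hundreds at 64-80 layers is an unusually skinny aspect ratio for models this small.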
TLDR: AGI is defined through ten measurable cognitive domains using psychometric theory.
Very cool collection of retrieval datasets all available on the Hugging Face hub!
Great work by Umar Butler, Abdur-Rahman Butler, Adrian Lucas Malec!
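If you want to poke at them, loading any one from the hub is a one-liner (the repo ID below is a hypothetical placeholder, not an actual dataset name from the collection):

```python
from datasets import load_dataset

# "some-org/some-retrieval-dataset" is a placeholder; substitute a
# real repo ID from the collection on the Hugging Face hub.
ds = load_dataset("some-org/some-retrieval-dataset", split="train")
print(ds[0])
```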
Apparently they implemented hybrid search using their own fine-tuned ModernBERT model, which is publicly available on the Hugging Face hub!
Congrats to @michaeljaylissner and the Free Law Project for making this happen!
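For anyone curious what hybrid search looks like in code, here's a minimal sketch combining BM25 with dense embeddings via reciprocal rank fusion; the model ID, corpus, and fusion constant are all illustrative assumptions, not the Free Law Project's actual setup:

```python
# Minimal hybrid-search sketch: lexical BM25 + dense embeddings,
# merged with reciprocal rank fusion (RRF). The model ID is a
# placeholder, not the Free Law Project's actual fine-tune.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "The court granted the motion to dismiss.",
    "The appeal was denied for lack of standing.",
    "Summary judgment was entered for the defendant.",
]
query = "motion to dismiss granted"

# Lexical side: BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([d.lower().split() for d in docs])
lex_scores = bm25.get_scores(query.lower().split())
lex_rank = sorted(range(len(docs)), key=lambda i: -lex_scores[i])

# Dense side: cosine similarity of normalized embeddings.
model = SentenceTransformer("answerdotai/ModernBERT-base")  # placeholder model
doc_emb = model.encode(docs, normalize_embeddings=True)
q_emb = model.encode(query, normalize_embeddings=True)
dense_rank = sorted(range(len(docs)), key=lambda i: -(doc_emb[i] @ q_emb))

# RRF merges the two rankings without tuning score scales
# (k=60 is the conventional constant).
k = 60
rrf = {i: 1 / (k + lex_rank.index(i) + 1) + 1 / (k + dense_rank.index(i) + 1)
       for i in range(len(docs))}
print(docs[max(rrf, key=rrf.get)])
```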
RL gives sparse feedback and burns compute. Off-policy distillation is efficient but learns in the teacher's states, not the student's, causing compounding errors on long sequences.
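The natural middle ground (and I'd guess where this goes) is on-policy distillation: the student samples its own trajectories, so there's no distribution mismatch, while the teacher scores every token, so the feedback is dense. A minimal sketch with hypothetical HF-style `student`/`teacher` causal LMs, an illustration of the idea rather than any specific recipe:

```python
# On-policy distillation sketch: the student samples its own
# completions; the teacher provides dense per-token supervision via
# reverse KL. `student` and `teacher` are assumed HF-style causal LMs.
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student, teacher, prompt_ids):
    # 1. Sample from the student, so training happens in states the
    #    student actually visits (no compounding-error mismatch).
    with torch.no_grad():
        seq = student.generate(prompt_ids, max_new_tokens=64, do_sample=True)

    # 2. Score the sampled sequence under both models.
    s_logp = F.log_softmax(student(seq).logits[:, :-1], dim=-1)
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(seq).logits[:, :-1], dim=-1)

    # 3. Reverse KL per position: a dense learning signal at every
    #    token, unlike a single sparse RL reward at the end.
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1)
    completion_start = prompt_ids.shape[1] - 1
    return kl[:, completion_start:].mean()
```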
TLDR: Models ignore user instructions while reasoning despite following them in final outputs.
AA-LCR is a set of 100 tough questions where you need to piece together answers from several real-world documents—sometimes really big ones—so you can’t just copy and paste the answers.
Thanks to Gian Sbetta and Edouard Treccani for inviting me to a great first AI Builders event in Zurich this evening!
Had lots of great conversations with super interesting people!
TLDR: It is very strong on GPQA, especially for its size, but underperforms on LEXam.
Thanks to Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, and Thomas Wolf!
We're excited to share our latest LEXam evaluation results:
- GPT-5 claims the #1 position, outperforming Gemini 2.5 Pro and setting a new state-of-the-art for legal reasoning on LEXam!
⚙️ The Setup
I evaluated ten frontier models on LEXam (English MC subset) using an "I don't know" (IDK) protocol.
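For readers unfamiliar with the setup, here's a minimal sketch of IDK-style scoring (the penalty is chosen so blind guessing has zero expected value; the exact rubric may differ):

```python
# IDK-protocol scoring sketch: models may answer "IDK" instead of
# guessing. Correct = +1, IDK = 0, wrong = -1/(n_options - 1). With 4
# options a blind guess earns 0.25*1 + 0.75*(-1/3) = 0 in expectation,
# so abstaining is rational whenever the model is below break-even.
def idk_score(pred: str, gold: str, n_options: int = 4) -> float:
    if pred == "IDK":
        return 0.0
    return 1.0 if pred == gold else -1.0 / (n_options - 1)
```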
"GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" presents such elegant ideas by a collection of amazing researchers!
Here is a tldr of how it works:
"GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" presents such elegant ideas by a collection of amazing researchers!
Here is a tldr of how it works:
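At a high level, the loop looks like this (my sketch; `evaluate` and `reflect` are caller-supplied, LLM-backed functions, not names from the paper):

```python
import random

def gepa(seed_prompt, examples, evaluate, reflect, budget=20):
    """Schematic GEPA loop. evaluate(prompt, ex) -> (score, trace);
    reflect(prompt, traces) -> mutated prompt (an LLM reads the
    execution traces and proposes a targeted natural-language edit)."""

    def dominates(a, b):  # Pareto dominance over per-example scores
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    seed_scores = [evaluate(seed_prompt, ex)[0] for ex in examples]
    frontier = [(seed_prompt, seed_scores)]  # Pareto set of candidates
    for _ in range(budget):
        prompt, _ = random.choice(frontier)
        traces = [evaluate(prompt, ex)[1] for ex in examples]
        candidate = reflect(prompt, traces)  # reflective mutation
        scores = [evaluate(candidate, ex)[0] for ex in examples]
        # Keep the frontier Pareto-optimal across examples, which
        # preserves prompts that win on different problem types.
        frontier = [(p, s) for p, s in frontier if not dominates(scores, s)]
        if not any(dominates(s, scores) for _, s in frontier):
            frontier.append((candidate, scores))
    return max(frontier, key=lambda ps: sum(ps[1]))[0]  # best mean score
```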
Yoshua Bengio, the most-cited computer scientist in the world, is 1 week away from becoming the first ML researcher to hit 1 million citations! 🤯
At his current rate of 366 citations/day, he'll reach this unprecedented milestone around October 27th 🎯
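Quick back-of-the-envelope on the projection:

```python
# ~1 week at 366 citations/day is ~2,562 citations, so the implied
# current count is just shy of 997.5k.
rate, days_left = 366, 7
print(1_000_000 - rate * days_left)  # 997438
```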
Sycophancy, the phenomenon of excessively agreeing with or flattering users, is a pervasive issue in current LLMs.
Findings: