Dr. Jonathan Mall
@jonathanmall.bsky.social
LLMs, AI, Psychology and Speaking - Seeking to infuse technology with empathy. That's why I founded www.neuroflash.com, an AI platform driving brand-aligned marketing and helping people connect and understand each other's perspectives.
I made an interactive conversation analysis of the (world-changing?) #Trump #Zelenskyy #Vance conversation. (Transcript from @apnews.com)

claude.site/artifacts/f8...

Try it yourself — What stands out to you? 👀
March 3, 2025 at 7:16 AM
I analyzed the Trump-Zelensky-Vance transcript from @apnews.com to see what stands behind their words - the phrasing speaks volumes about power dynamics. 📊

Key Insights in thread
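
For anyone curious about the mechanics, here is a minimal sketch of the kind of turn-level analysis I mean (not the actual artifact code). The transcript format, the cut-off heuristic, and the example utterances are assumptions for illustration only.

```python
from collections import Counter

# Toy transcript: list of (speaker, utterance) pairs.
# The real AP transcript would be parsed into this shape first.
transcript = [
    ("Trump", "We're going to have a great meeting today--"),
    ("Zelenskyy", "Thank you, but I want to talk about security guarantees."),
    ("Vance", "I think you should be thanking the president."),
    ("Zelenskyy", "Have you ever been to Ukraine?"),
]

def power_dynamics(turns):
    """Crude turn-level stats: who talks, how much, and who gets cut off."""
    words = Counter()
    turn_count = Counter()
    interruptions = Counter()   # utterances ending in "--" = speaker was cut off
    questions = Counter()       # questions can signal challenge or deference
    for speaker, text in turns:
        turn_count[speaker] += 1
        words[speaker] += len(text.split())
        if text.rstrip().endswith("--"):
            interruptions[speaker] += 1
        if "?" in text:
            questions[speaker] += 1
    total_words = sum(words.values())
    return {
        s: {
            "turns": turn_count[s],
            "share_of_words": round(words[s] / total_words, 2),
            "times_cut_off": interruptions[s],
            "questions_asked": questions[s],
        }
        for s in turn_count
    }

for speaker, stats in power_dynamics(transcript).items():
    print(speaker, stats)
```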
March 2, 2025 at 11:19 AM
Reposted by Dr. Jonathan Mall
What is better than an LLM as a judge? Right, an agent as a judge! Meta created an Agent-as-a-Judge to evaluate code agents and enable intermediate feedback, alongside DevAI, a new benchmark of 55 realistic development tasks.

Paper: huggingface.co/papers/2410....
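
Rough idea in code: instead of a single judge call on the final answer, an evaluator loops over the agent's intermediate artifacts and scores each requirement. A minimal sketch assuming an OpenAI-compatible client, a placeholder judge model, and an invented task trajectory; this is not the paper's implementation.

```python
from openai import OpenAI  # assumes an OpenAI-compatible endpoint

client = OpenAI()
JUDGE_MODEL = "gpt-4o-mini"  # placeholder judge model

def judge_step(task, requirement, artifact):
    """Ask the judge whether one intermediate artifact satisfies one requirement."""
    prompt = (
        f"Task: {task}\n"
        f"Requirement: {requirement}\n"
        f"Intermediate artifact produced by the code agent:\n{artifact}\n\n"
        "Answer 'satisfied' or 'not satisfied', then one sentence of feedback."
    )
    resp = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Hypothetical trajectory from a code agent working on a dev task.
task = "Build a CSV-to-JSON converter CLI"
trajectory = {
    "parses command line arguments": "argparse setup with --input/--output flags",
    "handles malformed rows gracefully": "no try/except around csv.reader yet",
}

for requirement, artifact in trajectory.items():
    print(requirement, "->", judge_step(task, requirement, artifact))
```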
December 10, 2024 at 9:53 AM
Reposted by Dr. Jonathan Mall
There is ZERO connection between vaccines and autism!

NONE!
December 10, 2024 at 3:00 AM
Reposted by Dr. Jonathan Mall
Why is the government not capable of wiping the floor with Nigel Farage every single day like this? Alastair Campbell effortlessly destroys him over his Brexit legacy, which by almost every economic and financial measure has been disastrous for the country.
December 6, 2024 at 6:41 AM
OpenAI's o1-pro model is only marginally better than the o1 model, but access costs 10x the price ($20 vs. $200); not sure that's worth the quality increase.

Alternatively, do your own agentic or chain of thought prompting. Or go full professional with #dspy or #adalflow for auto prompt optimization.
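
For the DIY route, the cheapest version is a two-step prompt: ask for reasoning first, then extract a final answer. A minimal sketch with a generic OpenAI-compatible client; the model name is a placeholder, and frameworks like DSPy or AdalFlow would replace the hand-written prompt with optimized ones.

```python
from openai import OpenAI  # works with any OpenAI-compatible endpoint

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder; swap for whatever you actually use

def chain_of_thought(question: str) -> str:
    """Hand-rolled chain-of-thought: reason step by step, then extract the answer."""
    reasoning = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": f"{question}\n\nThink step by step before answering.",
        }],
    ).choices[0].message.content

    answer = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "user", "content": question},
            {"role": "assistant", "content": reasoning},
            {"role": "user", "content": "Now give only the final answer, nothing else."},
        ],
    ).choices[0].message.content
    return answer

print(chain_of_thought("A train leaves at 9:40 and arrives at 11:05. How long is the trip?"))
```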
December 6, 2024 at 9:43 AM
When predicting human sentiment responses for (German) words, Qwen 2.5 Coder 32B does worse than GPT-4o mini.

R squared is
0.845 for GPT-4o mini
0.714 for Qwen 2.5 Coder 32B

Many more comparisons to follow :)

Do you know a good LLM for European Languages?
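
For anyone wanting to reproduce the comparison: the metric is just R² between human ratings and model-predicted ratings per word. A sketch with made-up numbers; in the real run the predictions come from prompting each model for a sentiment score per German word.

```python
import numpy as np
from sklearn.metrics import r2_score

# Human sentiment ratings per German word (illustrative values, not the real dataset).
human = np.array([0.82, -0.35, 0.10, -0.90, 0.55])   # e.g. Liebe, Ärger, Tisch, Hass, Sonne

# Scores predicted by each LLM when prompted per word, parsed from its output.
predicted_gpt4o_mini = np.array([0.78, -0.30, 0.05, -0.85, 0.60])
predicted_qwen_coder = np.array([0.60, -0.10, 0.30, -0.70, 0.40])

print("GPT-4o mini R²:", round(r2_score(human, predicted_gpt4o_mini), 3))
print("Qwen 2.5 Coder R²:", round(r2_score(human, predicted_qwen_coder), 3))
```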
December 2, 2024 at 10:26 AM
Using perplexity instead of prompting to predict the more likely outcome of neuroscience experiments. Beating human experts by roughly 20% - wow.

Could help make research more efficient, imho. Especially since they used older models to achieve this!

www.nature.com/articles/s41...
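
The trick, as I understand it: show the model two versions of an abstract (real result vs. altered result) and pick the one with lower perplexity; no prompting involved. A minimal sketch with Hugging Face transformers and a small placeholder model, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # placeholder; the study used larger (but still older) models
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

def perplexity(text: str) -> float:
    """Average cross-entropy of the text under the model, exponentiated."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

# Two candidate abstracts: same setup, opposite result (toy example).
real    = "Stimulation of the hippocampus improved memory recall in the treatment group."
altered = "Stimulation of the hippocampus impaired memory recall in the treatment group."

pick = min([real, altered], key=perplexity)
print("Model's bet on the actual finding:", pick)
```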
November 30, 2024 at 9:17 AM
Menlo Ventures shows that enterprise usage of Anthropic has doubled in the last year.

Could it be because it's often better than OpenAI? For coding it's my go-to model.
November 30, 2024 at 8:40 AM
A year ago, Claude 2.1 had a 200K token context window with 27% accuracy on needle-in-haystack tests. Now, Alibaba's Qwen2.5-Turbo claims a 1M token window with 100% recall on similar tests.
@alanthompson.net
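
For reference, a needle-in-a-haystack test is easy to roll yourself: bury a random fact in a long filler context and ask the model to retrieve it. A minimal sketch with an OpenAI-compatible client; the filler text, needle, and model name are placeholders, and real evaluations vary needle depth and context length systematically.

```python
import random
from openai import OpenAI  # any OpenAI-compatible endpoint

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model

filler = "The quick brown fox jumps over the lazy dog. " * 2000   # padding text
needle = "The secret passphrase is 'blue-volcano-42'."

# Insert the needle at a random depth in the haystack.
sentences = filler.split(". ")
sentences.insert(random.randint(0, len(sentences) - 1), needle.rstrip("."))
haystack = ". ".join(sentences)

resp = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": haystack + "\n\nWhat is the secret passphrase? Answer with the phrase only.",
    }],
)
answer = resp.choices[0].message.content
print("Recall OK:" if "blue-volcano-42" in answer else "Missed:", answer)
```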
November 30, 2024 at 8:33 AM
SIFT introduces test-time learning by selecting highly relevant and non-redundant data for fine-tuning. Some call it Q-star 2.0. It optimizes information gain, reducing prediction uncertainty while using minimal compute - neat.

paper: arxiv.org/abs/2305.18466
Video: www.youtube.com/watch?v=vei7...
Test-Time Training on Nearest Neighbors for Large Language Models
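
The core loop, as I read it: at inference time, retrieve the training examples nearest to the current prompt and take a few gradient steps on them before predicting. A heavily simplified sketch with transformers and a toy retrieval step (mean-embedding cosine similarity); SIFT's actual selection criterion (information gain, redundancy-aware) is not reproduced here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # small placeholder model
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Tiny stand-in "training set" to retrieve from.
corpus = [
    "The capital of Australia is Canberra, not Sydney.",
    "Photosynthesis converts light energy into chemical energy.",
    "The Treaty of Westphalia was signed in 1648.",
]

def embed(text: str) -> torch.Tensor:
    """Cheap embedding: mean of the model's input token embeddings."""
    ids = tok(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        vecs = model.get_input_embeddings()(ids)
    return vecs.mean(dim=1).squeeze(0)

def nearest_neighbors(query: str, k: int = 2):
    q = embed(query)
    scored = [(torch.cosine_similarity(q, embed(doc), dim=0).item(), doc) for doc in corpus]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

def test_time_finetune(query: str, steps: int = 3, lr: float = 1e-5):
    """Take a few gradient steps on the retrieved neighbors before answering."""
    neighbors = nearest_neighbors(query)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for doc in neighbors:
            batch = tok(doc, return_tensors="pt")
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
    model.eval()

query = "What is the capital of Australia?"
test_time_finetune(query)
out = model.generate(**tok(query, return_tensors="pt"), max_new_tokens=10)
print(tok.decode(out[0], skip_special_tokens=True))
```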
November 30, 2024 at 7:53 AM
A performant and actually open LLM was just released (with training data, recipes, etc.). Will be testing OLMo soon. allenai.org/olmo
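
If you want to poke at it too, it should load like any other causal LM through transformers. The checkpoint name below is my assumption; check allenai.org/olmo for the current released variants.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name is an assumption; see allenai.org/olmo for the released variants.
MODEL_ID = "allenai/OLMo-7B-hf"

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "The most underrated thing about open-source LLMs is"
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```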
November 30, 2024 at 7:50 AM
Are instruct models becoming "too obedient"? When I prompt them with brand guidelines and leave certain phrases in, the LLM will try to put those into virtually every piece of content :D
Prompt engineering can mitigate this, but it's certainly different from the DaVinci days (GPT-3)... old-school completion.
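
What currently works best for me is making the constraint explicit in the system prompt instead of hoping the model uses judgment. A sketch with an OpenAI-compatible client; the guideline text and model name are invented for illustration.

```python
from openai import OpenAI  # any OpenAI-compatible endpoint

client = OpenAI()

# Invented brand guideline for illustration.
guidelines = (
    "Tone: warm, concise, no exclamation marks. "
    "Approved phrases: 'built for humans', 'zero-fluff insights'."
)

system_prompt = (
    "You write marketing copy following the brand guidelines below.\n"
    f"{guidelines}\n"
    "Important: approved phrases are OPTIONAL. Use each at most once across the whole text, "
    "and only where it fits naturally. Never force them into every sentence."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Write a 3-sentence product blurb for an AI writing assistant."},
    ],
)
print(resp.choices[0].message.content)
```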
November 30, 2024 at 7:16 AM
Reposted by Dr. Jonathan Mall
Happy Thanksgiving, everyone!

While you are drinking wine and drowning in turkey, I would encourage folks to think about:

What if Bluesky really is our shot at breaking the web2/socmedia stranglehold on collective sensemaking...

What kinds of trillionaire and sovereign attacks are coming?
November 28, 2024 at 2:36 PM
🚀 Just ran a simulation on 1,000 word ratings using human data + our improved LLM prompt for simulating responses. Results? We explain almost 80% of the variance! This approach could revolutionize how we model human emotions—customized for specific groups. 🔥 #AI #EmotionModeling
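
The simulation side is basically a persona prompt per word; variance explained is then the R² against the human mean ratings, as in the earlier comparison. A rough sketch only; the persona, scale, and answer parsing are simplified assumptions, not our production prompt.

```python
from openai import OpenAI  # any OpenAI-compatible endpoint

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model

def simulate_rating(word: str, persona: str) -> float:
    """Ask the model to answer as a member of the target group on a -1..1 scale."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": f"You answer as {persona}. Reply with a single number."},
            {"role": "user", "content": f"How positive or negative does the word '{word}' feel "
                                        f"to you, from -1 (very negative) to 1 (very positive)?"},
        ],
    )
    return float(resp.choices[0].message.content.strip())

persona = "a 30-year-old German marketing professional"  # invented target group
for word in ["Sonne", "Steuer", "Urlaub"]:
    print(word, simulate_rating(word, persona))
```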
November 28, 2024 at 12:08 PM