Lightnews — Scholar-powered news

Martin Elstner

@martin.elstner.dev

Cool, cool. I have one of those Brother hobby machines; you work it with a 5 mL dropper.

November 10, 2025 at 10:37 AM

Martin Elstner

@martin.elstner.dev

But it doesn't need the whole bag, does it?

November 10, 2025 at 9:51 AM

Martin Elstner

@martin.elstner.dev

Yeah, in a 1:1 comparison Qwen is typically stronger. Granite is best for “non-Chinese, permissive license” environments.

October 29, 2025 at 7:17 AM

Martin Elstner

@martin.elstner.dev

The model family in its entirety is really good. As many businesses don’t like to build on Qwen base models (something along the lines of “a Chinese model will steal our data”), they are our go-to starting point for local deployment now.

October 29, 2025 at 7:10 AM

Martin Elstner

@martin.elstner.dev

They excluded the exact prompts, as these are probably relevant IP, but you’d adapt them to your specific use case anyway. All in all, a great find!

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

and an open source implementation of the SSR pattern on GitHub: github.com/pymc-labs/se...

GitHub - pymc-labs/semantic-similarity-rating: Implementation of the SSR algorithm of the paper "LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings"

github.com

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

an arXiv paper: arxiv.org/html/2510.08...

LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings

arxiv.org

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

They provide a nice blog post: www.pymc-labs.com/blog-posts/A...,

AI-based Customer Research: Faster & Cheaper Surveys with Synthetic Consumers

www.pymc-labs.com

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

This works much better than the direct approach, but is still less effective than the proposed SSR technique.

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

They also found that directly prompting for a Likert-scaled candidate evaluation results in poor reliability and an over-sampling of ‚unsure‘ votes. Interestingly, they also tested a second LLM call to map generated written responses onto a defined scale.

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

and calculating the vector similarity to a set of pre-defined reference answers (semantic similarity rating, SSR). They worked with Colgate, could test against a large set of real consumer surveys, and found very good correlation between the LLM-generated answers and observed consumer behaviour.

October 16, 2025 at 10:40 PM
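
A minimal Python sketch of the embed-and-compare idea from the post above, not the PyMC Labs implementation. Everything named here is an assumption for illustration: the reference statements, the toy bag-of-words `embed` function (a stand-in for a real embedding model), and the shift-and-normalize step that turns similarities into a distribution over Likert points.

```python
import math
import re
from collections import Counter

# Hypothetical reference answers, one per Likert point (1 = no, 5 = yes).
REFERENCE_ANSWERS = {
    1: "I would definitely not buy this product",
    2: "I would probably not buy this product",
    3: "I am not sure whether I would buy this product",
    4: "I would probably buy this product",
    5: "I would definitely buy this product",
}

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def likert_distribution(answer):
    """Map a free-text answer to a probability distribution over Likert points."""
    vec = embed(answer)
    sims = {k: cosine(vec, embed(ref)) for k, ref in REFERENCE_ANSWERS.items()}
    # Assumed normalization: subtract the smallest similarity, then rescale to sum to 1.
    low = min(sims.values())
    shifted = {k: s - low for k, s in sims.items()}
    total = sum(shifted.values())
    if total == 0:
        return {k: 1 / len(sims) for k in sims}  # all equally similar: uniform
    return {k: s / total for k, s in shifted.items()}

def expected_rating(dist):
    """Collapse the distribution into a single mean Likert score."""
    return sum(k * p for k, p in dist.items())

answer = "I would definitely buy this product, it fits my daily routine"
dist = likert_distribution(answer)
print(dist, round(expected_rating(dist), 2))
```

The word-count vectors barely register negation, so "definitely not" still scores fairly close to "definitely"; a real sentence-embedding model would be swapped in for `embed` in practice.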

Martin Elstner

@martin.elstner.dev

This worked for all our previous use cases but we had to put quite a bit of effort into the classification part.
PyMC Labs (@pymc-labs.bsky.social) published a study that addresses this challenge by embedding the open-ended model answer

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

But you are still facing the problem of getting your numeric score. In the past, we worked with clustering, topic modeling and classic ML techniques to map these answers onto defined categories.

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

Models tend to average everything out and will produce ‚not sure‘ answers in most cases. If you ask for written explanations (prompting like ‚explain in three sentences why you would like to buy the presented product‘) in an open-ended fashion, you can trigger much more valuable results.

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

Conventional in-person surveys typically measure that on a Likert scale (ranging from ‚definitely no‘ to ‚definitely yes‘). The obvious idea here is to ask the LLM for that score directly, but unfortunately this doesn’t work.

October 16, 2025 at 10:40 PM

Martin Elstner

@martin.elstner.dev

First, define a persona and let the model role-play this persona. So we need some demographic parameters to pack into a prompt to construct our synthetic consumer. Second, we need to quantify purchase intent.

October 16, 2025 at 10:40 PM
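
A short Python sketch of the persona step described in the post above. The demographic fields, the prompt wording, and the `Persona` / `build_prompt` names are all hypothetical; the thread does not say which parameters or prompts were actually used.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    # Hypothetical demographic fields; pick whatever your survey panel records.
    age: int
    gender: str
    income_bracket: str
    region: str

def build_prompt(persona: Persona, product_description: str) -> str:
    """Assemble a role-play prompt that asks for an open-ended answer."""
    return (
        f"You are a {persona.age}-year-old {persona.gender} from {persona.region} "
        f"with a {persona.income_bracket} household income. "
        "Stay in this role for your answer.\n\n"
        f"Product: {product_description}\n\n"
        "Explain in three sentences whether you would buy the presented product, and why."
    )

prompt = build_prompt(
    Persona(age=34, gender="woman", income_bracket="middle", region="Berlin"),
    "A whitening toothpaste with mint flavour",
)
print(prompt)
```

The prompt deliberately asks for a written explanation instead of a number, so the scoring can happen in a separate step.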

Martin Elstner

@martin.elstner.dev

What would you expect? A single-line prompt gave me this:

October 15, 2025 at 6:41 PM

Martin Elstner

@martin.elstner.dev

And a super fresh one 😉

bsky.app/profile/chem...

Chemical Science @chemicalscience.rsc.org · 26d

New in Chemical Science!

"Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model" by Hanyu Gao et al. from the Hong Kong University of Science and Technology.

Read it for free here: doi.org/10.1039/D5SC...

October 15, 2025 at 3:23 PM

Martin Elstner

@martin.elstner.dev

And we should always know when not to use AI/ML:

bsky.app/profile/geof...

Geoff Hutchison @geoffhutchison.net · 26d

I mean, the worst part is that there are actual deterministic #openscience tools to do name => SMILES (or other chemical format) + depict in 2D. Combine OPSIN with RDKit.. maybe check PubChem or ChEMBL or other open database if people use an informal term.

Chemistry World @chemistryworld.com · 26d

Can generative AI be trusted to draw chemical structures? Not yet, according to two chemists who want to see the community take a tough stance against its use.

October 15, 2025 at 3:07 PM

Martin Elstner

@martin.elstner.dev

There are some open models, e.g.: huggingface.co/AI4Chem/Chem... (but the focus is on the inverse task). There is also quite a bit of activity behind corporate doors.

AI4Chem/ChemVLM-8B · Hugging Face

huggingface.co

October 15, 2025 at 2:58 PM

Martin Elstner

@martin.elstner.dev

Business opportunity: sawing as a service
Also gives a snappy acronym

October 9, 2025 at 12:42 PM

Martin Elstner

@martin.elstner.dev

And then there is the AI content that we don't recognize as slop (anyone who makes even a little effort can get texts generated today that don't stand out). It's really hard right now to estimate the extent. We obviously only see the badly made part.

October 6, 2025 at 6:32 AM
