On the job market this cycle!
alexanderhoyle.com
Oddly, I have seen much more LinkedIn use for Zürich AI things
However, this does not work very well---LLM outputs tend to cluster or "heap" around certain integers (and do so inconsistently between models)
Code: github.com/haukelicht/s...
With:
@haukelicht.bsky.social *
@rupak-s.bsky.social *
@patrickwu.bsky.social
@pranavgoel.bsky.social
@niklasstoehr.bsky.social
@elliottash.bsky.social
- Directly prompting on a scale is surprisingly fine, but *only if* you take the token-probability weighted average over scale items, Σ⁹ₙ₌₁ int(n) ⋅ p(n|x) (cf. @victorwang37.bsky.social; sketch below)
- Finetuning w/ a smaller model can do really well! And with as few as 1,000 paired annotations
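A minimal sketch of that weighted average using Hugging Face transformers — the model name and prompt wording are illustrative assumptions, not what we used (see the repo above for the real thing):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B-Instruct"  # assumption: any causal LM with digit tokens
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def scale_score(text: str) -> float:
    # hypothetical prompt; the construct and wording are placeholders
    prompt = f"Rate the negativity of this ad from 1 to 9.\nAd: {text}\nRating:"
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]   # next-token logits
    probs = torch.softmax(logits, dim=-1)
    # token ids for the digits "1"..."9" (assumes each digit is a single token)
    digit_ids = [tok.encode(str(n), add_special_tokens=False)[0] for n in range(1, 10)]
    digit_probs = probs[digit_ids]
    digit_probs = digit_probs / digit_probs.sum()  # renormalise over the scale items
    # weighted average: sum_n n * p(n | x)
    return float(sum(n * p for n, p in zip(range(1, 10), digit_probs)))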
As ground truth, we use human-annotated pairwise ranks for three social-science constructs from prior work (ad negativity, grandstanding, and fear about immigration), inducing scores via Bradley-Terry
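For anyone unfamiliar, here is a minimal sketch of inducing scores from pairwise ranks with Bradley-Terry, fit by plain gradient ascent on the log-likelihood (the (winner, loser) data format is an assumption; any BT package does the same job):

import numpy as np

def bradley_terry(pairs, n_items, n_iter=2000, lr=0.1):
    """pairs: list of (i, j) where item i beat item j."""
    theta = np.zeros(n_items)                    # latent scores (log-strengths)
    winners = np.array([i for i, _ in pairs])
    losers = np.array([j for _, j in pairs])
    for _ in range(n_iter):
        # P(i beats j) = sigmoid(theta_i - theta_j)
        p = 1.0 / (1.0 + np.exp(-(theta[winners] - theta[losers])))
        grad = np.zeros(n_items)
        np.add.at(grad, winners, 1.0 - p)        # push winners' scores up
        np.add.at(grad, losers, -(1.0 - p))      # push losers' scores down
        theta += lr * grad / len(pairs)
        theta -= theta.mean()                    # identifiability: zero-mean scores
    return theta

# toy example: item 0 beats 1 twice, 1 beats 2 once
scores = bradley_terry([(0, 1), (0, 1), (1, 2)], n_items=3)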
After all, it is easier for *people* to compare items relatively than to score them directly
psycnet.apa.org/record/2013-...
www.sciencedirect.com/science/arti...
overview: 3starlearningexperiences.wordpress.com/2018/01/09/l...
arxiv.org/pdf/2403.03923
For more on resource use, I found this blog post very informative: andymasley.substack.com/p/individual...
but those RNNs were also *less* efficient; transformers were lauded precisely because they were so much more efficient