In the first paper, @candemircan.bsky.social and @tankred-saanum.bsky.social use sparse autoencoders to show that LLMs can implement temporal difference learning in context. This work is together with Akshay Jagadish and @marcelbinz.bsky.social.
arxiv.org/abs/2410.01280
Check the paper here: arxiv.org/abs/2306.09377
Reach out if you're at NeurIPS and wanna talk about representational alignment, mechanistic interpretability, or CogSci!