Craig Sherstan
craigsherstan.bsky.social
18 followers 3 following 13 posts
AI Research Scientist - Reinforcement Learning. Tokyo based.
Reposted by Craig Sherstan
This paper has now been accepted at @neuripsconf.bsky.social!

Huge congratulations, Hon Tik (Rick) Tse and Siddarth Chandrasekar.
📢 I'm happy to share the preprint: _Reward-Aware Proto-Representations in Reinforcement Learning_ ‼️

My PhD student, Hon Tik Tse, led this work, and my MSc student, Siddarth Chandrasekar, assisted us.

arxiv.org/abs/2505.16217

Basically, it's the SR with rewards. See below 👇
Really cool opening at DeepMind right now for someone to explore "what comes after AGI" (closes Friday Sept 26, 2025):
job-boards.greenhouse.io/deepmind/job...
Research Scientist, Post-AGI Research
London, UK
Reposted by Craig Sherstan
The Sony AI Game AI team has reinforcement learning internships open for 2026!

It is remote for people in the US & Canada, mixed remote/onsite in Europe (onsite in Zurich), and onsite in Tokyo.

If you want to work on RL with cool applications, sign up!

ai.sony/joinus/job-r...
Reinforcement Learning Research Intern 2026 for Game AI – Sony AI
timed coding tests -> stress -> fight or flight -> loss of fine motor control -> ca n'ptt typp
Thanks to @marloscmachado.bsky.social for the invite to speak at the University of Alberta today. Hopefully someone remembers my main point: Reward design is really important. :)
Reposted by Craig Sherstan
"A scientist believes..." isn't noteworthy unless it can be followed by "because science shows..."
I just came across a technical article written by *Dr.* So-and-so. Seeing Dr. made me more impressed and then I remembered "I'm a Dr. too". Inbuilt biases :P I'm definitely going to start using my Dr. title.
Learning to Reason without External Rewards arxiv.org/pdf/2505.19590

LLM finetuning is done ONLY using internal reward (model confidence) with no external grounding reward.
That means the LLM had to already know how to solve the problems.
The quality of LLM-based documentation can vary significantly depending on prompt, context, etc. It can do really well, hitting all of the points you listed in the rest of this thread. It can also do pretty terribly. An LLM is a tool; to use it well we have to invest the time to learn it.
Sometimes I imagine a world where all the friends that I'm trying to coordinate use the same messaging app. One can dream...
I was playing with a couple of emotion detection models today.
Apparently my resting face is one of: disgust, anger, sad and my happy face is contempt :P
Cortical Labs combines human neurons with silicon computing for a cool #cyborg computer!!! And you can buy one, or use their cloud service.

corticallabs.com
Cortical Labs
We've combined lab-grown neurons with silicon chips and made it available to anyone, for the first time ever.
Reposted by Craig Sherstan
I'm giving a keynote on building GT Sophy, our reinforcement-learning-based racing agent for Gran Turismo. Tuesday Nov 26, 2024, 8:40 JST (online). Computers and Games Conference. #rl #granturismo #sonyai #gtsophy
That and the fact that most everyone seems to mean something different when they say "AGI".