Aran Nayebi
@anayebi.bsky.social
Assistant Professor of Machine Learning, Carnegie Mellon University (CMU)

Building a Natural Science of Intelligence 🧠🤖

Prev: ICoN Postdoctoral Fellow @MIT, PhD @Stanford NeuroAILab
Personal Website: https://cs.cmu.edu/~anayebi
The second paper circumvents the first paper's main "no free lunch" barrier of encoding "all human values" by identifying small value sets that yield the *first* formal guarantees on corrigibility.

In the AAAI Machine Ethics Workshop (W37) Proceedings 👇:
bsky.app/profile/anay...
1/ How do we build AI systems that are corrigible—shut down when asked, tell the truth, preserve oversight—and still do something useful?

We give the first provable framework that makes corrigibility implementable—unlike RLHF or Constitutional AI, which can fail when goals conflict.

🧵👇
November 21, 2025 at 12:42 AM
November 19, 2025 at 9:21 PM
November 10, 2025 at 8:46 PM
Finally, we briefly discuss the Querying Transformer (Q-Former) for text-image alignment, a holdover from last lecture on multimodal foundation models!
October 23, 2025 at 1:44 PM
We also discuss data quality & quantity (where a smaller model trained on many more tokens can match a larger one), how to source good data depending on your application, and Moravec's paradox for robotics foundation models.
October 23, 2025 at 1:44 PM
Thanks @undo-hubris.bsky.social for the invite & for hosting!

Slides: anayebi.github.io/files/slides...

Paper 1 (alignment barriers): arxiv.org/abs/2502.05934
Paper 1 summary: bsky.app/profile/anay...

Paper 2 (corrigibility): arxiv.org/abs/2507.20964
Paper 2 summary: bsky.app/profile/anay...
October 10, 2025 at 3:16 PM
Academic paper: bsky.app/profile/anay...
Can a Universal Basic Income (UBI) become feasible—even if AI fully automates existing jobs and creates no new ones?

We derive a closed-form UBI threshold tied to AI capabilities that suggests it's potentially achievable by mid-century even under moderate AI growth assumptions:
October 5, 2025 at 3:23 PM
Next time we discuss how to optimize policies against these reward models via DPO / policy gradients!

Slides: www.cs.cmu.edu/~mgormley/co...

Full course info: bsky.app/profile/anay...
October 1, 2025 at 7:46 PM
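As a companion to the post above, here is a minimal sketch (my own, not the course's code) of the per-pair DPO loss, assuming we already have summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model:

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair Direct Preference Optimization loss.

    Computes -log sigmoid(beta * margin), where the margin is the
    policy's log-ratio advantage for the chosen response relative
    to the frozen reference model. beta is the usual KL-strength
    hyperparameter; 0.1 is just an illustrative default.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically plain logistic loss; real implementations use a
    # stable logsigmoid, but this suffices for a sketch.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy exactly matches the reference, the margin is 0
# and the loss is log(2); favoring the chosen response lowers it.
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)
improved = dpo_loss(-0.5, -2.0, -1.0, -2.0)
```

The design point the sketch illustrates: DPO needs no explicit reward model at optimization time, only log-probs from the policy and reference, which is what distinguishes it from the policy-gradient route.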
Specifically, we cover a spectrum of methods: those that don't involve parameter updates (In-Context Learning, Prompt Engineering, Chain-of-Thought Prompting) and those that do, such as Instruction Fine-Tuning (IFT) and building on IFT to perform full-fledged Reinforcement Learning from Human Feedback (RLHF).
October 1, 2025 at 7:46 PM
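To make the parameter-free end of that spectrum concrete, here is a tiny sketch (my own illustration, with a hypothetical worked example) of few-shot chain-of-thought prompting: reasoning exemplars are placed in the context window instead of updating any weights.

```python
# Hypothetical exemplar pairing a question with explicit reasoning steps.
EXEMPLARS = [
    ("Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many balls?",
     "A: 2 cans * 3 balls = 6 balls. 5 + 6 = 11. The answer is 11."),
]

def build_cot_prompt(question: str) -> str:
    """Concatenate reasoning exemplars before the new question, ending
    with 'A:' so the model continues with its own reasoning chain."""
    parts = [q + "\n" + a for q, a in EXEMPLARS]
    parts.append("Q: " + question + "\nA:")
    return "\n\n".join(parts)

prompt = build_cot_prompt("A train travels 60 miles in 1.5 hours. "
                          "What is its average speed?")
```

No model is invoked here; the point is that the "learning" lives entirely in the constructed prompt string.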
September 29, 2025 at 8:00 PM
6/6 I close with reflections on AI safety and alignment, and the Q&A explores open questions: from building physically accurate (not just photorealistic) world models to the role of autoregression and scale.

🎥Watch here: www.youtube.com/watch?v=5deM...

Slides: anayebi.github.io/files/slides...
RI Seminar: Aran Nayebi : Using Embodied Agents to Reverse-Engineer Natural Intelligence
September 29, 2025 at 2:02 PM
5/6 I also touch on the Contravariance Principle/Platonic Representation Hypothesis, our proposed NeuroAI Turing Test, and why embodied agents are essential for building not just more capable, but also more reliable, autonomous systems.
September 29, 2025 at 2:02 PM