Prithviraj "Raj" Ammanabrolu
Prithviraj "Raj" Ammanabrolu
@rajammanabrolu.bsky.social
4.1K followers 240 following 170 posts
AI, RL, NLP, Games | Asst Prof at UCSD | Research Scientist at Nvidia | Lab: http://pearls.ucsd.edu | Personal: prithvirajva.com
My students will be presenting papers next Wed/Thursday so be sure to check those out too
I'll be at #CoLM2025 and the IVADO agents workshop right before it in Montreal. My students will be presenting two papers in the main conf. I'll also give a workshop keynote where I'll talk about some of our latest work. Come by and say hi next week!
I'm probably mostly going to stop posting on this site. There's close to no engagement, and it's not worth the effort to cross-post for the time it takes. Find me elsewhere / email me
I recently left Mosaic/Databricks Research. It's been a ride building out the RL team from <4 ppl to 20+ across two companies & an acquisition, plus figuring out RL as a Service in prod. Mosaic had insane talent density

Some "relaxation" while I put out Prof fires for a smol bit then new adventures!
The thing that feels so off about the core tech world is that every convo is very transactional. Maybe true elsewhere too. "Oh you're an expert in RL, can you answer questions about my new startup?"

Every single (Bay) party. No, I do not want to consult. I just wanna hang out.
Of all the labeling startups out there to acquihire, this was... an interesting choice. Says a lot actually
. @bosungkim.bsky.social will be at #CVPR2025 in Nashville this week to present this and just generally talk about scaling memory for embodied agents!

Catch her at the poster sessions and also the Foundation Models meets Embodied Agents Workshop on Wed
"Foundation" models for embodied agents are all the rage but how to actually do complex looong context reasoning? Can we scale Beyond Needle(s) in the (Embodied) Haystack?

∞-THOR is an infinite-length sim framework + a guide on (new) architectures/training methods for VLA models
Yes, AI for edu is a thing, but almost all vanilla LLMs just railroad students into answers. Complete cognitive offload is not useful for improving learning outcomes
I've heard this personally from multiple PMs at AI companies. Students are one of the biggest demographics, and the companies need to "break in" with them and drive even more usage to improve their metrics. Classic corporate economic incentives
AI companies in the US gave students free access to their systems during college exams

China disabled access to AI systems during nationwide college exams www.theverge.com/news/682737/...

Feel free to draw your own conclusions
China shuts down AI tools during nationwide college exams
New age problems require new age solutions.
www.theverge.com
'Tis the era of bringing back every AI benchmark ever, but this time by the LLM people and for the LLMs
Had a fun little visit to Cambridge LTL, where I talked about a bunch of my lab's latest papers, including some still not public, with the key takeaway that "RL can absolutely learn new things and is not just resurfacing knowledge"
talks.cam.ac.uk/show/archive...
talks.cam : Language Technology Lab Seminars
talks.cam.ac.uk
That's fair; I guess I should rephrase to "regardless of a possible common prior, it's nearly impossible for different providers to have the same representations pop out of their post-trained LLM"
The moral of the story here is basically that who is making your LLM really matters. Internal use cases critical to their businesses will always influence data distributions and everything downstream of that. This is in contrast to things like the Platonic Representation Hypothesis
An interesting tidbit from UCSD's Victor Shih on a podcast about Chinese AGI efforts: DeepSeek is good at Chinese govt doc understanding because that's what affects stock prices most, and DS is a hedge fund.
www.youtube.com/watch?v=b1Te...
Xi Jinping’s paranoid approach to AGI, debt crisis, & Politburo politics — Victor Shih
YouTube video by Dwarkesh Patel
youtu.be
Looks like Gemini gets AIR 6 in #JEE2025 with a score of 323

Only 5 high schoolers in all of India do better than an LLM in the single most important exam of their lives, the one to get into the IITs

The legacy edu selection systems are now worse than useless
I get prepping for worst-case scenarios, but a lot of the AI Safety debates I somehow end up in these days boil down to "assume you have a Machine God in a box, now tell me how to align it"

I could rant for hours but seriously y'all this isn't productive
But even then, agents only perform well up to 130k, after which perf sharply decreases, and all architectures and additional context extension methods we modified fail after ~400k. None make it to the 3M-context sample test set we use, let alone infinite. Lots of space for progress!
And we find that, when accounting for hardware constraints, only a specific combo of an interleaved VLA with a mix of context parallelism + some extension on top of a high pretraining context window size works well. We detail the exact architecture and describe potential improvements
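Very roughly, the combination named here could be written down as a config like the sketch below; every field name and default value is a placeholder of mine, not the actual ∞-THOR setup.

```python
from dataclasses import dataclass

@dataclass
class LongContextVLAConfig:
    # Knobs named in the post; every default below is a placeholder, not the
    # paper's actual value.
    interleave_vision_language_action: bool = True   # interleaved VLA token stream
    context_parallel_degree: int = 8                 # shard the sequence across devices
    context_extension: str = "positional-scaling"    # some long-context extension method
    pretrain_context_window: int = 131072            # "high" pretraining context window

def has_all_components(cfg: LongContextVLAConfig, min_pretrain_window: int) -> bool:
    # Mirrors the post's claim that all of these pieces were needed together;
    # the threshold is an argument because the post doesn't give a number.
    return (cfg.interleave_vision_language_action
            and cfg.context_parallel_degree > 1
            and cfg.context_extension is not None
            and cfg.pretrain_context_window >= min_pretrain_window)
```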
The first is a static eval called Needle(s) in the Embodied Haystack, which is QA-like: it asks agents to post hoc analyze their trajectories, putting many needles together

We then go Beyond this with interactive, RL-style evals to see how well models interact with a changing env
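As a rough illustration of what a multi-needle item in such a static eval might look like (hypothetical names and format, not the paper's actual data structure):

```python
# A long trajectory log with a few scattered steps ("needles") that must all
# be combined to answer a post hoc question about the haystack.
from dataclasses import dataclass

@dataclass
class NeedleQAItem:
    trajectory: list[str]    # full interaction log, the haystack
    needle_steps: list[int]  # indices of the steps that actually matter
    question: str
    answer: str

def build_item(haystack: list[str], needles: dict[int, str],
               question: str, answer: str) -> NeedleQAItem:
    trajectory = list(haystack)
    for idx, step_text in needles.items():
        trajectory[idx] = step_text  # splice the needles into the distractor log
    return NeedleQAItem(trajectory, sorted(needles), question, answer)

def exact_match(model_answer: str, item: NeedleQAItem) -> bool:
    # Toy scoring; a real eval would likely use something more forgiving.
    return model_answer.strip().lower() == item.answer.strip().lower()
```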
So we first extended AI2's THOR embodied sim to continue generating meaningful tasks that are effectively infinite in length. E.g. things you do at context len 7k can get reused at 3M

We do a thorough analysis of many types of architectures x training methods on two new evals
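A toy sketch of the "effectively infinite tasks" idea, assuming a hypothetical generator interface rather than the actual ∞-THOR code:

```python
# Each new task can reference things the agent did arbitrarily far back in its
# trajectory, so context from step ~7k can become load-bearing again millions
# of tokens later. Hypothetical structure, not the real implementation.
import random
from dataclasses import dataclass, field

@dataclass
class TrajectoryEvent:
    step: int
    description: str  # e.g. "put the mug in the cabinet"

@dataclass
class InfiniteTaskStream:
    events: list[TrajectoryEvent] = field(default_factory=list)

    def record(self, description: str) -> None:
        self.events.append(TrajectoryEvent(len(self.events), description))

    def next_task(self) -> str:
        if not self.events:
            return "explore the environment"
        # Deliberately anchor new goals to old events so long context stays relevant.
        anchor = random.choice(self.events)
        return f"go back to where you '{anchor.description}' (step {anchor.step})"
```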