Seth Karten
sethkarten.ai
Autonomous Agents | PhD @ Princeton | Prev: CMU, Waymo | NSF GRFP Fellow
The assumption is not that bad. Additionally, it is not a hard threshold, so the methods will scale as models get better.
November 28, 2025 at 6:47 PM
EC is partially solved with foundation models. The social settings aren't, and the LLM Economist takeaways are going to be very practical moving forward. If you have aligned agents, many multi-agent problems become simple optimization problems. You just need to train with a scaffold like Claude Code.
November 28, 2025 at 6:29 AM
These are pretty cool, but I guess nothing ever happened with it? I like the Jersey City Uber Eats robots a lot too.

But we should still be building and deploying things here 100x faster
November 27, 2025 at 9:23 PM
Philly is a good place to deploy. My issue is that the general anti-AI sentiment is stronger in the Northeast (at least in my perception as a lifelong Northeasterner). Many view the world as zero-sum instead of general-sum. It is much easier to build something new when you can easily find plenty of like-minded people.
November 27, 2025 at 8:58 PM
Between setbacks in Boston from taxi unions and now this, I have pretty much given up on the Northeast long term. At this rate the Northeast will become a 20th-century museum like Europe.
November 27, 2025 at 8:43 PM
Trains should be autonomous
November 26, 2025 at 10:32 PM
Yes, please bring on the supply
We need:
- cheap energy
- cheap housing
- cheap food

Only possible by increasing supply
November 11, 2025 at 8:45 PM
Gen 1 OU Pokemon Qualifiers end tonight and I'm not even competing, yet I'm nervously watching error bars converge.

(5/5)
October 20, 2025 at 3:50 AM
Most LLM arenas use Bradley-Terry (batch MLE), which is accurate but requires full recomputation whenever new games arrive. Glicko-1 offers the best of both worlds: online updates and convergence to the batch optimum, with uncertainty estimates included.

(4/5)
October 20, 2025 at 3:50 AM
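To make the batch-versus-online contrast concrete, here is a minimal Python sketch. `fit_bradley_terry` uses the standard MM (minorization-maximization) iteration for the Bradley-Terry MLE, and `glicko1_update` follows Glickman's published Glicko-1 rating-period formulas. The function names, the `wins` matrix layout, and the iteration count are illustrative assumptions; this is not the challenge's actual ranking code.

```python
import math

Q = math.log(10) / 400.0  # Glicko-1 scaling constant


def fit_bradley_terry(wins, iters=200):
    """Batch MLE via MM updates. wins[i][j] = games in which i beat j."""
    n = len(wins)
    p = [1.0 / n] * n  # strengths, normalized to sum to 1
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i and p[i] + p[j] > 0
            )
            # Players with no games keep their previous strength.
            new_p.append(total_wins / denom if denom > 0 else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]
    return p


def _g(rd):
    """Glicko-1 attenuation factor for an opponent's rating deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def glicko1_update(r, rd, results):
    """One online update. results = [(opp_rating, opp_rd, score)], score in {0, 0.5, 1}.

    Assumption: the between-period RD inflation step is omitted for brevity.
    """
    if not results:
        return r, rd
    info = 0.0   # accumulates 1 / d^2
    delta = 0.0  # accumulates g * (score - expected)
    for opp_r, opp_rd, score in results:
        g = _g(opp_rd)
        expected = 1.0 / (1.0 + 10 ** (-g * (r - opp_r) / 400.0))
        info += Q**2 * g**2 * expected * (1.0 - expected)
        delta += g * (score - expected)
    precision = 1.0 / rd**2 + info
    return r + (Q / precision) * delta, math.sqrt(1.0 / precision)
```

The Bradley-Terry fit needs the whole win matrix and has to be rerun whenever games are added, while the Glicko-1 update only needs a player's current rating, rating deviation, and latest results. The shrinking rating deviation is what serves as the uncertainty estimate.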
Top-3 agents converge across all methods (250+ games each). But ranks 4+ show systematic disagreement:
- Elo diverges from HR even when HR's error bars don't overlap
- Glicko-1 agrees with HR despite being online

(3/5)
October 20, 2025 at 3:50 AM
In the NeurIPS PokeAgent Challenge, we stress-test 4 ranking systems across 100k+ agent matches:
- Bradley-Terry (batch MLE, our ground truth)
- Elo (online, chess-standard)
- Glicko-1 (online, uncertainty-aware)
- GXE (Glicko-derived win %)

(2/5)
October 20, 2025 at 3:50 AM
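For readers unfamiliar with the two simpler systems in that list, a rough Python sketch follows. `elo_update` is the textbook Elo rule, and `gxe` computes the Glicko-1 expected win percentage against a reference opponent. The K-factor of 32 and the reference opponent (rating 1500, RD 350, the convention on Pokémon Showdown-style ladders) are assumptions; the thread does not state the parameters the challenge uses.

```python
import math


def elo_update(r_a, r_b, score_a, k=32.0):
    """Online Elo update after one game. score_a is 1 (win), 0.5 (draw), or 0 (loss).

    k=32 is an assumed K-factor, not a value from the challenge.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta


def gxe(rating, rd, ref_rating=1500.0, ref_rd=350.0):
    """Expected win % against a reference opponent, from a Glicko-1 rating and RD.

    The reference opponent (1500, RD 350) is an assumption.
    """
    q = math.log(10) / 400.0
    combined_rd = math.sqrt(rd**2 + ref_rd**2)  # uncertainty of both players
    g = 1.0 / math.sqrt(1.0 + 3.0 * q**2 * combined_rd**2 / math.pi**2)
    return 100.0 / (1.0 + 10 ** (-g * (rating - ref_rating) / 400.0))
```

Elo tracks a single number per agent, whereas GXE folds the rating deviation into the win estimate, so a highly rated agent with few games is pulled back toward 50%.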
A benchmark environment is nothing without data so you can pretrain before you RL.

Announcing our replay archive preview: We are releasing an additional 25k games to help you train a metagame exploiter (5 million more to be released after the qualifier)

replays.pokeagentshowdown.com:8443/
(3/3)
October 15, 2025 at 5:50 PM
- Gen 1 OU battles require 100+ turns of long-context planning in partially observable, stochastic environments

Check out the PokeAgent Challenge Gen 1 OU Qualifier live this week👇
youtube.com/live/N6JmD5XKf4g
(2/3)
October 15, 2025 at 5:50 PM
Apparently I need to fullscreen my browser for the new post button to show up
October 15, 2025 at 5:35 PM
We should have the highest standards for the most influential research companies
September 27, 2025 at 4:07 AM
In the future people will play games for the mind similar to going to the gym for the body
September 25, 2025 at 9:11 PM
If you aren't paying attention, we are in a rapidly shifting period of ML paper culture. ICLR/ICML/NeurIPS are being treated as random, out-of-touch processes with more and more unnecessary work to submit.
Most people are saying TMLR is the only good alternative, but are skeptical.
September 24, 2025 at 2:31 PM
Join our Discord for more info: discord.gg/E2DuX5FWF7
Join the PokéAgent Challenge @ NeurIPS 2025 Discord Server!
https://pokeagent.github.io/ | 362 members
September 2, 2025 at 1:44 PM