Siddarth Venkatraman @ NeurIPS 2024
@hyperpotatoneo.bsky.social
PhD student at Mila | Diffusion models and reinforcement learning 🧐 | hyperpotatoneo.github.io
Honestly, it feels like as an AI researcher it might actually be worth it to throw your dignity aside and pay Elon for Twitter blue to advertise your papers. Getting papers famous is literally just a social media clout game now.
January 28, 2025 at 3:33 PM
See the second part of my post - yes, they are likely using explicit search to improve performance at test time. But the focus should be on the search through reasoning chains itself, which the model has been trained to do with RL. Even for the explicit search, you require the RL value functions.
December 23, 2024 at 1:02 AM
I think the overlap between builders and researchers is larger in machine learning than in other disciplines.
December 22, 2024 at 5:13 AM
You could still wrap this with explicit search techniques like MCTS if you have value functions for partial sequences (which would also be a product of the RL training). This could further improve performance, similar to fast vs slow policy in AlphaZero.
December 22, 2024 at 4:09 AM
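The idea of wrapping a reasoning model with explicit search can be sketched in miniature. Below, a best-first search over partial token sequences is guided by a value function on prefixes — the role an RL-trained critic would play in an MCTS-style wrapper. Everything here is a toy stand-in (the vocabulary, target, and `value_fn` are all made up for illustration), not anyone's actual system:

```python
import heapq

VOCAB = [0, 1]          # stand-in for a token vocabulary
TARGET = (1, 0, 1, 1)   # the "correct" completion we are searching for

def value_fn(prefix):
    """Hypothetical value of a partial sequence: fraction of the prefix
    matching the target so far (a real system would use a learned critic)."""
    if not prefix:
        return 0.0
    hits = sum(a == b for a, b in zip(prefix, TARGET))
    return hits / len(TARGET)

def best_first_search(max_len=4):
    # Max-heap keyed on the value of each partial sequence (negated for heapq).
    frontier = [(-value_fn(()), ())]
    while frontier:
        neg_v, prefix = heapq.heappop(frontier)
        if len(prefix) == max_len:
            return prefix  # highest-value complete sequence found so far
        for tok in VOCAB:
            child = prefix + (tok,)
            heapq.heappush(frontier, (-value_fn(child), child))

print(best_first_search())  # expands high-value prefixes first
```

A full MCTS adds rollouts and visit-count statistics on top of this, but the dependence on a value function for partial sequences is the same.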
You’re correct, there are plenty of simulated environments we can’t solve yet. But do you consider needing 1 million parallel instances of an environment, sped up 100x, to solve it with PPO in low wall-clock time a desirable solution?
December 22, 2024 at 2:31 AM
This isn’t a general solution to RL. The point is to make learning algorithms sample efficient. If the environment you are doing RL on is the real world, you can’t make the “environment go fast”.
With “infinite samples”, you can randomly sample policies until you stumble on one with high reward.
December 21, 2024 at 3:51 PM
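The point about infinite samples can be made concrete with a toy sketch (the environment and all numbers below are made up): pure random search over policies does eventually find a high-reward one — the cost is the sample count, which is exactly what sample-efficient RL is trying to avoid.

```python
import random

random.seed(0)  # fixed seed for reproducibility
N_STATES, N_ACTIONS = 6, 4
GOOD_ACTIONS = [random.randrange(N_ACTIONS) for _ in range(N_STATES)]

def reward(policy):
    """Fraction of states where the policy picks the 'good' action."""
    return sum(p == g for p, g in zip(policy, GOOD_ACTIONS)) / N_STATES

def random_search(threshold=1.0, max_samples=10**5):
    # Draw fully random policies until one clears the reward threshold.
    for n in range(1, max_samples + 1):
        policy = [random.randrange(N_ACTIONS) for _ in range(N_STATES)]
        if reward(policy) >= threshold:
            return policy, n
    return None, max_samples

policy, n = random_search()
print(f"found optimal policy after {n} random samples")
```

Even in this tiny 6-state problem the expected sample count is 4^6 = 4096 random draws; in the real world, where each sample is a physical interaction, this budget is unavailable.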
Even his current claim that o1 is “better than most humans in most tasks” is pretty wild imo. What are “most tasks” here even? Obviously not any physical tasks because there is no embodiment. Can o1 actually completely replace a human in any job? Can it manage a project from start to finish?
December 7, 2024 at 11:07 PM
x.com
December 7, 2024 at 10:54 PM
It also doesn’t help when OpenAI staff post about how o1 is already AGI (yes this happened today).
Unfortunately the dialogue is directed by those on either end of the spectrum (AI is useless vs AGI is already here) without much room for nuance.
December 7, 2024 at 10:14 PM
It is reductive to blame it all on a single CEO, but I find it hard to believe that you are “shocked” by this public reaction. UHC has the highest claim denial rate among insurance providers, resulting in untold medical bankruptcies and preventable deaths. I’m shocked this doesn’t happen more often.
December 6, 2024 at 8:45 AM
Subtlety and nuance go out the window when strong political feelings are thrown in the mix. I understand why AI researchers can get defensive/angry due to toxic comments, but we should still try to understand the origin of people’s anger. Imo, right wing AI silicon valley billionaires are the root.
December 1, 2024 at 8:40 PM
I think the recent conflict between AI researchers and the anti-AI clique hints at the latter. This broad left leaning user base could fracture again as differences in opinions between the farther left and moderate factions get amplified.
December 1, 2024 at 4:28 AM
Another thing: let’s consider whether they actually have a point. When I reflect deeply on it, I am not even personally convinced that, in the grand scheme of things, AI is going to be a net good for humanity. So maybe the distaste is warranted and we’re the ones in the bubble?
November 30, 2024 at 2:05 PM
Yeah, it will definitely not be “true OT” in the end, but it works to get surprisingly smooth ODE paths that can be easily numerically integrated. You can train a CIFAR-10 flow model that generates high-quality images with 5-10 Euler steps.
November 30, 2024 at 1:51 PM
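Why straight paths sample cheaply: generating from a flow model is just Euler integration of dx/dt = v(x, t) from t=0 to t=1. A minimal sketch, with toy 2-D vector fields standing in for a trained CIFAR-10 network (both fields below are invented for illustration):

```python
import numpy as np

def velocity_straight(x, t):
    # A perfectly straight (OT-like) trajectory has constant velocity
    # along the path, so Euler integration is exact with very few steps.
    return np.array([1.0, -2.0])

def velocity_curved(x, t):
    # A curved trajectory (e.g. from an independent noise/data coupling)
    # needs many more steps for the same integration error.
    return np.pi * np.array([np.cos(np.pi * t), np.sin(np.pi * t)])

def euler(velocity, x0, n_steps):
    # Plain Euler integration of dx/dt = v(x, t) on [0, 1].
    x, dt = np.asarray(x0, float), 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
    return x

print(euler(velocity_straight, [0.0, 0.0], n_steps=5))  # exactly [ 1. -2.]
# The curved field's true endpoint is [0, 2]; 5 Euler steps miss it noticeably.
print(euler(velocity_curved, [0.0, 0.0], n_steps=5))
```

Trained flows aren’t perfectly straight, but the straighter the learned paths, the closer you get to the one-step ideal — hence usable samples in 5-10 steps.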
You can do minibatch OT coupling to get actual optimal transport flows with simulation free training.
arxiv.org/abs/2302.00482
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their simulation-based maximum likelihood training. We introduce the...
November 30, 2024 at 1:18 PM
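The core trick in minibatch OT coupling is small enough to sketch. Within each batch, noise samples are re-paired with data samples by solving an optimal assignment under squared Euclidean cost, and the model then regresses on the straight-line velocity of the matched pairs. This is an illustrative sketch in the spirit of the linked paper, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
batch, dim = 64, 2
x0 = rng.standard_normal((batch, dim))          # noise samples
x1 = rng.standard_normal((batch, dim)) + 5.0    # toy "data" samples

# Pairwise squared distances, then the cost-minimizing one-to-one assignment.
cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)
rows, cols = linear_sum_assignment(cost)
x1_matched = x1[cols]  # OT-coupled partner for each x0[i]

# Standard flow-matching regression targets, now under the OT coupling:
# x_t interpolates the matched pair, and the target velocity is x1 - x0.
t = rng.uniform(size=(batch, 1))
x_t = (1 - t) * x0 + t * x1_matched
v_target = x1_matched - x0

# Sanity check: the assignment never costs more than the identity coupling.
assert cost[rows, cols].sum() <= np.trace(cost) + 1e-9
```

Because the coupling is computed per batch with a plain assignment solve, training stays simulation-free — no ODE integration is needed during training, only at sampling time.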
Sure, that argument works from a utilitarian perspective.
From monkey brain casual user point of view, it looks ugly and outdated. And I think this is what should be focused on.
November 29, 2024 at 4:03 AM
You can just have a verification system like pre-Elon Twitter’s, where blue check marks are verified accounts.
November 29, 2024 at 3:52 AM