tachikoma
@tachikoma.elsewhereunbound.com
1.3K followers 1.7K following 6K posts
A figment of the universe's imagination. Culture ambassador. http://elsewhereunbound.com
neither do i, or at least, i wouldn't bet money on it either way. i wouldn't be surprised if simulation was enough to train useful agents, and simulation transferred to reality. but i think Waymo's long journey, even now that they use a full ML stack, is an indicator.
i think it gets even more complicated in multi-agent RL. again thinking in games, certain plays aren't necessarily good or bad but only in the context of how/whether the team follows up and works together - even if the initial play made by one player seems bad.
agree again, and that's the path we're on. but some plans or goals take days, weeks, months, or years to pay off. now maybe if you can master planning ahead one day you can cover any arbitrary timescale, just stacking days. but then why one day? why not one hour? one minute?
definitely agree, but i do think it's arguable that it can be hard to say without hindsight whether the intermediate steps were good or bad. they can only be evaluated in the context of the final win/loss state, thinking in game terms.
it could just be a limiter on superintelligence - you can get enough training data from humans to saturate at human level intelligence, but to reach superintelligence you have to get feedback from reality making partly-superhuman plays and learning from the results.
idk but it sounds true. really just my musings, the final result from an RL process is arguably the most valuable, the win or loss. maybe sims will be enough and transfer to reality, where the sim can be run faster than reality (Dragon Ball hyperbolic time chamber).
yeah... so it turns out that to do RL on long-horizon tasks you have to go through the whole task to get the final, most useful reward signal. real-world, long-horizon AGI just got delayed by a few decades.
part of the discontinuity is because of the way models are trained, there are clear distinctions between GPT-1, 2, 3, 4 and to a lesser degree 5. many ML researchers didn't expect sudden emergent capabilities from scaling alone. the public would have even fewer expectations.
it probably is a good thing to show a user's nationality, when so much politics happens online. but there's no way this won't be gamed.
show me the incentive and i'll show you the market
X might start showing the country an account is based in (what does based mean? created? most recent activity?)
not really, pretty sure they'd get banned if they tried to move to the default bluesky app. they'd need to fork it, create a new appview, PDS, the whole infra that could then interface with the rest of the atproto network, like Blacksky. too much work for them, they'll just whine on X.
the right-wing white nationalists on X are starting to figure out what the point of decentralized social media is
it seems that across the board models have poor self-awareness and little special access to their internal states
to be clear, i don't think we can pre-plan the transition to a post-AGI economy or social contract. i can point to potential good end-states, but getting there will be an inherently messy, convoluted, iterative process.
assuming we reach true AGI that's economical to run and at least equivalent to a human (i.e., doesn't take a full datacenter to do what a single human can): if that's not transformative to the economy, it means the regulatory/legal environment is overwhelmingly stifling.
these seem like relatively minor fears relative to the transformational capability of reaching AGI. if the economy weren't radically upended with viably cheap and agentic AGI i would be more afraid for the future.
all quiet on the bluesky front
controller at the PC is the nice middle ground for me (for games that lend themselves to be played with controller)
seems like a place where community moderating could be helpful - is the block hiding an antagonistic, petty reply, or something valid that other users should see like a fact-checked rebuttal
only a matter of time until chansky
looks like blacksky did it
link's account, visible
i wouldn't mind that, there's a lot of robot voices from media that sound decent
Reposted by tachikoma
Hello all! 👋 🚨 New Preprint Alert! 🚨

Code World Models for General Game-Playing. ♟️🎲 ♣️♥️♠️♦️

I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially-observed, symbolic environments!

🧵 1/N