Prakash | (Ate-A-Pi)
@8teapi.bsky.social
Self aware neuron
AGI timeline
> 2026/2027 on a straight-line extrapolation
> Add 1-2 years for inevitable stumbling blocks
> Blockers in the field have receded but not disappeared
November 21, 2024 at 4:31 AM
> In Anthropic's dealings with corporations and governments, including the USG
- A few people get it, and start the ball rolling in deployment
- At that point a competitive tailwind kicks in
- If China deploys, the US starts competing
- “The spectre of competition plus a few visionaries” is all it takes
November 21, 2024 at 4:31 AM
> Pushes back on @tylercowen's 50-100 year estimate
> Most economists follow Robert Solow: “You see the computer revolution everywhere except in the productivity statistics.”
November 21, 2024 at 4:31 AM
How fast will AGI create change?
> Dario picks middle ground
> Does not see AGI conquering the world in five days, since
a) physical systems take time to build, and
b) any AGI we do build would be law-abiding and would have to deal with the complexities and regulations of human systems.
November 21, 2024 at 4:31 AM
Who gets to decide the constitution for the strong AI?
> Basic principles:
- no CBRN risks
- adhere to rule of law
- basic principles of democracy
> Outside of that, believes users should be free to fine-tune and use models as they please.
November 21, 2024 at 4:31 AM
> Promising research areas with too few people working on them:
- mechanistic interpretability (toy probe sketch below)
- long-horizon learning and long-horizon tasks
- evaluations, particularly of dynamic systems
- multi-agent coordination
November 21, 2024 at 4:31 AM
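Since mechanistic interpretability comes up repeatedly in this thread, here is a minimal, purely illustrative sketch of one common flavor of it: training a linear probe on hidden activations to test whether a concept is linearly readable from them. The activations, the `concept` labels, and the planted `direction` below are synthetic stand-ins invented for the example, not data from any real model.

```python
# Toy linear-probe sketch (illustrative only): if a concept is linearly
# encoded in a model's activations, a simple classifier trained on those
# activations should recover it with high accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_samples, d_model = 2000, 128
concept = rng.integers(0, 2, size=n_samples)       # hypothetical binary concept label
direction = rng.normal(size=d_model)                # pretend "feature direction" in activation space
# Synthetic activations: noise plus the concept-aligned direction when the concept is present.
activations = rng.normal(size=(n_samples, d_model)) + np.outer(concept, direction)

X_train, X_test, y_train, y_test = train_test_split(
    activations, concept, test_size=0.25, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")  # near 1.0 when the concept is linearly encoded
```

In real interpretability work the activations would come from a model's residual stream and the interesting part is what the probe's success or failure says about how the concept is represented; the toy version only shows the mechanics.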
On hiring
> “talent density beats talent mass”; prefers “a team of 100 people that are super smart, motivated and aligned with the mission”
November 21, 2024 at 4:31 AM
Risks and Safety
> worries about
a) cyber, bio, radiological, nuclear (CBRN)
b) model autonomy
November 21, 2024 at 4:31 AM
> AI Safety Levels
- Level 1 - chess-playing Deep Blue, narrow task-focused AI
November 21, 2024 at 4:31 AM
- Level 2 - ChatGPT/Claude, about as capable as Google in providing CBRN info, and no autonomy
November 21, 2024 at 4:31 AM
- Level 3 - Agents, which we hit next year. They start to be a useful assistant in increasing a non-state actor's capacity for a CBRN attack. Can be mitigated with filters, as the model is not yet autonomous, and efforts must be made to prevent theft by non-state actors.
November 21, 2024 at 4:31 AM
- Level 4 - They start to increase the capacity of a state actor and become the primary source of CBRN risk. If you wanted to do something dangerous, you would use the model. Autonomy and deception, including sandbagging tests and sleeper agents, become an issue.
November 21, 2024 at 4:31 AM
- Level 5 - Models become more capable than all of humanity
November 21, 2024 at 4:31 AM
> Anthropic's way of dealing with these risks uses an if/then framework: if "you can show the model is dangerous", then "you clamp down hard" (toy sketch below)
> Mechanistic interpretability is where they hope to be able to verify and check model state in ways the model cannot access on its own
November 21, 2024 at 4:31 AM
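Purely as an illustration of the if/then framing above (not Anthropic's actual Responsible Scaling Policy code), here is a toy sketch: a hypothetical evaluation result is mapped to a safety level, and each level carries mitigations that must be in place before deployment. All names, thresholds, and mitigation lists are invented for the example and only loosely follow the levels described in this thread.

```python
# Toy "if/then" capability gate (illustrative only): if evals show a
# dangerous capability, the required mitigations tighten accordingly.
from dataclasses import dataclass

# Hypothetical mitigation requirements keyed by safety level.
MITIGATIONS = {
    1: ["standard release process"],
    2: ["harm-refusal training", "usage policies"],
    3: ["CBRN filters", "protections against weight theft by non-state actors"],
    4: ["state-actor-grade security", "autonomy/deception evals", "deployment restrictions"],
    5: ["do not deploy pending further safeguards"],
}

@dataclass
class EvalResult:
    """Hypothetical capability-evaluation summary for one model snapshot."""
    cbrn_uplift_over_search: bool   # meaningfully more useful than a web search for CBRN tasks?
    uplifts_state_actor: bool       # increases a state actor's capacity?
    autonomous_or_deceptive: bool   # sandbagging evals, sleeper-agent behaviour, etc.
    exceeds_all_humans: bool        # more capable than all of humanity

def assign_safety_level(result: EvalResult) -> int:
    """Map eval results to a toy safety level (the "if" side of the framework)."""
    if result.exceeds_all_humans:
        return 5
    if result.uplifts_state_actor or result.autonomous_or_deceptive:
        return 4
    if result.cbrn_uplift_over_search:
        return 3
    return 2  # chat assistant roughly on par with search, no autonomy

def required_mitigations(result: EvalResult) -> list:
    """The "then" side: show the model is dangerous -> clamp down hard."""
    return MITIGATIONS[assign_safety_level(result)]

if __name__ == "__main__":
    snapshot = EvalResult(
        cbrn_uplift_over_search=True,
        uplifts_state_actor=False,
        autonomous_or_deceptive=False,
        exceeds_all_humans=False,
    )
    print(assign_safety_level(snapshot), required_mitigations(snapshot))  # 3, level-3 mitigations
```

The point of the if/then structure is that the clampdown is conditional on demonstrated capability rather than applied uniformly, which is what the toy gate above encodes.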
On complaints that models have become dumber:
> believes it's hedonic adjustment
> aside from pre-release A/B testing shortly before launch, Anthropic does not change its models
November 21, 2024 at 4:31 AM
> Mechanistic interpretability is a safety and transparency technique that Anthropic leads in, and it is a recruiting draw
> when they win a new hire, Dario tells them "The other places you didn’t go, tell them why you came here."
> Has led to interpretability teams being built out elsewhere
November 21, 2024 at 4:31 AM