Anonymous submissions: https://forms.gle/3RP2xu2tr8beYs5c8
Run by
@TheMidasProject.bsky.social
Read the full diff at our website: www.themidasproject.com/watchtower/g...
Critical capability levels, which previously focused on capabilities (e.g. "can be used to cause a mass casualty event"), now seem to rely on anticipated outcomes (e.g. "resulting in additional expected harm at severe scale")
And in some ways, it is: they define a new harmful manipulation risk category, and they even soften the claim from v2 that they would only follow their promise if every other company does so as well.
(Added)
Now all they need to do is provide transparency on *all* the commitments they've made + when they are choosing to abandon any.
Perhaps they understood that the commitments were not contingent on whatever way the political winds blow, but made to the public at large.
fedscoop.com/voluntary-ai...
bidenwhitehouse.archives.gov/briefing-roo...
www.seoul-tracker.org
Date: February 10, 2025
Change: Released Risk Management Framework draft
URL: x.ai/documents/20...
xAI's policy is stronger than others in its use of specific benchmarks, but it lacks threshold details and provides no mitigations.
Date: February 10, 2025
Change: Released their Frontier Model Safety Framework
URL: amazon.science/publications...
Like Microsoft's, Amazon's policy goes through the motions while setting vague thresholds that aren't clearly connected to specific mitigations.
Date: February 8, 2025
Change: Released their Frontier Governance Framework
URL: cdn-dynmedia-1.microsoft.com/is/content/m...
Microsoft's policy is an admirable effort but, as with others, it needs further specification. Mitigations should also be connected to specific thresholds.
Date: February 7, 2025
Change: Released their "Secure AI Frontier Model Framework"
URL: cohere.com/security/the...
Cohere's framework mostly neglects the most important risks. Like G42, they are not developing frontier models, which makes this more understandable.
Date: February 6, 2025
Change: Released their "Frontier AI Framework"
URL: g42.ai/application/...
G42's policy is surprisingly strong for a non-frontier lab. Its biggest issues are a lack of specificity and a failure to define future thresholds for catastrophic risks.
Date: February 4, 2025
Change: Released v2 of their Frontier Safety Framework
URL: deepmind.google/discover/blo...
v2 of the framework improves Google's policy in some areas while weakening it in others, most notably by no longer promising to adhere to it if others do not.