The Midas Project Watchtower
@safetychanges.bsky.social
We monitor AI safety policies from companies and governments for substantive changes.

Anonymous submissions: https://forms.gle/3RP2xu2tr8beYs5c8

Run by @TheMidasProject.bsky.social
Similarly, for ML R&D, models that "can" accelerate AI development no longer require RAND SL 3. Only models that have been used for this purpose count. But this is a strange ordering -- shouldn't the safeguards precede the deployment (and even the training) of such a model?
September 25, 2025 at 6:10 PM
But it's weakened in other ways.

Critical capability levels, which previously focused on capabilities (e.g. "can be used to cause a mass casualty event"), now seem to rely on anticipated outcomes (e.g. "resulting in additional expected harm at severe scale").
September 25, 2025 at 6:10 PM
Date: September 22, 2025
Company: Google
Change: Released v3 of their Frontier Safety Framework
September 25, 2025 at 6:10 PM
Date: February 26 – March 6, 2025
Company: Google
Change: Scrubbed mentions of diversity and equity from the mission description of their Responsible AI team.
March 8, 2025 at 12:19 AM
The smaller changes made to Anthropic's practices:

[screenshots of added and removed text]
March 5, 2025 at 4:56 AM
Most surprisingly, there is now no record of the former commitments on Anthropic's transparency center, a web resource they launched to track their compliance with voluntary commitments and which they describe as "raising the bar on transparency."
March 5, 2025 at 4:56 AM
In fact, post-election, multiple tech companies confirmed their commitments hadn't changed.

Perhaps they understood that the commitments were not contingent on which way the political winds blow, but were made to the public at large.

fedscoop.com/voluntary-ai...
March 5, 2025 at 4:56 AM
Company: @anthropic.com
Date: February 27th, 2025
Change: Removed "White House's Voluntary Commitments for Safe, Secure, and Trustworthy AI," seemingly without a trace, from their webpage "Transparency Hub" (formerly "Tracking Voluntary Commitments")

Some thoughts in 🧵
March 5, 2025 at 4:56 AM
Company: xAI
Date: February 10, 2025
Change: Released Risk Management Framework draft
URL: x.ai/documents/20...

xAI's policy is stronger than others in its use of specific benchmarks, but it lacks threshold details and provides no mitigations.
February 14, 2025 at 11:17 PM
Company: Amazon
Date: February 10, 2025
Change: Released their Frontier Model Safety Framework
URL: amazon.science/publications...

Like Microsoft's, Amazon's policy goes through the motions while setting vague thresholds that aren't clearly connected to specific mitigations.
February 14, 2025 at 11:17 PM
Company: Microsoft
Date: February 8, 2025
Change: Released their Frontier Governance Framework
URL: cdn-dynmedia-1.microsoft.com/is/content/m...

Microsoft's policy is an admirable effort but, as with the others, needs further specification. Mitigations should also be connected to specific thresholds.
February 14, 2025 at 11:17 PM
Company: Cohere
Date: February 7th, 2025
Change: Released their "Secure AI Frontier Model Framework"
URL: cohere.com/security/the...

Cohere's framework mostly neglects the most important risks. Like G42, Cohere is not developing frontier models, which makes this more understandable.
February 14, 2025 at 11:17 PM
Company: G42
Date: February 6, 2025
Change: Released their "Frontier AI Framework"
URL: g42.ai/application/...

G42's policy is surprisingly strong for a non-frontier lab. Its biggest issues are a lack of specificity and a failure to define future thresholds for catastrophic risks.
February 14, 2025 at 11:17 PM
Company: Google
Date: February 4, 2025
Change: Released v2 of their Frontier Safety Framework
URL: deepmind.google/discover/blo...

v2 of the framework improves Google's policy in some areas while weakening it in others, most notably by no longer promising to adhere to it if other companies are not following similar frameworks.
February 14, 2025 at 11:17 PM
Company: Meta
Date: February 3rd, 2025
Change: Released their "Frontier AI Framework"
URL: ai.meta.com/static-resou...

Meta's policy includes risk thresholds and a commitment to pause development, but it is severely weakened by caveats and loopholes, and it names no mitigations.
February 14, 2025 at 11:17 PM
Company: Google
Date: February 4, 2025
Change: Removed a ban on using AI technology for warfare and surveillance
URL: ai.google/responsibili...
February 4, 2025 at 11:12 PM
Worryingly, Google now essentially says that they only plan to follow their framework if other companies are following similar ones.

This would go against the promise they made to the White House, and later to the UK + Korea, to adhere to a risk framework without qualification.
February 4, 2025 at 11:10 PM
Security mitigations are now connected to risk thresholds. In the previous version of the policy, there was no logical relationship between the two.
February 4, 2025 at 11:10 PM
Model autonomy risks have been removed and replaced with alignment risks, but the details are vague and yet to be fleshed out. The current policy focuses primarily on misuse risks.
February 4, 2025 at 11:10 PM
Company: OpenAI
Date: January 13, 2025
Change: Adjusted the language on the o1 system card webpage, changing "o1" to "o1-preview."
URL: openai.com/index/openai...
January 14, 2025 at 4:11 AM
Company: Anthropic AI
Date: December 18, 2024
Change: Confusingly, within the last two days, Anthropic changed the "last updated" date on their Responsible Disclosure Policy to a date in June 2024, with no apparent substantive changes to the text of the policy.
December 19, 2024 at 12:52 AM
Company: Cognition AI
Date: December 10, 2024
Change: Quietly updated their terms of service concerning user data. They seemingly did not announce the changes, nor are the changes reflected in the "last updated" date on their site.
URL: www.cognition.ai/pages/terms-...
December 11, 2024 at 6:40 AM