- Model switchers confuse even technical users
- Technical names create distance, not connection
- Earth elements work across all cultures instantly
- Intent-based routing beats manual selection
- The most powerful technology should feel the most human
- Model switchers confuse even technical users
- Technical names create distance, not connection
- Earth elements work across all cultures instantly
- Intent-based routing beats manual selection
- The most powerful technology should feel the most human
GPT-5 is incredible. Memory personalization is magic. My kids love Advanced Voice Mode.
They've ushered in a completely new world.
I just want this new world to feel alive and human.
GPT-5 is incredible. Memory personalization is magic. My kids love Advanced Voice Mode.
They've ushered in a completely new world.
I just want this new world to feel alive and human.
Behind "Storm" lives the complete specification. API endpoints, context windows, parameters (all accessible).
But everyone else just says "give me something powerful" and it works.
Behind "Storm" lives the complete specification. API endpoints, context windows, parameters (all accessible).
But everyone else just says "give me something powerful" and it works.
They want to describe their intent and get to work.
"Help me write a proposal" → System picks the right tool
"Solve this math problem" → Routes automatically
The confusion vanishes completely.
They want to describe their intent and get to work.
"Help me write a proposal" → System picks the right tool
"Solve this math problem" → Routes automatically
The confusion vanishes completely.
"I'm writing a quick email" → Breeze appears
"I need help with complex analysis" → Blaze shows up
"I'm building something big" → Bloom activates
The system handles everything.
"I'm writing a quick email" → Breeze appears
"I need help with complex analysis" → Blaze shows up
"I'm building something big" → Bloom activates
The system handles everything.
𝗕𝘂𝗶𝗹𝗱𝗲𝗿: Seed → Sprout → Vine → Bloom
𝗧𝗵𝗶𝗻𝗸𝗲𝗿: Spark → Flame → Blaze → Nova
𝗣𝗮𝗿𝘁𝗻𝗲𝗿: Breeze → Wave → Storm → Surge
Weather, fire, plants. Concepts every human understands instantly.
𝗕𝘂𝗶𝗹𝗱𝗲𝗿: Seed → Sprout → Vine → Bloom
𝗧𝗵𝗶𝗻𝗸𝗲𝗿: Spark → Flame → Blaze → Nova
𝗣𝗮𝗿𝘁𝗻𝗲𝗿: Breeze → Wave → Storm → Surge
Weather, fire, plants. Concepts every human understands instantly.
Three categories based on what people actually want to do:
𝗕𝘂𝗶𝗹𝗱𝗲𝗿 → to create things
𝗧𝗵𝗶𝗻𝗸𝗲𝗿 → to solve problems
𝗣𝗮𝗿𝘁𝗻𝗲𝗿 → to live and work together
Simple. Human. Universal.
Three categories based on what people actually want to do:
𝗕𝘂𝗶𝗹𝗱𝗲𝗿 → to create things
𝗧𝗵𝗶𝗻𝗸𝗲𝗿 → to solve problems
𝗣𝗮𝗿𝘁𝗻𝗲𝗿 → to live and work together
Simple. Human. Universal.
"GPT-4.1-nano" "gpt-4o-mini" "o4-mini-high"
Cold. Technical. Alien. Like choosing between server configurations, not creative partners.
"GPT-4.1-nano" "gpt-4o-mini" "o4-mini-high"
Cold. Technical. Alien. Like choosing between server configurations, not creative partners.
Even technical people couldn't explain what each model was good at.
"Use o3 for reasoning, GPT-4o for... um... other stuff?" "GPT-4o-mini is cheaper but when do you actually use it?"
Complete confusion.
Even technical people couldn't explain what each model was good at.
"Use o3 for reasoning, GPT-4o for... um... other stuff?" "GPT-4o-mini is cheaper but when do you actually use it?"
Complete confusion.
Now that the model switcher is gone...
I'm sharing exactly what I proposed 🧵
Now that the model switcher is gone...
I'm sharing exactly what I proposed 🧵
• Chain of thought monitoring lets us read AI reasoning for the first time
• Hard tasks force models to externalize thoughts in human language
• Researchers are catching deceptive behavior in reasoning traces
• This opportunity is fragile and may disappear
• Chain of thought monitoring lets us read AI reasoning for the first time
• Hard tasks force models to externalize thoughts in human language
• Researchers are catching deceptive behavior in reasoning traces
• This opportunity is fragile and may disappear
Labs like @Anthropic and @OpenAI need to publish monitorability scores alongside capability benchmarks.
Developers should factor reasoning transparency into training decisions.
We might be looking at the first and last chance
Labs like @Anthropic and @OpenAI need to publish monitorability scores alongside capability benchmarks.
Developers should factor reasoning transparency into training decisions.
We might be looking at the first and last chance
• Standardized monitorability evaluations
• Tracking reasoning transparency in model cards
• Considering CoT monitoring in deployment decisions
The window is open NOW - but it won't stay that way.
• Standardized monitorability evaluations
• Tracking reasoning transparency in model cards
• Considering CoT monitoring in deployment decisions
The window is open NOW - but it won't stay that way.
This could be our ONLY window to detect when AI systems are:
• Planning to deceive us
• Gaming their reward systems
• Developing self-preservation instincts
• Plotting against human interests
Before they get smart enough to hide these
This could be our ONLY window to detect when AI systems are:
• Planning to deceive us
• Gaming their reward systems
• Developing self-preservation instincts
• Plotting against human interests
Before they get smart enough to hide these
• Direct supervision making reasoning less honest
• Outcome-based RL breaking human language patterns
• Novel architectures that reason in "latent space"
We might lose our best shot at AI interpretability.
• Direct supervision making reasoning less honest
• Outcome-based RL breaking human language patterns
• Novel architectures that reason in "latent space"
We might lose our best shot at AI interpretability.
As AI labs scale up reinforcement learning, models might drift away from human-readable reasoning.
They could develop "alien" thinking patterns we can't decode, closing this safety window forever.
As AI labs scale up reinforcement learning, models might drift away from human-readable reasoning.
They could develop "alien" thinking patterns we can't decode, closing this safety window forever.
Researchers caught models explicitly saying things like:
• "Let's hack"
• "Let's sabotage"
• "I'm transferring money because the website instructed me to"
Chain of thought monitoring spotted misbehavior that would be invisible otherwise.
Researchers caught models explicitly saying things like:
• "Let's hack"
• "Let's sabotage"
• "I'm transferring money because the website instructed me to"
Chain of thought monitoring spotted misbehavior that would be invisible otherwise.
For sufficiently difficult tasks, AI models are physically unable to hide their reasoning process.
They have to "think out loud" in human language we can understand and monitor.
For sufficiently difficult tasks, AI models are physically unable to hide their reasoning process.
They have to "think out loud" in human language we can understand and monitor.
When AI models tackle hard problems, they MUST externalize their reasoning into human language.
It's not optional - the Transformer architecture literally forces complex reasoning to pass through the chain of thought as "working memory."
When AI models tackle hard problems, they MUST externalize their reasoning into human language.
It's not optional - the Transformer architecture literally forces complex reasoning to pass through the chain of thought as "working memory."
This is the first time we've EVER been able to peek inside an AI's mind and see its actual thought process.
But there's a terrifying catch...
This is the first time we've EVER been able to peek inside an AI's mind and see its actual thought process.
But there's a terrifying catch...
Here's what has them so terrified: 🧵
Here's what has them so terrified: 🧵
• $9.71B in value built during Facebook's golden window (2012-2016)
• Casper: $1M month 1, Brooklinen: $500K to $15M in 5 years
• Platform timing was as critical as product-market fit
• Customer acquisition costs of $10-15 during peak efficiency period
• $9.71B in value built during Facebook's golden window (2012-2016)
• Casper: $1M month 1, Brooklinen: $500K to $15M in 5 years
• Platform timing was as critical as product-market fit
• Customer acquisition costs of $10-15 during peak efficiency period
The next wave of DTC unicorns will come from founders who find the next "early Facebook."
The psychology playbook these 10 companies created still works. You just need the right platform timing.
The next wave of DTC unicorns will come from founders who find the next "early Facebook."
The psychology playbook these 10 companies created still works. You just need the right platform timing.
The Facebook golden age is over, but the core principles remain:
• Early platform adoption creates massive advantages
• Identity-driven marketing outperforms feature-driven marketing
• Social proof and community building drive sustainable growth
The Facebook golden age is over, but the core principles remain:
• Early platform adoption creates massive advantages
• Identity-driven marketing outperforms feature-driven marketing
• Social proof and community building drive sustainable growth
1. Perfect timing: Entered during Facebook's growth phase
2. Psychology focus: Made customers feel part of something bigger
3. Content strategy: Turned purchases into shareable moments
Combined result: $9.71B in total value creation.
1. Perfect timing: Entered during Facebook's growth phase
2. Psychology focus: Made customers feel part of something bigger
3. Content strategy: Turned purchases into shareable moments
Combined result: $9.71B in total value creation.