Past: Google DeepMind.
🇧🇷 in 🇬🇧
Also available for Android and iOS as of today
mistral.ai/en/news/all-...
- 24B params, 81% MMLU
- Latency optimized: 150 tokens/s
- Competitive with Llama-3.3 70B, Qwen-2.5 32B, GPT4o-mini
- Apache 2.0
mistral.ai/news/mistral...
Also covers variants like non-Euclidean & discrete flow matching.
A PyTorch library is also released with this guide!
This looks like a very good read! 🔥
arxiv: arxiv.org/abs/2412.06264
They recently published a video on "Building Machine Learning Systems for a Trillion Trillion Floating Point Operations".
Link: www.youtube.com/watch?v=139U...
AI Ads: here is a technology that will automate spending time with your kids
The primary use case for the datasets that people are losing their shit over isn't ChatGPT, it's social science research and developing systems that improve Bluesky.
The same 99% will happen here too, but if AI researchers continue to get perma-banned for making available the datasets needed to filter it, it’s going to make this platform unusable.
pdf ❌
abs ✅
Evidence of leadership.
www.forbes.com/sites/carlto...
- New Le Chat: With canvas, web search, image understanding and generation & more - and free!
- Pixtral Large, our Frontier 124B open weight multimodal model that powers it.
Try it: chat.mistral.ai
Blog post: mistral.ai/news/mistral...
Here's my latest blog post for good measure, about how diffusion models of images perform autoregression in frequency space: sander.ai/2024/09/02/s...
When I write more, I'll share here as well!
The intuition is that the model quickly learns not to attend across [SEP] boundaries, and packing avoids "wasting" compute on the padding tokens needed to bring variable-length sequences in a batch to a consistent length.
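A minimal sketch of the padding-versus-packing trade-off described above. The post does not say which packing algorithm is used, so the greedy first-fit strategy here is an assumption, as are the token ids chosen for `SEP` and `PAD` and the helper names.

```python
from typing import List

SEP = -1  # hypothetical separator token id (stands in for [SEP])
PAD = 0   # hypothetical padding token id


def pad_batch(seqs: List[List[int]], max_len: int) -> List[List[int]]:
    """One sequence per row, each row padded out to max_len."""
    return [s + [PAD] * (max_len - len(s)) for s in seqs]


def pack_batch(seqs: List[List[int]], max_len: int) -> List[List[int]]:
    """Greedily concatenate sequences into rows of max_len, SEP after each one."""
    rows: List[List[int]] = []
    current: List[int] = []
    for s in seqs:
        piece = s + [SEP]
        if len(piece) > max_len:
            raise ValueError("sequence plus separator does not fit in one row")
        if len(current) + len(piece) > max_len:
            # Current row is full: pad the remainder and start a new row.
            rows.append(current + [PAD] * (max_len - len(current)))
            current = []
        current = current + piece
    if current:
        rows.append(current + [PAD] * (max_len - len(current)))
    return rows


def pad_fraction(rows: List[List[int]]) -> float:
    """Share of token slots in the batch that hold padding."""
    total = sum(len(r) for r in rows)
    pads = sum(1 for r in rows for tok in r if tok == PAD)
    return pads / total


seqs = [[5, 6, 7], [8, 9], [10], [11, 12, 13, 14]]
padded = pad_batch(seqs, max_len=8)
packed = pack_batch(seqs, max_len=8)
print(len(padded), pad_fraction(padded))
print(len(packed), pad_fraction(packed))
```

No attention mask is built here, in line with the intuition that the model learns the [SEP] boundaries by itself. A stricter variant would add a block-diagonal mask so tokens cannot attend across packed sequences.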
Just FYI, we're hiring AI Scientists and Engineers at Mistral AI.
If you're driven and interested in building cutting-edge GenAI, we'd love to have you join our team.
🌐 Check out our openings: jobs.lever.co/mistral
#AIJobs #TechCareers #MistralAI
The largest open-source Transformer-based MoE model, with 389 billion parameters, can handle up to 256K tokens. Key features include large-scale synthetic data and a mixed expert routing strategy.
Model: huggingface.co/tencent/Tenc...
Paper: arxiv.org/abs/2411.02265
- Yes, Nvidia is likely to maintain dominance in the market for training AI models.
- No, another company (or companies) will take the lead in the market for AI model inference, which is an exponentially larger market.
Runway introduces advanced camera control for Gen-3 Alpha Turbo. Choose both the direction and intensity of how you move through your scenes for even more intention in every shot.
I want to highlight progress we made in understanding the role of tokenization, developing the core incidents and mitigating its problems. 🧵👇