David van Dijk
@vandijklab.bsky.social
Learning the rules of life.
Assistant Professor of Medicine and Computer Science @ Yale
We thank our amazing team at Yale, Google Research, and Google DeepMind!
April 18, 2025 at 2:14 PM
Beyond standard training, we used Reinforcement Learning (RL) 🤖 to fine-tune C2S-Scale.
Using GRPO + biological rewards, we specifically improved:
• Perturbation prediction accuracy 🧪
• Biological Q&A relevance ❓
Aligning LLMs with biological goals! ✅
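The GRPO recipe can be sketched minimally: sample a group of completions per prompt, score each with a biological reward, and normalize rewards within the group to get advantages. The reward function and gene lists below are illustrative toys, not the paper's actual implementation.

```python
import numpy as np

def biological_reward(prediction, target_genes):
    """Toy reward (illustrative): fraction of target response genes
    recovered in the model's predicted gene list."""
    predicted = set(prediction.split())
    return len(predicted & set(target_genes)) / max(len(target_genes), 1)

def grpo_advantages(rewards, eps=1e-8):
    """GRPO-style group-relative advantage: normalize each sampled
    completion's reward by the group's mean and standard deviation."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, a group of four sampled completions (toy gene lists)
target = ["JUN", "FOS", "EGR1"]
samples = ["JUN FOS EGR1", "JUN FOS", "MYC", "FOS"]
rewards = [biological_reward(s, target) for s in samples]
advantages = grpo_advantages(rewards)
```

Completions that recover more of the target genes get positive advantages relative to their group, which is the signal a GRPO-style update would reinforce.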
April 18, 2025 at 2:14 PM
Size matters! 📈 We observed clear scaling laws: as model size increased from 410M → 27B parameters, performance consistently improved across tasks.
This confirms that LLMs learn better biological representations at scale using the C2S approach. Even works with efficient LoRA tuning! 💪
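LoRA's efficiency comes from freezing the pretrained weight and training only a low-rank update W + (α/r)·BA. A minimal NumPy sketch, with illustrative dimensions and the standard zero-initialized up-projection:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (zero init)

def lora_forward(x):
    """Adapted layer: frozen W plus a low-rank update scaled by alpha/r."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B zero-initialized, the adapted layer starts exactly equal
# to the frozen pretrained layer; training then only updates A and B.
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B (2·r·d parameters) receive gradients, which is why tuning even a 27B model this way stays cheap.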
April 18, 2025 at 2:14 PM
And it works! 🎉 C2S-Scale achieves SOTA performance, surpassing specialized single-cell models AND general LLMs:
• 🎯 Cell type annotation
• 🧪 Predicting perturbation responses
• ✍️ Generating dataset summaries from cells
• 🗺️ Inferring spatial relationships
• ❓ Answering complex biological questions
April 18, 2025 at 2:14 PM
To truly "teach" biology to LLMs, we built a massive corpus: Over 1 BILLION tokens! 📚
This wasn't just cell sentences – it included:
• 🧬 50M+ cell profiles (human/mouse)
• 🏷️ Annotations & Metadata
• 📄 Biological Text (abstracts, etc.)
Result? One model, many tasks!
April 18, 2025 at 2:14 PM
We enable LLMs to "read" biology via Cell2Sentence (C2S) 🧬➡️📝: ranking each cell's genes by expression turns its profile into a sentence of gene names.
This lets us leverage massive pre-trained models, unifying transcriptomic data with biological text (annotations, papers) for richer understanding.
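A minimal sketch of the C2S transformation, assuming a simple top-k rank encoding; the gene names and counts below are toy values:

```python
import numpy as np

def cell_to_sentence(expression, gene_names, top_k=100):
    """Rank genes by expression (descending) and join their names
    into a space-separated 'cell sentence'."""
    order = np.argsort(expression)[::-1][:top_k]
    return " ".join(gene_names[i] for i in order)

# Toy example: 5 genes, made-up counts for one cell
genes = ["CD3D", "MS4A1", "GNLY", "CD14", "LYZ"]
counts = np.array([5.0, 0.0, 1.0, 9.0, 7.0])
print(cell_to_sentence(counts, genes, top_k=3))  # → CD14 LYZ CD3D
```

Because the result is plain text, it can be concatenated with annotations and paper abstracts and fed to any pretrained LLM unchanged.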
April 18, 2025 at 2:14 PM
This work highlights the power of CLM-based intelligent adaptive solvers for scalable operator learning of dynamical systems. Imagine more efficient and accurate simulations for everything from fluid dynamics to climate modeling! 🌍
February 13, 2025 at 7:23 PM
📈 Benchmarked on diverse systems, COAST consistently outperforms state-of-the-art methods in both accuracy and efficiency!
February 13, 2025 at 7:23 PM
🔑 Key finding: COAST generates variable step sizes that intelligently adapt to the current system's complexity! Smaller steps in complex regions, larger steps in simpler ones. Across systems, more complex dynamics get finer time resolution.
February 13, 2025 at 7:23 PM
Current ML methods for PDEs often use fixed time steps, which is inefficient, especially for complex dynamics. COAST, powered by a causal language model (CLM), predicts both the solution AND the optimal time step. 🧠
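The idea can be illustrated with a toy rollout loop. Here a simple heuristic stands in for COAST's learned CLM predictor, choosing smaller steps where the dynamics change fast; the dynamics, bounds, and step rule are all illustrative:

```python
import numpy as np

def f(y):
    """Toy dynamics: fast decay near the start, slow near equilibrium."""
    return -5.0 * y

def predict_step(y, dt_min=1e-3, dt_max=0.2):
    """Stand-in for the learned predictor: pick a step size inversely
    proportional to the local rate of change, then take an Euler step.
    (COAST's CLM predicts the solution and step size jointly.)"""
    rate = abs(f(y))
    dt = float(np.clip(1.0 / (1.0 + rate), dt_min, dt_max))
    return y + dt * f(y), dt

y, t, steps = 1.0, 0.0, []
while t < 1.0:
    y, dt = predict_step(y)
    t += dt
    steps.append(dt)
```

Early steps (large |f|) come out small; as the trajectory settles, the step size grows toward its cap, which is exactly the adaptive behavior a fixed-step solver cannot express.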
February 13, 2025 at 7:23 PM
Excited to share our new preprint: COAST: Intelligent Time-Adaptive Neural Operators! 🌊 We introduce a novel neural operator that learns to dynamically and intelligently adjust time step sizes for modeling dynamical systems from data. 🚀 doi.org/10.48550/arX...
February 13, 2025 at 7:23 PM
🔥🧠🌌 Now accepted at #ICLR2025 !

How does complexity shape intelligence? 🤔

In our new paper "Intelligence at the Edge of Chaos", we explore the relationship between complex systems and the emergence of intelligence in AI models. Can complexity alone unlock smarter systems?
arxiv.org/abs/2410.02536
February 12, 2025 at 8:38 AM