You can also find me on Threads: @sung.kim.mw
The learning path the author has been using to upskill and get ready to work on biology and healthcare problems with machine learning.
Mastering PyTorch: From Linear Regression to Computer Vision: www.iamtk.co/mastering-py...
Learn how a transformer converts input tokens into context-aware representations and, ultimately, next-token probabilities.
machinelearningmastery.com/the-journey-...
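As a rough companion sketch (my own toy code, not the article's): token IDs are embedded, run through a causally masked transformer layer to get context-aware states, and the last state is projected to a distribution over the next token. Positional encodings are omitted for brevity.

```python
# Toy pipeline: token IDs -> context-aware representations -> next-token probabilities.
# Not the article's code; positional encodings omitted for brevity.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 1000, 64, 8
embed = nn.Embedding(vocab_size, d_model)
block = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, seq_len))            # one sequence of token IDs
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)

hidden = block(embed(tokens), src_mask=causal_mask)            # context-aware representations
next_token_probs = lm_head(hidden[:, -1]).softmax(dim=-1)      # distribution over the next token
print(next_token_probs.shape)                                  # torch.Size([1, 1000])
```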
A guide on prompting Nano Banana Pro
www.fofr.ai/nano-banana-...
The SSA framework for long-context inference achieves state-of-the-art results by explicitly encouraging sparser attention distributions, outperforming existing methods in perplexity across very large context windows.
Paper: arxiv.org/abs/2511.20102
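I have not reproduced SSA itself; as a generic illustration of "explicitly encouraging sparser attention distributions", here is a sketch that adds an entropy penalty on the attention weights as an auxiliary loss. The penalty form and names are my assumptions, not the paper's method.

```python
# Illustrative only: a generic entropy penalty that pushes attention toward sparser distributions.
# This is NOT the SSA method from the paper.
import torch

def attention_with_entropy_penalty(q, k, v, penalty_weight=0.01):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    attn = scores.softmax(dim=-1)                        # attention distribution per query
    entropy = -(attn * (attn + 1e-9).log()).sum(dim=-1)  # high entropy = diffuse attention
    sparsity_loss = penalty_weight * entropy.mean()      # add to the training loss to favour sparsity
    return attn @ v, sparsity_loss

q, k, v = (torch.randn(2, 16, 64) for _ in range(3))
out, aux_loss = attention_with_entropy_penalty(q, k, v)
print(out.shape, aux_loss.item())
```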
They found that vanilla SGD is
1. As performant as AdamW,
2. Naturally 36x more parameter-efficient (much more than a rank-1 LoRA).
"Who is Adam? SGD Might Be All We Need For RLVR In LLMs"
www.notion.so/sagnikm/Who-...
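The "36x" figure comes from the write-up's own analysis of RLVR updates; independently of that, one easy-to-verify difference is optimizer state: AdamW keeps two extra moment tensors per parameter, while vanilla SGD without momentum keeps none. A quick sketch (my framing, not the post's code):

```python
# Compare optimizer state kept by AdamW vs. vanilla SGD (no momentum).
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)
model(torch.randn(8, 1024)).sum().backward()

for opt_cls, kwargs in [(torch.optim.AdamW, {}), (torch.optim.SGD, {"momentum": 0.0})]:
    opt = opt_cls(model.parameters(), lr=1e-3, **kwargs)
    opt.step()
    n_state = sum(t.numel() for s in opt.state.values()
                  for t in s.values() if torch.is_tensor(t))
    print(opt_cls.__name__, "optimizer-state elements:", n_state)
# AdamW stores exp_avg and exp_avg_sq per parameter; SGD without momentum stores none.
```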
To understand how prompt caching works, we also need to look at the basics of an inference engine like vLLM and, subsequently, how KV-cache reuse is implemented.
sankalp.bearblog.dev/how-prompt-c...
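As a toy illustration of the prefix-matching idea (not vLLM's actual implementation, which hashes fixed-size blocks of token IDs and manages paged GPU memory), one can key cached KV state by a hash of the prompt prefix and only recompute the uncached suffix:

```python
# Toy sketch of prefix-based KV-cache reuse; not vLLM's implementation.
import hashlib

kv_cache = {}  # prefix hash -> "KV state" for that prefix (stubbed as a string here)

def prefix_hash(token_ids):
    return hashlib.sha256(str(token_ids).encode()).hexdigest()

def prefill(token_ids):
    """Return KV state for token_ids, reusing the longest cached prefix if one exists."""
    for cut in range(len(token_ids), 0, -1):            # try the longest prefix first
        key = prefix_hash(token_ids[:cut])
        if key in kv_cache:
            cached = kv_cache[key]
            break
    else:
        cut, cached = 0, None
    # only the uncached suffix needs a fresh forward pass
    state = f"{cached}+kv({token_ids[cut:]})" if cached else f"kv({token_ids})"
    kv_cache[prefix_hash(token_ids)] = state
    return state

print(prefill([1, 2, 3, 4]))        # computes everything
print(prefill([1, 2, 3, 4, 5, 6]))  # reuses the cached KV for the [1, 2, 3, 4] prefix
```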
They have optimized and fine-tuned Whisper models to handle arbitrary audio chunks, compressed them with ANNA, and added streaming inference support for both Apple's M-series processors and Nvidia GPUs (e.g., the L40).
- Dimension-robust orthogonalization via adaptive Newton iterations with size-aware coefficients (a rough sketch follows below)
- Optimization-robust updates using proximal methods that dampen harmful outliers while preserving useful gradient signal
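For the first bullet, here is a minimal, hedged sketch of a plain Newton-Schulz orthogonalization step applied to an update matrix; the adaptive, size-aware coefficients from the post are not reproduced (fixed 1.5 / -0.5 coefficients are used, which converge more slowly):

```python
# Plain Newton-Schulz iteration that pushes an update matrix toward an orthogonal one.
# The post's adaptive, size-aware coefficients are NOT reproduced here.
import torch

def newton_schulz_orthogonalize(g, steps=30):
    x = g / (g.norm() + 1e-7)          # scale so singular values are < 1 and the iteration converges
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.transpose(-2, -1) @ x
    return x

g = torch.randn(128, 64)               # e.g. a gradient/update matrix
u = newton_schulz_orthogonalize(g)
print((u.transpose(-2, -1) @ u - torch.eye(64)).abs().max())  # near 0: columns approximately orthonormal
```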
How can we effectively and efficiently learn a low-dimensional distribution of data embedded in a high-dimensional space, and then transform that distribution into a compact, structured representation?
CLIP is a popular method for learning multimodal latent spaces with well-organized semantics. Despite its wide range of applications, CLIP's latent space is known to fail at handling complex visual-textual interactions.
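Mechanically, CLIP's latent space boils down to one embedding per image and one per caption, compared by a temperature-scaled cosine similarity. A self-contained sketch with random vectors standing in for the real encoders (illustrative, not CLIP's code):

```python
# CLIP-style matching: unit-norm image/text embeddings compared by scaled cosine similarity.
# Random vectors stand in for the real image and text encoders.
import torch
import torch.nn.functional as F

image_emb = F.normalize(torch.randn(4, 512), dim=-1)   # 4 images   -> unit-norm embeddings
text_emb  = F.normalize(torch.randn(4, 512), dim=-1)   # 4 captions -> unit-norm embeddings

logits = 100.0 * image_emb @ text_emb.T                 # temperature-scaled cosine similarities
probs = logits.softmax(dim=-1)                          # per image: which caption matches best
print(probs)
```

A single vector per modality and one dot product per pair is compact and fast, which is arguably part of why the complex visual-textual interactions mentioned above are hard for this latent space.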
At the core of the attention mechanism in LLMs are three matrices: Query, Key, and Value. These matrices are how transformers actually pay attention to different parts of the input.
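Concretely (a minimal sketch, not any particular model's implementation): queries score against keys, the scores are softmaxed, and the resulting weights mix the values.

```python
# Minimal scaled dot-product attention: Queries score Keys, softmax weights mix Values.
import torch

def attention(q, k, v):
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # how strongly each token attends to every other token
    weights = scores.softmax(dim=-1)
    return weights @ v                            # context-aware mixture of values

x = torch.randn(1, 8, 64)                         # (batch, tokens, dim)
Wq, Wk, Wv = (torch.randn(64, 64) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)           # Q, K, V are learned projections of the same input
print(out.shape)                                  # torch.Size([1, 8, 64])
```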
Modern organizations exert control by maximising “legibility”: by altering the system so that all parts of it can be measured, reported on, and so on.
Intel Is Reportedly Poaching TSMC Arizona Engineers With 20–30% Higher Salaries while providing a workload that is roughly half as heavy.
Parallel decoding is a fight to increase speed while maintaining fluency and diversity, especially with the proliferation of diffusion language models.
danielmisrael.github.io/posts/2025/1...
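One commonly described strategy (a toy sketch of the general idea, not the post's specific proposal): unmask several positions per step whenever the model is confident enough, instead of committing to a single token at a time.

```python
# Toy confidence-thresholded parallel decoding over a masked sequence.
# The "model" is a random stand-in; this illustrates the idea, not the post's method.
import torch

vocab, seq_len, mask_id = 50, 10, -1
tokens = torch.full((seq_len,), mask_id)           # start fully masked
threshold = 0.05

def toy_model():
    """Stand-in for a diffusion LM: per-position distributions over the vocabulary."""
    return torch.randn(seq_len, vocab).softmax(dim=-1)

while (tokens == mask_id).any():
    conf, pred = toy_model().max(dim=-1)
    masked = tokens == mask_id
    fill = masked & (conf > threshold)             # fill all confident masked positions in parallel
    if not fill.any():                             # guarantee progress: fill the single best masked position
        idx = torch.where(masked)[0]
        fill[idx[conf[idx].argmax()]] = True
    tokens[fill] = pred[fill]
print(tokens)
```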
- Test-Time Compute Scaling
- Deep Audio Comprehension
- Real-time responsiveness
- Scalable chain-of-thought reasoning for audio tasks
Comparable to Gemini 3 across major audio reasoning tasks.
huggingface.co/stepfun-ai/S...
kimi.com/slides
You could even argue that crypto is helping drive gold’s price higher.
It's because traditional software engineering is Deterministic while agent engineering is Probabilistic. The more senior the engineer, the less they tend to trust the reasoning and instruction-following capabilities of the Agent.