Konstantin Pavlov
@kpavlov.me
Human, Software Engineer: AI, Kotlin, OSS.
Creator of mokksy.dev
That was nice 😊 No worries
October 3, 2025 at 8:32 AM
Steve Jobs was always behind the mentioned products. But he’s not with Apple any more…
August 3, 2025 at 3:47 PM
Why it matters:
• Parameter sharing (same θ reused across layers) keeps the model small.
• Adaptive depth (different numbers of steps per token) avoids wasting compute on tokens that don’t need much processing.
• Smarter memory usage by caching only what’s needed at each step.
July 22, 2025 at 5:57 AM
Now, rather than running every input token through f a fixed number of times, it introduces a router, which decides dynamically how many times to apply f for each token, based on how “difficult” it is. The model learns this routing behavior during training.
July 22, 2025 at 5:57 AM
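The router idea above can be sketched in a few lines of toy Python. This is my own illustration, not the paper’s code: `f` stands in for the shared transformer block, and `router` uses a made-up difficulty heuristic (token length), whereas in the paper the routing is learned during training.

```python
def f(h):
    # Stand-in for the shared transformer block (same weights every step).
    return h + 1

def router(token):
    # Toy "difficulty" heuristic: longer tokens get more recursion steps.
    # In the actual model, this routing behavior is learned, not hard-coded.
    return min(len(token), 3)

def process(tokens):
    # Apply f a per-token number of times, as chosen by the router.
    outputs, total_steps = {}, 0
    for tok in tokens:
        depth = router(tok)
        h = 0
        for _ in range(depth):  # recursion: same f applied 'depth' times
            h = f(h)
        outputs[tok] = h
        total_steps += depth
    return outputs, total_steps

outs, steps = process(["a", "of", "antidisestablishmentarianism"])
print(outs, steps)  # easy tokens get 1-2 steps, the hard one is capped at 3
```

The point of the sketch: total compute (`total_steps`) now depends on the input mix, instead of being `depth × num_tokens` for a fixed depth.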
Think of it like this:
Imagine a fixed-size function f(x; θ) that transforms input x using parameters θ. Instead of stacking many unique layers (with different θs) like in standard Transformers, this model reuses the same function f several times—this is the “recursion” part.
July 22, 2025 at 5:57 AM
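The recursion part can be shown with a toy example (my own sketch, with a trivial affine map standing in for a transformer block): one function `f` with a single parameter set `theta` is applied repeatedly, instead of stacking distinct layers with distinct weights.

```python
def f(x, theta):
    # Stand-in for a transformer block: here just an affine map.
    return theta * x + 1

def recursive_model(x, theta, depth):
    # Reuse the SAME theta at every step -- this is the parameter sharing.
    for _ in range(depth):
        x = f(x, theta)
    return x

print(recursive_model(1.0, 0.5, depth=3))  # -> 1.875, same weights used 3 times
```

A standard Transformer would instead hold `depth` separate `theta`s; sharing one keeps the parameter count independent of depth.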
The paper: arxiv.org/pdf/2507.10524
July 22, 2025 at 5:51 AM
In a world where every keystroke carries consequences, how far will they go to avoid the sweet taste of failure?
Genres: Workplace Thriller, Tech Drama
This story is: Mind-Bending, Thought-Provoking
Maturity Rating: PG (Mild Technical Jargon, Intense Pair Programming Scenes)
July 19, 2025 at 8:36 AM
... a psychological experiment in collective responsibility, technical excellence, and the price of perfection. As commits shrink and pair programming intensifies, the team discovers that their greatest enemy isn't failed tests or broken code, but the comfortable mediocrity they're leaving behind.
July 19, 2025 at 8:36 AM
You may also get the slides from here: kotlinconf.com/talks/795976/
LangChain4j with Quarkus | KotlinConf 2025, May 21–23, Copenhagen
June 23, 2025 at 7:32 AM
If AI can generate it once, let’s save the planet 🌍🌱 by not asking AI to generate it many times. Same as: if AI can write and run a script, let’s save the script instead of asking AI to generate it every time.
June 18, 2025 at 11:50 AM