Sung Kim
@sungkim.bsky.social
6.9K followers 1.1K following 4.9K posts
A business analyst at heart who enjoys delving into AI, ML, data engineering, data science, data analytics, and modeling. My views are my own. You can also find me on Threads: @sung.kim.mw
sungkim.bsky.social
A new VR (or XR) headset is coming from Samsung. It will be announced on October 21, 2025, at 10:00 PM ET.

news.samsung.com/us/samsung-g...
sungkim.bsky.social
Ignore all those people telling you to switch to Linux or macOS. Remember, you don’t like change - and that’s perfectly fine.
sungkim.bsky.social
To everyone still using Windows 10 - keep using it for as long as you like. There are still plenty of people out there running Windows 7, 8, and even older versions.

You’re sticking with Windows 10 because you don’t like change, and honestly, no one’s going to convince you otherwise.
sungkim.bsky.social
Diffusion models are not truly serial models

Diffusion models:
- Methodologically look serial (step by step).
- But perform less like a truly serial model (autoregression).

The authors find that a diffusion model solves each problem at the same convergence rate; it never behaves like a truly serial model.
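A toy sketch (mine, not from the paper) of the structural point: autoregressive decoding needs one sequential step per token, while a diffusion sampler runs a fixed number of denoising steps over all positions in parallel, so its sequential depth does not grow with output length.

def autoregressive_depth(seq_len):
    # One forward pass per token: sequential depth grows with length.
    return seq_len

def diffusion_depth(seq_len, num_denoise_steps=50):
    # Each denoising step refines every position at once, so sequential
    # depth is fixed at num_denoise_steps regardless of length.
    return num_denoise_steps

for n in (16, 256, 4096):
    print(n, autoregressive_depth(n), diffusion_depth(n))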
sungkim.bsky.social
Pretraining with Hierarchical Memories

They propose dividing LLM parameters into 1) an anchor (always used, capturing commonsense) and 2) a memory bank (selected per query, capturing world knowledge).

Paper: arxiv.org/abs/2510.02375
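A minimal sketch of the idea as I read it (all names, shapes, and the routing scheme are my assumptions, not the paper's):

import torch
import torch.nn as nn

class HierarchicalMemoryLayer(nn.Module):
    # Hypothetical layer: a small always-on anchor plus a large bank of
    # low-rank memory blocks, of which only top_k are fetched per query.
    def __init__(self, d_model=512, n_blocks=1024, rank=16, top_k=4):
        super().__init__()
        self.anchor = nn.Linear(d_model, d_model)  # always used
        self.mem_down = nn.Parameter(torch.randn(n_blocks, d_model, rank) * 0.02)
        self.mem_up = nn.Parameter(torch.randn(n_blocks, rank, d_model) * 0.02)
        self.router = nn.Linear(d_model, n_blocks)  # picks blocks per query
        self.top_k = top_k

    def forward(self, x):  # x: (batch, d_model)
        out = self.anchor(x)  # commonsense path, always active
        top = self.router(x).topk(self.top_k, dim=-1).indices
        for i in range(self.top_k):  # world-knowledge path, selected per query
            idx = top[:, i]
            h = torch.einsum("bd,bdr->br", x, self.mem_down[idx])
            out = out + torch.einsum("br,brd->bd", h, self.mem_up[idx])
        return out

print(HierarchicalMemoryLayer()(torch.randn(2, 512)).shape)  # torch.Size([2, 512])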
sungkim.bsky.social
Meta released a paper on Hybrid RL

It offers a promising way to go beyond purely verifiable rewards - combining the reliability of verifier signals with the richness of learned feedback. The results: +11.7 pts vs RM-only and +9.2 pts vs verifier-only on hard-to-verify reasoning tasks.
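A hedged sketch of what a hybrid reward could look like (the combination rule here is my guess, not Meta's actual recipe):

def hybrid_reward(verifier_ok, rm_score, alpha=0.5):
    # verifier_ok: True/False when the answer is checkable, None when it isn't.
    if verifier_ok is not None:
        # Verifiable case: anchor on the reliable binary verifier signal
        # and let the reward model shape the margin.
        return float(verifier_ok) + alpha * rm_score
    # Hard-to-verify case: fall back on the learned reward model alone.
    return rm_score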
sungkim.bsky.social
Vuk Rosić trained 13 LLMs ranging from 0% to 100% attention (the remaining layers being DeltaNet linear attention). He found that 17% attention (2 attention layers out of 12) worked best.

github.com/Open-Superin...
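An illustrative layer schedule for that 17% setting (the placement of the two attention layers is my guess; see the repo for the actual configuration):

def layer_types(n_attn=2, n_layers=12):
    # Spread the full-attention layers evenly through the stack;
    # everything else is DeltaNet-style linear attention.
    attn_at = {round((i + 1) * n_layers / (n_attn + 1)) - 1 for i in range(n_attn)}
    return ["attention" if i in attn_at else "deltanet" for i in range(n_layers)]

print(layer_types())  # 2 of 12 layers are full attention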
sungkim.bsky.social
"Install the Beads binary, tell your agent in AGENTS.md to stop using Markdown and run `bd quickstart`, and your agents will spontaneously get better at everything, particularly long-horizon planning and keeping track of newly discovered work."

github.com/steveyegge/b...
sungkim.bsky.social
@steve-yegge.bsky.social released Beads - a memory upgrade for your coding agent

"It is a magical 4-dimensional graph-based git-backed fairy-dusted issue-tracker database, designed to let coding agents track all your work and never get lost again."
sungkim.bsky.social
ByteDance released FaceCLIP

A new vision-language model specializing in understanding and generating diverse human faces.

huggingface.co/ByteDance/Fa...
sungkim.bsky.social
@karpathy.bsky.social released nanochat

"A minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM."
sungkim.bsky.social
→ 1T total / 50B active params · 128K context window
→ Reinforced by Icepop RL + ASystem (Trillion-Scale RL Engine)
→ Open-source SOTA in natural language reasoning — AIME 25 / HMMT 25 / ARC-AGI-1 / Codeforces

huggingface.co/inclusionAI/...
sungkim.bsky.social
Alibaba Ant Group's Ring-1T

Alibaba Ant Group previously released Ling-1T, a non-thinking model. Now it has released Ring-1T, a thinking model that achieves silver-level IMO reasoning through pure natural language reasoning.
sungkim.bsky.social
and (iii) higher inference efficiency, with a MIMO formulation that raises arithmetic intensity.

openreview.net/pdf?id=HwCva...
sungkim.bsky.social
Mamba-3: Improved Sequence Modeling Using State Space Principles

Mamba-3 is an SSM with three axes of improvement rooted in SSM principles: (i) improved quality, via trapezoidal discretization; (ii) new capabilities, through complex SSMs that recover state-tracking;
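For a concrete sense of item (i), here is a hedged sketch (mine, not the paper's) of trapezoidal vs. Euler discretization for a 1-D linear SSM x'(t) = a*x(t) + b*u(t) with step size dt:

def euler_step(x, u, a, b, dt):
    # Forward Euler: uses the derivative at the left endpoint only.
    return x + dt * (a * x + b * u)

def trapezoidal_step(x, u_t, u_next, a, b, dt):
    # Average the derivative at both endpoints and solve for x_next
    # (the bilinear transform), which is typically more accurate.
    return ((1 + 0.5 * dt * a) * x + 0.5 * dt * b * (u_t + u_next)) / (1 - 0.5 * dt * a)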
sungkim.bsky.social
I’m still unclear who will host all of this capacity (Nvidia, AMD, and Broadcom) - unless OpenAI is planning to enter the data-center business itself.
sungkim.bsky.social
So far, OpenAI has deals with 6 companies to provide gigawatts of capacity:

- Microsoft: ??? (???)
- CoreWeave: ??? ($11.9B + $4B + $6.5B = $22.4B)
- Oracle: 4.5 GW ($300B)
- Nvidia: 10 GW ($100B)
- AMD: 6 GW (???)
- Broadcom: 10 GW (???)
OpenAI, Broadcom Forge Multibillion-Dollar Chip-Development Deal
The companies plan to deploy 10 gigawatts of custom AI chips over the next four years.
www.wsj.com
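Tallying just the disclosed gigawatt figures from the list above (the Microsoft and CoreWeave GW numbers are undisclosed, so omitted):

deals_gw = {"Oracle": 4.5, "Nvidia": 10, "AMD": 6, "Broadcom": 10}
print(sum(deals_gw.values()), "GW across disclosed deals")  # 30.5 GW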
sungkim.bsky.social
This will be interesting to watch because, unlike software, hardware advancement can be crippled by something as small as a single screw made in China - as evidenced by Apple’s struggle to manufacture the Mac Pro in the U.S.
sungkim.bsky.social
The newest hot technology is robotics—and social media is already buzzing about a coming global battle for national supremacy.

In this case, the U.S. doesn’t need to impose as many export restrictions, since China already leads in several subfields of robotics.
sungkim.bsky.social
One more thing: it’s being said that Nvidia will maintain a monopoly on training workflows for the foreseeable future.

I’m not sure if AMD’s ROCm advancements are actually real, since I kind of gave up on ROCm a few years ago. It’s not like influencers can be influenced to promote this…
sungkim.bsky.social
It wouldn’t make sense for AMD to have a major partnership with OpenAI and make it a one-and-done deal.

As expected, social media influencers are now hyping up AMD’s recent advancements in ROCm for inference workflows - yeah, AMD finally realized they need to beef up ROCm in 2025.