llm-d
@llm-d.ai
llm-d is a Kubernetes-native distributed inference serving stack providing well-lit paths for anyone to serve large generative AI models at scale.

Learn more at: https://llm-d.ai
A huge shoutout to the contributors in SIG-benchmarking for making performance transparency a core pillar of the llm-d project!

🚀 Check out the full demo here: youtu.be/TNYXjZpLCN4

#AI #Kubernetes #Benchmarking
Community Demo: Verified & Reproducible LLM Benchmarks | llm-d Project
In the llm-d open-source project, we believe a supported guide is only as good as the data backing it. In this community demo, the SIG-benchmarking team showcases the benchmarking suite that brings…
youtu.be
January 19, 2026 at 8:13 PM
⚫ 100% Reproducibility: We aim for a world where if you see a benchmark in an llm-d blog post, you can run the exact same template on your cluster and see the same results. Transparency is key to scaling AI.
January 19, 2026 at 8:13 PM
Why does this matter for the community?

⚫ Verified, Not Just Documented: Every community-tested guide is now backed by standardized benchmarking templates.

If the guide says it performs, we provide the tools to prove it (a rough sketch of the idea follows below).
January 19, 2026 at 8:13 PM
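To make "prove it" concrete, here is a minimal, hand-rolled probe of an OpenAI-compatible completions endpoint that reports throughput and latency percentiles. This is a sketch only, not the llm-d benchmarking suite or its templates; the endpoint URL, model name, and request counts are placeholder assumptions for whatever stack you have deployed.

```python
# Minimal latency/throughput probe for an OpenAI-compatible endpoint.
# NOT the llm-d benchmarking suite; it only illustrates the kind of numbers
# (requests/sec, latency percentiles) that standardized templates make
# reproducible. ENDPOINT and MODEL are placeholders for your deployment.

import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed: your stack's URL
MODEL = "your-model-name"                          # assumed: served model id
PROMPT = "Explain KV caching in one sentence."
CONCURRENCY = 8
REQUESTS_TOTAL = 64


def one_request() -> float:
    """Send one completion request and return its end-to-end latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        json={"model": MODEL, "prompt": PROMPT, "max_tokens": 64},
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start


def main() -> None:
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = list(pool.map(lambda _: one_request(), range(REQUESTS_TOTAL)))
    wall = time.perf_counter() - wall_start

    latencies.sort()
    p50 = statistics.median(latencies)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    print(f"requests/sec: {REQUESTS_TOTAL / wall:.2f}")
    print(f"latency p50: {p50:.3f}s  p95: {p95:.3f}s")


if __name__ == "__main__":
    main()
```

As the thread describes, the point of standardized benchmark templates is to pin down exactly these knobs (workload, concurrency, reported metrics) so that two clusters running the same template produce comparable results.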
This new contribution allows anyone to benchmark a pre-existing or pre-installed stack. It is specifically designed for stacks deployed via official llm-d guides to ensure your setup matches our verified community baselines.
January 19, 2026 at 8:13 PM
In our latest community demo, the SIG-benchmarking team showcases their benchmarking suite that brings verified performance standards directly to your local environment. No more guessing if your stack is optimized.
January 19, 2026 at 8:13 PM
Reposted by llm-d
If you see me around the hallway or at the sessions, I’d love to chat about:
- Model inference (KServe, vLLM, @llm-d.ai)
- @kubernetes.io AI Conformance Program
- @kubefloworg.bsky.social & @argoproj.bsky.social
- @cncf.io TAG Workloads Foundation
- Open source, cloud-native, AI infra and systems
January 15, 2026 at 5:06 PM
Check out our updated guide on leveraging tiered caching in your own cluster: llm-d.ai/docs/guide/I...

Up next: A deep-dive blog post on deployment patterns and scheduling behavior. Stay tuned! ⚡️
Prefix Cache Offloading - CPU | llm-d
Well-lit path for offloading prefix (KV) cache from GPU HBM to CPU memory
llm-d.ai
January 9, 2026 at 6:45 PM
By separating memory transfer mechanisms from global scheduling logic, llm-d ensures you get the best of both: peak engine performance + optimal resource utilization across the entire fleet. 🛠️
January 9, 2026 at 6:45 PM
How we’re using it:

⚫️ Tiered-Prefix-Cache: We use the new connector to bridge GPU HBM and CPU RAM, creating a massive, multi-tier cache hierarchy.

⚫️ Intelligent Scheduling: Our scheduler now routes requests to pods where KV blocks are already warm (in GPU HBM or CPU RAM); a rough sketch of the idea follows after this post.
January 9, 2026 at 6:45 PM
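As an illustration of the scheduling idea described above (and only that; this is not the actual llm-d scheduler or its scorer plugins), the toy sketch below hashes a prompt into fixed-size blocks and routes to the pod holding the longest already-warm prefix, whether those blocks sit in the GPU tier or the CPU tier. Pod names, block size, and the hashing scheme are all made-up placeholders.

```python
# Illustrative sketch only: a toy prefix-cache-aware scorer, not the llm-d
# scheduler. It captures the idea from the post above: split a prompt into
# blocks, count how many leading blocks are already "warm" on each pod
# (GPU HBM or CPU RAM tier), and route to the pod with the longest warm prefix.

import hashlib
from dataclasses import dataclass, field

BLOCK_TOKENS = 16  # toy block size; real block sizes are engine-specific


def block_hashes(tokens: list[int]) -> list[str]:
    """Hash each prompt block chained with its prefix, so identical prefixes
    produce identical leading hashes."""
    hashes, running = [], hashlib.sha256()
    for i in range(0, len(tokens), BLOCK_TOKENS):
        running.update(bytes(str(tokens[i:i + BLOCK_TOKENS]), "utf-8"))
        hashes.append(running.hexdigest())
    return hashes


@dataclass
class Pod:
    name: str
    gpu_blocks: set[str] = field(default_factory=set)  # warm in GPU HBM
    cpu_blocks: set[str] = field(default_factory=set)  # offloaded to CPU RAM

    def warm_prefix_len(self, hashes: list[str]) -> int:
        """Count leading blocks already cached in either tier."""
        n = 0
        for h in hashes:
            if h in self.gpu_blocks or h in self.cpu_blocks:
                n += 1
            else:
                break
        return n


def pick_pod(pods: list[Pod], prompt_tokens: list[int]) -> Pod:
    """Route to the pod with the longest warm prefix for this prompt."""
    hashes = block_hashes(prompt_tokens)
    return max(pods, key=lambda p: p.warm_prefix_len(hashes))


if __name__ == "__main__":
    shared = block_hashes(list(range(64)))  # pretend these blocks were served before
    pods = [
        Pod("pod-a"),
        Pod("pod-b", gpu_blocks=set(shared[:2]), cpu_blocks=set(shared[2:])),
    ]
    print(pick_pod(pods, list(range(64))).name)  # -> pod-b
```

A production scheduler would combine this prefix-affinity signal with load, queue depth, and SLO signals rather than using it alone.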
Our mission with llm-d is building the control plane that translates these engine-level wins into cluster-wide performance.

We’ve already integrated these capabilities into our core architecture to bridge the gap between raw hardware power and distributed scale.
January 9, 2026 at 6:45 PM