Derek Lewis
@dlewis.io
CTO & Data Scientist at Silex Data Solutions & CODEHR.ai. Opinions expressed are my own.
Having a debugger in your emulator is helpful. Currently working on a SPARCstation 5/sun4m emulator as a side project in pure Python that can boot into OpenBoot and get as far as the memory bank probe. I've had lots of issues with the IOMMU, but I'm slowly working through them.
November 16, 2025 at 7:56 PM
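For context on what the IOMMU work involves, here is a toy sketch of DVMA translation the way an emulator might model it; the field layout is a simplification I'm assuming for illustration, not the real sun4m IOPTE format.

# Toy sketch of IOMMU-style DVMA translation for a Python emulator
# (simplified/assumed field layout, not the actual sun4m IOPTE format).
PAGE_SHIFT = 12                      # 4 KiB pages
IOPTE_VALID = 0x2                    # assumed position of the 'valid' bit

class Iommu:
    def __init__(self, physmem):
        self.physmem = physmem       # provides read_word(paddr) -> int
        self.page_table_base = 0     # set via a control-register write

    def translate(self, dvma_addr: int) -> int:
        index = dvma_addr >> PAGE_SHIFT
        iopte = self.physmem.read_word(self.page_table_base + index * 4)
        if not (iopte & IOPTE_VALID):
            raise MemoryError(f"IOMMU fault at {dvma_addr:#010x}")
        ppn = iopte >> 8             # assumed PPN field position
        return (ppn << PAGE_SHIFT) | (dvma_addr & ((1 << PAGE_SHIFT) - 1))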
Writing emulators is a good way to learn about hardware, something I haven't spent much time on previously. Just finished up a basic Chip8 emulator in Python that can do instruction decoding and B/W screen drawing. TBDs still include keyboard input & sound. github.com/derekelewis/...
GitHub - derekelewis/Chip8
github.com
November 11, 2025 at 10:57 PM
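A minimal sketch of the kind of fetch/decode loop that post describes (hypothetical structure, not necessarily how the linked repo lays it out):

# Minimal CHIP-8 fetch/decode sketch (hypothetical structure, not
# necessarily how the linked repo organizes it).
class Chip8:
    def __init__(self):
        self.memory = bytearray(4096)
        self.V = [0] * 16            # registers V0..VF
        self.pc = 0x200              # CHIP-8 programs load at 0x200

    def step(self):
        # Each opcode is two big-endian bytes.
        opcode = (self.memory[self.pc] << 8) | self.memory[self.pc + 1]
        self.pc += 2
        x = (opcode >> 8) & 0xF
        nn = opcode & 0xFF
        if opcode & 0xF000 == 0x6000:    # 6XNN: VX = NN
            self.V[x] = nn
        elif opcode & 0xF000 == 0x1000:  # 1NNN: jump to NNN
            self.pc = opcode & 0x0FFF
        # ...remaining opcode families decoded the same way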
Having a hard time seeing the difference between Tinted and Clear for Liquid Glass with macOS 26.1 in Safari.
November 3, 2025 at 10:54 PM
You probably wouldn't know it from this top output, but I have a FSDP training run going on the DGX Spark cluster. No wasted CPU time spent processing interrupts or copying between buffers. RDMA networking is a wonderful thing.
October 31, 2025 at 10:45 PM
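For reference, a minimal FSDP wrap over NCCL looks roughly like this (toy model and illustrative launch flags; the actual run used nanochat's training loop, not this code):

# Minimal FSDP-over-NCCL sketch (toy model; launch flags are illustrative).
# torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0|1> \
#          --master_addr=<first-node> --master_port=29500 fsdp_toy.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
torch.cuda.set_device(0)

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
).cuda()
model = FSDP(model)  # parameters sharded across ranks, all-gathered per layer

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 4096, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
opt.step()
dist.destroy_process_group()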
2x performance by adding the 2nd DGX Spark w/ the 200GbE interconnect to a distributed training run with Karpathy's nanochat. Brings base training down from 10 days to 5 days. Token throughput is 4x compared to the single-node run, but only because gradient accumulation steps changed from 8 to 4.
October 31, 2025 at 8:48 PM
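To make the gradient-accumulation caveat concrete, here's a back-of-the-envelope sketch (batch and sequence sizes are illustrative, not the actual nanochat config): doubling the nodes while halving accumulation keeps tokens per optimizer step fixed, so the honest wall-clock gain is the 2x.

# Back-of-the-envelope bookkeeping (illustrative numbers, not the actual
# nanochat config): the effective batch per optimizer step is unchanged.
device_batch = 32   # sequences per GPU per micro-step (assumed)
seq_len = 2048      # tokens per sequence (assumed)

def tokens_per_opt_step(world_size, grad_accum):
    return world_size * grad_accum * device_batch * seq_len

single = tokens_per_opt_step(world_size=1, grad_accum=8)
dual = tokens_per_opt_step(world_size=2, grad_accum=4)
print(single == dual)  # True: same tokens/step, but each step takes roughly
                       # half as long on two nodes, hence the 2x wall-clock gain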
200GbE network is up and running between the DGX Sparks. Having a high throughput cluster on a desk that consumes less than 400W of power under full load is awesome. NCCL benchmarks show near line-speed for AllGather.
October 31, 2025 at 4:32 PM
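A minimal two-process all_gather check in PyTorch, sketching what those NCCL benchmarks exercise (not the nccl-tests binaries themselves; addresses and buffer sizes are placeholders):

# Minimal two-node NCCL all_gather check in PyTorch (a sketch of what the
# benchmarks exercise, not the nccl-tests binaries).
# torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0|1> \
#          --master_addr=<first-node> --master_port=29500 allgather_check.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank, world = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(0)

x = torch.full((64 * 1024 * 1024,), float(rank), device="cuda")  # 256 MB fp32
out = [torch.empty_like(x) for _ in range(world)]
dist.all_gather(out, x)
torch.cuda.synchronize()
print(f"rank {rank}: gathered {world} x {x.numel() * 4 / 1e6:.0f} MB buffers")
dist.destroy_process_group()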
Waiting for a 200GbE interconnect cable to come in to connect my NVIDIA DGX Sparks. Did some NCCL connectivity and validation testing with the 10GbE ports in the meantime:
October 30, 2025 at 10:37 PM
NVIDIA DGX Spark #2 is up and running.
October 30, 2025 at 7:20 PM
Womp womp - looks like NVIDIA NIM images aren't updated to CUDA 13.1 yet. That means no NIM on the DGX Spark for the time being, except for a few custom images they have done. Unfortunate, because I really wanted to see mxfp4 & trt-llm w/ gpt-oss-120b.
October 29, 2025 at 5:40 PM
Long context llama.cpp testing with the NVIDIA DGX Spark & gpt-oss-120b.
October 28, 2025 at 7:03 PM
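A rough sketch of that kind of long-context run through the llama-cpp-python bindings (model path, context length, and prompt are placeholders; the actual testing used llama.cpp directly):

# Rough long-context sketch via llama-cpp-python (paths, context size, and
# prompt are placeholders; not the actual test setup).
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b.gguf",  # placeholder filename
    n_ctx=131072,                    # long context window
    n_gpu_layers=-1,                 # offload all layers to the GPU
)
long_prompt = "..."                  # e.g. a large document to summarize
out = llm(long_prompt, max_tokens=512)
print(out["choices"][0]["text"])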
For anyone who is curious, @karpathy.bsky.social's nanochat takes around 10 days for base training on an NVIDIA DGX Spark (~1600 tok/s). Will benchmark again when I get the 2nd DGX to see how linear the scaling is.
October 27, 2025 at 12:00 PM
NVIDIA DGX Spark is up and running. Setup process was seamless. Now for some fine-tuning and CUDA development.
October 27, 2025 at 2:11 AM
Tried to make the switch from Safari to Chrome again. Apple Passwords integration is the issue: it's built into Safari, and the Chrome extension isn't great. Was a 1Password customer for years, but the family is now fully on Passwords.
October 19, 2025 at 3:36 PM
Took the plunge and ordered a DGX Spark. Less interested in the inferencing performance and more interested in having the full NVIDIA DGX stack on my desk for development.
October 17, 2025 at 9:10 PM
Somehow just discovered @netnewswire.com and will be using it as my RSS reader going forward. There's something to be said for an app that is just an app and not a service.
October 12, 2025 at 12:32 AM
Experimenting to see if I can use scheduled tasks in ChatGPT & Gemini to replace my RSS reader agent that scrapes blogs, summarizes, and publishes via webhook.
October 4, 2025 at 6:12 PM
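The agent being replaced is roughly this shape (a minimal sketch; the feed list, webhook URL, and summarize step are placeholders, not the actual implementation):

# Minimal sketch of an RSS -> summarize -> webhook agent (feed list,
# webhook URL, and the summarize step are placeholders).
import feedparser
import requests

FEEDS = ["https://example.com/feed.xml"]
WEBHOOK_URL = "https://example.com/webhook"

def summarize(text: str) -> str:
    # Stand-in for an LLM call; swap in whatever model/provider you use.
    return text[:280]

for url in FEEDS:
    for entry in feedparser.parse(url).entries[:5]:
        payload = {
            "title": entry.get("title", ""),
            "link": entry.get("link", ""),
            "summary": summarize(entry.get("summary", "")),
        }
        requests.post(WEBHOOK_URL, json=payload, timeout=10)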
Native containers in macOS 26 are lightweight & functional. No more Docker or Podman VMs required.
July 9, 2025 at 9:55 PM
Worked with a customer on LLM infra sizing—here’s a deep dive on llama-3.3-70b-instruct inference using NVIDIA NIM.

H100 (SXM5) delivered up to 14× more throughput vs A100 (PCIe) with far lower latency.

Full benchmarks + thoughts:

dlewis.io/evaluating-l...
Evaluating Llama‑3.3‑70B Inference on NVIDIA H100 and A100 GPUs
Large‑scale language models quickly expose the limits of yesterday’s hardware. To understand how much practical head‑room Hopper offers over Ampere in a production‑style setting, I profiled llama-3.3-...
dlewis.io
April 17, 2025 at 6:27 PM
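For a feel of how that kind of throughput number gets measured against an OpenAI-compatible NIM endpoint, a rough sketch (endpoint URL, model name, prompt, and concurrency are placeholders; the post's benchmarks used a more rigorous harness):

# Rough throughput sketch against an OpenAI-compatible endpoint (URL, model
# name, prompt, and concurrency are placeholders; not the actual harness).
import time
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def one_request(_):
    resp = client.chat.completions.create(
        model="meta/llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": "Summarize RDMA in one paragraph."}],
        max_tokens=256,
    )
    return resp.usage.completion_tokens

concurrency, n = 8, 32
t0 = time.time()
with ThreadPoolExecutor(max_workers=concurrency) as pool:
    tokens = sum(pool.map(one_request, range(n)))
wall = time.time() - t0
print(f"{tokens / wall:.1f} completion tok/s at concurrency {concurrency}")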
Had to remind myself today that bfloat16 on Apple Silicon in PyTorch with AMP provides a minimal performance increase for model training or inferencing. It is very beneficial on NVIDIA GPUs because of Tensor Cores, which PyTorch uses for bfloat16 matmuls.
April 16, 2025 at 10:26 PM
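A quick way to check that observation yourself (matrix sizes are arbitrary, and the "mps" autocast path is an assumption that depends on your PyTorch version):

# Quick bf16-autocast matmul timing (sizes are arbitrary; the "mps"
# autocast path is an assumption depending on the PyTorch version).
import contextlib
import time
import torch

def bench(device: str, use_bf16: bool) -> float:
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    ctx = (torch.autocast(device_type=device, dtype=torch.bfloat16)
           if use_bf16 else contextlib.nullcontext())
    with ctx:
        (a @ b).sum().item()          # warmup + sync
        t0 = time.time()
        for _ in range(50):
            c = a @ b
        c.sum().item()                # force sync before stopping the clock
        return time.time() - t0

device = "cuda" if torch.cuda.is_available() else "mps"
print(device, "fp32:", bench(device, False), "bf16 AMP:", bench(device, True))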
Wanted to share some of my recent experiences debugging a real-world problem with LLMs. Problem complexity is an issue for some models. Reasoning models fare better. dlewis.io/recent-exper...
Recent Experiences Debugging with LLMs
I’m frequently asked by clients what my thoughts are on LLMs and coding. Personal experience has informed me that LLMs cannot solve problems of a certain complexity for a number of reasons. One of the...
dlewis.io
April 16, 2025 at 8:17 PM
While fixing a KV Cache generation bug today in the MLX GPT-2 implementation that I submitted last year, I discovered that the gpt2 (128M) model is much more dependent on positional encodings than the larger gpt2-xl (1.5B). Guess that explains why linear positional encoding layers were dropped.
April 14, 2025 at 2:04 AM
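One classic way a KV-cache generation bug like that shows up (purely illustrative, not the actual MLX fix) is forgetting to offset the position ids by the cache length when embedding newly generated tokens:

# Illustrative KV-cache/positional-embedding pitfall (hypothetical code,
# not the actual MLX GPT-2 fix): new tokens must be embedded at positions
# offset by the cache length, not starting from 0 again.
import torch

def positions_for(new_tokens: torch.Tensor, cache_len: int) -> torch.Tensor:
    # Buggy version: torch.arange(new_tokens.shape[1])  (restarts at 0)
    return torch.arange(cache_len, cache_len + new_tokens.shape[1])

wpe = torch.nn.Embedding(1024, 768)        # GPT-2-style learned positions
new_tok = torch.randint(0, 50257, (1, 1))  # one token generated with a cache
pos_emb = wpe(positions_for(new_tok, cache_len=37))
print(pos_emb.shape)                       # torch.Size([1, 768])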
Qwen2.5 models are exceptionally strong at tool calling for their size. Definitely stronger than the Llama 3.1/3.2 models.
March 18, 2025 at 2:21 AM
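The sort of tool-calling request used to compare models, sketched against an OpenAI-compatible local endpoint (endpoint, model name, and tool schema are placeholders, not the actual evaluation setup):

# Tool-calling sketch against an OpenAI-compatible local endpoint
# (endpoint, model name, and tool schema are placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Austin right now?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)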
We’re excited to announce the open sourcing of our AI Foundry Starter Template at Silex Data! This production-ready starter kit empowers you to build and deploy AI apps with LangChainAI/LangGraph, featuring streaming chat, robust Keycloak authentication, Kong's multi-model gateway, and OpenShift.
March 12, 2025 at 7:36 PM
Recently, I wanted to experiment with some algorithmic trading. Building the Interactive Brokers C++ API client library on macOS & Linux/aarch64 had a few more barriers than I anticipated. Wrote up a brief blog post with the steps. dlewis.io/ibkr-cpp-api/
Building the IBKR C++ API Client Library
Recently, I wanted to use the C++ API client library that Interactive Brokers provides and experiment with some algorithmic trading and monitoring of my positions. I had hoped there would be some pr...
dlewis.io
February 11, 2025 at 11:16 PM
Reposted by Derek Lewis
EXCLUSIVE: Microsoft and OpenAI are investigating whether a group linked to China's DeepSeek obtained OpenAI's data.
Microsoft Probing If DeepSeek-Linked Group Improperly Obtained OpenAI Data
Microsoft Corp. and OpenAI are investigating whether data output from OpenAI’s technology was obtained in an unauthorized manner by a group linked to Chinese artificial intelligence startup DeepSeek, ...
www.bloomberg.com
January 29, 2025 at 3:17 AM