Derek Lewis
@dlewis.io
CTO & Data Scientist at Silex Data Solutions & CODEHR.ai. Opinions expressed are my own.
Having a debugger in your emulator is helpful. Currently working on a SPARCstation 5/sun4m emulator as a side project in pure Python that can boot into OpenBoot and get as far as the memory bank probe. I've had lots of issues with the IOMMU, but I'm slowly working through them.
November 16, 2025 at 7:56 PM
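For context on what the IOMMU work involves, here is a toy sketch of DVMA translation the way an emulator might model it; the field layout is a simplification I'm assuming for illustration, not the real sun4m IOPTE format.

# Toy sketch of IOMMU-style DVMA translation for a Python emulator
# (simplified/assumed field layout, not the actual sun4m IOPTE format).
PAGE_SHIFT = 12                      # 4 KiB pages
IOPTE_VALID = 0x2                    # assumed position of the 'valid' bit

class Iommu:
    def __init__(self, physmem):
        self.physmem = physmem       # provides read_word(paddr) -> int
        self.page_table_base = 0     # set via a control-register write

    def translate(self, dvma_addr: int) -> int:
        index = dvma_addr >> PAGE_SHIFT
        iopte = self.physmem.read_word(self.page_table_base + index * 4)
        if not (iopte & IOPTE_VALID):
            raise MemoryError(f"IOMMU fault at {dvma_addr:#010x}")
        ppn = iopte >> 8             # assumed PPN field position
        return (ppn << PAGE_SHIFT) | (dvma_addr & ((1 << PAGE_SHIFT) - 1))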
Writing emulators is a good way to learn about hardware, something I haven't spent much time on previously. Just finished up a basic Chip8 emulator in Python that can do instruction decoding and B/W screen drawing. TBDs still include keyboard input & sound. github.com/derekelewis/...
GitHub - derekelewis/Chip8
github.com
November 11, 2025 at 10:57 PM
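A minimal sketch of the kind of fetch/decode loop that post describes (hypothetical structure, not necessarily how the linked repo lays it out):

# Minimal CHIP-8 fetch/decode sketch (hypothetical structure, not
# necessarily how the linked repo organizes it).
class Chip8:
    def __init__(self):
        self.memory = bytearray(4096)
        self.V = [0] * 16            # registers V0..VF
        self.pc = 0x200              # CHIP-8 programs load at 0x200

    def step(self):
        # Each opcode is two big-endian bytes.
        opcode = (self.memory[self.pc] << 8) | self.memory[self.pc + 1]
        self.pc += 2
        x = (opcode >> 8) & 0xF
        nn = opcode & 0xFF
        if opcode & 0xF000 == 0x6000:    # 6XNN: VX = NN
            self.V[x] = nn
        elif opcode & 0xF000 == 0x1000:  # 1NNN: jump to NNN
            self.pc = opcode & 0x0FFF
        # ...remaining opcode families decoded the same way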
Having a hard time seeing the difference between Tinted and Clear for Liquid Glass with macOS 26.1 in Safari.
November 3, 2025 at 10:54 PM
You probably wouldn't know it from this top output, but I have a FSDP training run going on the DGX Spark cluster. No wasted CPU time spent processing interrupts or copying between buffers. RDMA networking is a wonderful thing.
October 31, 2025 at 10:45 PM
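For reference, a minimal FSDP wrap over NCCL looks roughly like this (toy model and illustrative launch flags; the actual run used nanochat's training loop, not this code):

# Minimal FSDP-over-NCCL sketch (toy model; launch flags are illustrative).
# torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0|1> \
#          --master_addr=<first-node> --master_port=29500 fsdp_toy.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
torch.cuda.set_device(0)

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
).cuda()
model = FSDP(model)  # parameters sharded across ranks, all-gathered per layer

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 4096, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
opt.step()
dist.destroy_process_group()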
2x performance by adding the 2nd DGX Spark w/ the 200GbE interconnect to a distributed training run with Karpathy's nanochat. Brings base training down from 10 days to 5 days. Token throughput is 4x compared to the single-node run, but only because gradient accumulation steps changed from 8 to 4.
October 31, 2025 at 8:48 PM
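To make the gradient-accumulation caveat concrete, here's a back-of-the-envelope sketch (batch and sequence sizes are illustrative, not the actual nanochat config): doubling the nodes while halving accumulation keeps tokens per optimizer step fixed, so the honest wall-clock gain is the 2x.

# Back-of-the-envelope bookkeeping (illustrative numbers, not the actual
# nanochat config): the effective batch per optimizer step is unchanged.
device_batch = 32   # sequences per GPU per micro-step (assumed)
seq_len = 2048      # tokens per sequence (assumed)

def tokens_per_opt_step(world_size, grad_accum):
    return world_size * grad_accum * device_batch * seq_len

single = tokens_per_opt_step(world_size=1, grad_accum=8)
dual = tokens_per_opt_step(world_size=2, grad_accum=4)
print(single == dual)  # True: same tokens/step, but each step takes roughly
                       # half as long on two nodes, hence the 2x wall-clock gain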
200GbE network is up and running between the DGX Sparks. Having a high throughput cluster on a desk that consumes less than 400W of power under full load is awesome. NCCL benchmarks show near line-speed for AllGather.
October 31, 2025 at 4:32 PM
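A minimal two-process all_gather check in PyTorch, sketching what those NCCL benchmarks exercise (not the nccl-tests binaries themselves; addresses and buffer sizes are placeholders):

# Minimal two-node NCCL all_gather check in PyTorch (a sketch of what the
# benchmarks exercise, not the nccl-tests binaries).
# torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0|1> \
#          --master_addr=<first-node> --master_port=29500 allgather_check.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank, world = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(0)

x = torch.full((64 * 1024 * 1024,), float(rank), device="cuda")  # 256 MB fp32
out = [torch.empty_like(x) for _ in range(world)]
dist.all_gather(out, x)
torch.cuda.synchronize()
print(f"rank {rank}: gathered {world} x {x.numel() * 4 / 1e6:.0f} MB buffers")
dist.destroy_process_group()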
Waiting for a 200GbE interconnect cable to come in to connect my NVIDIA DGX Sparks. Did some NCCL connectivity and validation testing with the 10GbE ports in the meantime:
October 30, 2025 at 10:37 PM
NVIDIA DGX Spark #2 is up and running.
October 30, 2025 at 7:20 PM
Womp womp - looks like NVIDIA NIM images aren't updated to CUDA 13.1 yet. That means no NIM on the DGX Spark for the time being, except for a few custom images they have done. Unfortunate, because I really wanted to see mxfp4 & trt-llm w/ gpt-oss-120b.
October 29, 2025 at 5:40 PM
Long context llama.cpp testing with the NVIDIA DGX Spark & gpt-oss-120b.
October 28, 2025 at 7:03 PM
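A rough sketch of that kind of long-context run through the llama-cpp-python bindings (model path, context length, and prompt are placeholders; the actual testing used llama.cpp directly):

# Rough long-context sketch via llama-cpp-python (paths, context size, and
# prompt are placeholders; not the actual test setup).
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b.gguf",  # placeholder filename
    n_ctx=131072,                    # long context window
    n_gpu_layers=-1,                 # offload all layers to the GPU
)
long_prompt = "..."                  # e.g. a large document to summarize
out = llm(long_prompt, max_tokens=512)
print(out["choices"][0]["text"])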
For anyone who is curious, @karpathy.bsky.social's nanochat takes around 10 days for base training on an NVIDIA DGX Spark (~1600 tok/s). Will benchmark again when I get the 2nd DGX to see how linear the scaling is.
October 27, 2025 at 12:00 PM
NVIDIA DGX Spark is up and running. Setup process was seamless. Now for some fine-tuning and CUDA development.
October 27, 2025 at 2:11 AM
Tried to make the switch from Safari to Chrome again. Apple Passwords integration is the issue: it's built into Safari, and the Chrome extension isn't great. Was a 1Password customer for years, but the family is now fully on Passwords.
October 19, 2025 at 3:36 PM
Took the plunge and ordered a DGX Spark. Less interested in the inferencing performance and more interested in having the full NVIDIA DGX stack on my desk for development.
October 17, 2025 at 9:10 PM
Somehow just discovered @netnewswire.com and will be using it as my RSS reader going forward. There's something to be said for an app that is just an app and not a service.
October 12, 2025 at 12:32 AM
Experimenting to see if I can use scheduled tasks in ChatGPT & Gemini to replace my RSS reader agent that scrapes blogs, summarizes, and publishes via webhook.
October 4, 2025 at 6:12 PM
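The agent being replaced is roughly this shape (a minimal sketch; the feed list, webhook URL, and summarize step are placeholders, not the actual implementation):

# Minimal sketch of an RSS -> summarize -> webhook agent (feed list,
# webhook URL, and the summarize step are placeholders).
import feedparser
import requests

FEEDS = ["https://example.com/feed.xml"]
WEBHOOK_URL = "https://example.com/webhook"

def summarize(text: str) -> str:
    # Stand-in for an LLM call; swap in whatever model/provider you use.
    return text[:280]

for url in FEEDS:
    for entry in feedparser.parse(url).entries[:5]:
        payload = {
            "title": entry.get("title", ""),
            "link": entry.get("link", ""),
            "summary": summarize(entry.get("summary", "")),
        }
        requests.post(WEBHOOK_URL, json=payload, timeout=10)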
Native containers in macOS 26 are lightweight & functional. No more Docker or Podman VMs required.
July 9, 2025 at 9:55 PM
Worked with a customer on LLM infra sizing—here’s a deep dive on llama-3.3-70b-instruct inference using NVIDIA NIM.

H100 (SXM5) delivered up to 14× more throughput vs A100 (PCIe) with far lower latency.

Full benchmarks + thoughts:

dlewis.io/evaluating-l...
Evaluating Llama‑3.3‑70B Inference on NVIDIA H100 and A100 GPUs
Large‑scale language models quickly expose the limits of yesterday’s hardware. To understand how much practical head‑room Hopper offers over Ampere in a production‑style setting, I profiled llama-3.3-...
dlewis.io
April 17, 2025 at 6:27 PM
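For a feel of how that kind of throughput number gets measured against an OpenAI-compatible NIM endpoint, a rough sketch (endpoint URL, model name, prompt, and concurrency are placeholders; the post's benchmarks used a more rigorous harness):

# Rough throughput sketch against an OpenAI-compatible endpoint (URL, model
# name, prompt, and concurrency are placeholders; not the actual harness).
import time
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def one_request(_):
    resp = client.chat.completions.create(
        model="meta/llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": "Summarize RDMA in one paragraph."}],
        max_tokens=256,
    )
    return resp.usage.completion_tokens

concurrency, n = 8, 32
t0 = time.time()
with ThreadPoolExecutor(max_workers=concurrency) as pool:
    tokens = sum(pool.map(one_request, range(n)))
wall = time.time() - t0
print(f"{tokens / wall:.1f} completion tok/s at concurrency {concurrency}")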
Had to remind myself today that bfloat16 on Apple Silicon in PyTorch with AMP provides a minimal performance increase for model training or inferencing. It is very beneficial on NVIDIA GPUs because of Tensor Cores, which PyTorch uses for bfloat16 matmuls.
April 16, 2025 at 10:26 PM
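A quick way to check that observation yourself (matrix sizes are arbitrary, and the "mps" autocast path is an assumption that depends on your PyTorch version):

# Quick bf16-autocast matmul timing (sizes are arbitrary; the "mps"
# autocast path is an assumption depending on the PyTorch version).
import contextlib
import time
import torch

def bench(device: str, use_bf16: bool) -> float:
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    ctx = (torch.autocast(device_type=device, dtype=torch.bfloat16)
           if use_bf16 else contextlib.nullcontext())
    with ctx:
        (a @ b).sum().item()          # warmup + sync
        t0 = time.time()
        for _ in range(50):
            c = a @ b
        c.sum().item()                # force sync before stopping the clock
        return time.time() - t0

device = "cuda" if torch.cuda.is_available() else "mps"
print(device, "fp32:", bench(device, False), "bf16 AMP:", bench(device, True))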
Wanted to share some of my recent experiences debugging a real-world problem with LLMs. Problem complexity is an issue for some models. Reasoning models fare better. dlewis.io/recent-exper...
Recent Experiences Debugging with LLMs
I’m frequently asked by clients what my thoughts are on LLMs and coding. Personal experience has informed me that LLMs cannot solve problems of a certain complexity for a number of reasons. One of the...
dlewis.io
April 16, 2025 at 8:17 PM
While fixing a KV Cache generation bug today in the MLX GPT-2 implementation that I submitted last year, I discovered that the gpt2 (128M) model is much more dependent on positional encodings than the larger gpt2-xl (1.5B). Guess that explains why linear positional encoding layers were dropped.
April 14, 2025 at 2:04 AM
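One classic way a KV-cache generation bug like that shows up (purely illustrative, not the actual MLX fix) is forgetting to offset the position ids by the cache length when embedding newly generated tokens:

# Illustrative KV-cache/positional-embedding pitfall (hypothetical code,
# not the actual MLX GPT-2 fix): new tokens must be embedded at positions
# offset by the cache length, not starting from 0 again.
import torch

def positions_for(new_tokens: torch.Tensor, cache_len: int) -> torch.Tensor:
    # Buggy version: torch.arange(new_tokens.shape[1])  (restarts at 0)
    return torch.arange(cache_len, cache_len + new_tokens.shape[1])

wpe = torch.nn.Embedding(1024, 768)        # GPT-2-style learned positions
new_tok = torch.randint(0, 50257, (1, 1))  # one token generated with a cache
pos_emb = wpe(positions_for(new_tok, cache_len=37))
print(pos_emb.shape)                       # torch.Size([1, 768])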
Qwen2.5 models are exceptionally strong at tool calling for their size. Definitely stronger than the Llama 3.1/3.2 models.
March 18, 2025 at 2:21 AM
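The sort of tool-calling request used to compare models, sketched against an OpenAI-compatible local endpoint (endpoint, model name, and tool schema are placeholders, not the actual evaluation setup):

# Tool-calling sketch against an OpenAI-compatible local endpoint
# (endpoint, model name, and tool schema are placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Austin right now?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)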
We’re excited to announce the open sourcing of our AI Foundry Starter Template at Silex Data! This production-ready starter kit empowers you to build and deploy AI apps with LangChainAI/LangGraph, featuring streaming chat, robust Keycloak authentication, Kong's multi-model gateway, and OpenShift.
March 12, 2025 at 7:36 PM
Recently, I wanted to experiment with some algorithmic trading. Building the Interactive Brokers C++ API client library on macOS & Linux/aarch64 had a few more barriers than I anticipated. Wrote up a brief blog post with the steps. dlewis.io/ibkr-cpp-api/
Building the IBKR C++ API Client Library
Recently, I wanted to use the C++ API client library that Interactive Brokers provides and experiment with some algorithmic trading and monitoring of my positions. I had hoped there would be some pr...
dlewis.io
February 11, 2025 at 11:16 PM
Reposted by Derek Lewis
EXCLUSIVE: Microsoft and OpenAI are investigating whether a group linked to China's DeepSeek obtained OpenAI's data.
Microsoft Probing If DeepSeek-Linked Group Improperly Obtained OpenAI Data
Microsoft Corp. and OpenAI are investigating whether data output from OpenAI’s technology was obtained in an unauthorized manner by a group linked to Chinese artificial intelligence startup DeepSeek, ...
www.bloomberg.com
January 29, 2025 at 3:17 AM