Jeremy Lewi
@jeremy.lewi.us
Building foyle.io to use AI to deploy and operate software. MLOps Engineer, Kubernetes enthusiast, dog owner. Formerly at Google and Primer.AI. Started Kubeflow.
I've been using Codex CLI to build the AISRE UI I've been dreaming of for two years but lacked the capability to build. Yesterday I used it to add the ability to execute code inside WebContainers; it even figured out how to execute D3 code and make the results visible.
Interesting impact of Claude Code:

I dabble with frontend, but I'm not a frontend expert. I always reached for either templates or something like Webflow to put together a landing page.

Now I've started to... just ask Claude Code to build it per my spec.

A big reason for it:
I was playing with using WebContainers to add JS executability to my @runme.dev app. But when you enable the required cross-origin isolation headers, that breaks OpenAI's ChatKit, which is served off a different domain.
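For context, WebContainers need the page to be cross-origin isolated. A minimal sketch of the two headers involved, using Python's stdlib server purely as a stand-in for whatever actually serves the app:

```python
# Sketch of the cross-origin isolation headers WebContainers require.
# The stdlib server is a stand-in; any server/CDN config works the same way.
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

class IsolatedHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Required for SharedArrayBuffer, which WebContainers depend on.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        # Once this is on, every cross-origin resource (e.g. a widget served
        # off another domain) must opt in via CORP/CORS or the browser blocks it.
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()

ThreadingHTTPServer(("", 8000), IsolatedHandler).serve_forever()
```

The `require-corp` embedder policy is the likely culprit: it forces third-party embeds to opt in, which a widget on another domain may not do.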
AI is separating the production and consumption of information. If direct access to Wikipedia is dropping, that suggests information consumers find AI-mediated consumption more useful. This will impact the production of information, including its economic model.
So for-profit AI companies have trained on the world's largest collaborative volunteer project, a precious free resource, to make money for their enterprises. They have crushed traffic to the volunteer project, starving it of donors and volunteers.

www.404media.co/wikipedia-sa...
Wikipedia Says AI Is Causing a Dangerous Decline in Human Visitors
“With fewer visits to Wikipedia, fewer volunteers may grow and enrich the content, and fewer individual donors may support this work.”
Evan Ratliff creating an AI clone to mess with scammers and marketers is inspired and delightful
open.spotify.com/episode/6ihZ...
870: My Other Self
System design interviews versus how work really happens
Reposted by Jeremy Lewi
We have 🤖 AI notetakers in meetings but continue to silo know-how every time we close terminals. Not just the how but also the why and what.

Sign up for beta access to visr.sh: it's as if Granola AI and #tmux had a kid.
Visr: The Agentic Terminal Notepad Integral to Your Docs
Do you know how the Python eval sandbox works? I assume it's running Python as WebAssembly in the browser?
Which is why you need evals. Your AISRE prompt is not a runbook that fits some a priori definition of good. Your prompt is a set of model parameters, implicitly defined as whatever lets the AISRE perform well, i.e. minimizes loss. So the only way to optimize the AISRE is with evals.
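To make that concrete, a toy sketch of "optimize the prompt by minimizing loss on evals"; `run_aisre`, the incident data, and the prompt variants are all hypothetical placeholders, not foyle's actual harness:

```python
# Toy sketch: treat the prompt as parameters and pick whichever variant
# minimizes loss on labeled past investigations.

def run_aisre(prompt: str, alert: str) -> str:
    """Hypothetical: invoke the AISRE with this prompt, return its diagnosis."""
    raise NotImplementedError

LABELED_INCIDENTS = [
    # (alert, known root cause) pairs mined from past investigations
    ("model latency p99 > 2s", "autoscaler at max replicas"),
]

PROMPT_VARIANTS = ["You are an SRE...", "You are an SRE. Write code to..."]

def loss(prompt: str) -> float:
    """Fraction of eval cases where the AISRE misses the known root cause."""
    misses = sum(
        expected not in run_aisre(prompt, alert)
        for alert, expected in LABELED_INCIDENTS
    )
    return misses / len(LABELED_INCIDENTS)

# "Optimizing" the prompt is just picking the variant with the lowest eval loss.
best_prompt = min(PROMPT_VARIANTS, key=loss)
```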
This ends up being 20 lines of Python code. Overall our AISRE is writing hundreds of lines of code per investigation, and I expect this will only increase. A human would never solve a problem this way.
We're finding code interpreter to be an amazing tool for letting the AISRE do its job. As one example, to deal with an alert about model latency, we give the AISRE a 200K JSON file and tell it to write code to pull out the name of the model's autoscaler so it can then generate the relevant queries.
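For illustration, roughly the kind of snippet the AISRE writes; the file name and JSON schema below are made up, not the actual dump format:

```python
# Illustrative only: schema and names are hypothetical. The AISRE writes
# something like this to map a model named in an alert to its autoscaler.
import json

def find_autoscaler(dump_path: str, model_name: str) -> str | None:
    with open(dump_path) as f:
        resources = json.load(f)
    for autoscaler in resources.get("autoscalers", []):
        # Hypothetical schema: each autoscaler records the model it targets.
        if autoscaler.get("target", {}).get("model") == model_name:
            return autoscaler["name"]
    return None

print(find_autoscaler("cluster_dump.json", "my-llm"))
```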
We find it much more effective to give the AISRE example queries and explain the relationships between the dimensions, so that it can go from the information in the alert to writing the queries it needs.
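A hypothetical sketch of what that context might look like; the query language and dimension names are invented:

```python
# Hypothetical context handed to the AISRE: example queries plus how the
# dimensions relate, so it can go from alert fields to the queries it needs.
AISRE_CONTEXT = {
    "dimension_relationships": [
        "Every model has exactly one autoscaler; join on model_name.",
        "Latency metrics are keyed by (model_name, region).",
    ],
    "example_queries": [
        'latency_p99{model_name="my-llm", region="us-east1"}',
        'autoscaler_replicas{model_name="my-llm"}',
    ],
}
```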
We're finding that AISREs solve problems very differently from humans. Notably, AIs are much better at writing code, whereas humans rely on visual cues. Humans use dashboards with tens of graphs and rely on layout to understand the association between graphs.
I think people's intuition is that AISREs will solve problems the way humans do. I think this is unhelpful because it leads you to believe you just need to write down how humans solve ops problems. Notably, you start thinking you can skip investing in evals.
Reposted by Jeremy Lewi
After years of complaining about cancel culture, the current administration has taken it to a new and dangerous level by routinely threatening regulatory action against media companies unless they muzzle or fire reporters and commentators it doesn’t like.
Let’s be clear about what happened to Jimmy Kimmel
Trump’s most brazen attack on free speech yet.
Worth a read. The blog post is very understandable and has the advantage of being readable on mobile.
openai.com/index/why-la...
Does anyone have a good/OTS solution for restricting outbound Internet access at the path level? E.g., suppose you wanted to allow a container to access github.com/myorg/* but not github.com/otherorg/*
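One non-OTS approach I can imagine (an assumption, not a recommendation): an egress proxy with TLS interception, since anything below HTTP (SNI or IP allowlists) only sees the host, never the path. A sketch as a mitmproxy addon; the container would need to trust mitmproxy's CA and be forced through the proxy:

```python
# allowlist.py — sketch of path-level egress filtering as a mitmproxy addon.
# Run with: mitmdump -s allowlist.py
# The allowed prefixes are hypothetical; filtering HTTPS paths requires
# TLS interception, so the container must trust mitmproxy's CA.
from mitmproxy import http

ALLOWED = [("github.com", "/myorg/")]

def request(flow: http.HTTPFlow) -> None:
    for host, prefix in ALLOWED:
        if flow.request.pretty_host == host and flow.request.path.startswith(prefix):
            return  # matches the allowlist, let it through
    flow.response = http.Response.make(
        403, b"blocked by egress policy", {"Content-Type": "text/plain"}
    )
```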
That's assuming they actually bothered to share the observations that support their hypotheses, as opposed to "I looked at logs". So I find myself going through Slack and documenting past investigations with the detail needed for training/evals.
The data in Slack is really dirty. People often refer to metrics and graphs in really vague ways; they might point at a set of dashboards with tens of graphs without making clear which one they mean. Or they might post a screenshot that makes it difficult to extract the underlying query.
For me (building an AISRE), data labeling means going back through Slack and turning past investigations into "clean" data to train/eval the AISRE.
As AI automates a lot of the coding performed by entry-level SWEs, I think building evals (data labeling) might replace coding as the labor-intensive task handled by new grads.
Reposted by Jeremy Lewi
If you want to learn applied AI evals but aren't sure if it's for you, @sh-reya.bsky.social and I put together something that might help.

This free email course compiles what we've learned from teaching 2k+ students. It’s 17 emails plus 2 free e-books.

Here's the link: ai.hamel.dev/eval-course
AI Evals Email Course
A free 17-part email series on the principles of application-centric LLM evals.
I've been eval-pilled by @hamel.bsky.social. Everyone is all "let's build some MCP servers and ship a minimal AISRE as quickly as possible." And I'm writing a design doc about how to build evals with @runme.dev so we can iterate rapidly on the AI.