Torsten Goerke
@tgoerke.bsky.social
170 followers 710 following 180 posts
🦋 pretend it's a moth @hello.cubes.blue 🥶
Posts Media Videos Starter Packs
tgoerke.bsky.social
The @graze.social streams are a great data resource for computational social science. @lorenzspreen.bsky.social
graze.social
ATProto is a team sport - we build better together. That's why, effective today, we're opening up access to the enriched archive we use to power Graze, so anyone can build great services on top of ATProto without reinventing the wheel. Read more in the @leaflet.pub post below for details!
Announcing the Graze Archives
A brief tour of the S3 requestor-pays archives of the Graze turbostream and freshly-announced megastream
graze.leaflet.pub
Reposted by Torsten Goerke
Reposted by Torsten Goerke
nooki.me
nooki @nooki.me · 21h
Implementing OAuth was quite the headache 😩 I believe I’ve got it all working now, but if you run into anything strange, please do let me know.
Implementing OAuth was quite the headache. I believe I’ve got it all working now, but if you run into anything strange, please do let me know.
Reposted by Torsten Goerke
tgoerke.bsky.social
Great to see communities come together on a personal level, AP-AT interoperability under constant threat. Also the "letter to the dean" is actually answering letters from university staff. A claim they already made about private platforms. And possible answers we can provide.
tgoerke.bsky.social
Thanks for sharing. We are working on a more regular meeting structure with @ronentk.me
Reposted by Torsten Goerke
timkellogg.me
Karpathy: nanochat

A small training+inference pipeline for creating your own LLM from scratch

$100 will get you a somewhat functional model

$1000 is more coherent & solves math

detailed walkthrough: github.com/karpathy/nan...

repo: github.com/karpathy/nan...
Andrej Karpathy & @karpathy
X.com
Excited to release new repo: nanochat! (it's among the most unhinged I've written).
Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web Ul.
It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with
IPDDOI - RL the model optionally on GSM8K with
"GRPO"
- Efficient inference the model in an Engine with
KV cache, simple prefill/ decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUl.
- Write a single markdown report card, summarizing and gamifying the whole thing.
Even for as low as ~$100 in cost (~4 hours on an
8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions.
About ~12 hours surpasses GPT-2 CORE metric.
As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and
70s on ARC-Easy, 20s on GSM8K, etc.
My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.
Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
nanochat
tgoerke.bsky.social
"adversarial interoperability: that’s when you create a new product or service that plugs into the existing ones without the permission of the companies that make them." I admit I find it difficult to understand that. .

Like, USB?
tgoerke.bsky.social
👀“adversarial interoperability” (dang it, stop plugging your thing into my product!).
tgoerke.bsky.social
I must correct that. There were no shops in Feischergasse where goods could be offered collectively and the shared location could be exploited. The Fleischgasse was home to the respected butchers' guild in the 14th and 15th centuries, located in the city center close to public life and politics.
Cover photo of book by Günter Naumann. Stadtlexikon Meissen.
tgoerke.bsky.social
Want to learn from guilds in trade (Gilde) and craft (Zunft) as well.
tgoerke.bsky.social
🥤Public streams as a new data asset @erlend.sh
tgoerke.bsky.social
Thanks for sharing. I also like your work on tracking the progress of the migration off-X
tgoerke.bsky.social
@emily.space reminded us of the urgency of advancing open infrastructure in Europe. In the meeting there was a strong consensus that scholarly institutions must accelerate decentralization and reduce reliance on US-hosted infrastructure.
tgoerke.bsky.social
Our next step will be to map out the different ATProto services and applications that scholarly institutions / communities are interested in and then to work on a shared roadmap.
tgoerke.bsky.social
>> arguing the streaming company has ruined the industry and turned listeners into “passive, uninspired consumers”.

>> “We just want everyone to think a little bit harder about the ways they listen to music, ...”

www.theguardian.com/technology/2...
‘Death to Spotify’: the DIY movement to get artists and fans to quit the music app
Musicians have long criticized the streaming service’s paltry payouts, but a new wave of boycotts is emerging
www.theguardian.com
Reposted by Torsten Goerke
fil.org
The things most people hate about the internet –– like misinformation, polarization, etc. –– are actually symptoms of structural problems.

E.g. Instead of fixing misinformation, we should focus on fixing data integrity, data permanence, and incentive design.
Reposted by Torsten Goerke
bskycheck.com
@ec.europa.eu is the 16th trusted verifier on #bluesky

It just verified its first 5 users with blue check.
tgoerke.bsky.social
tedunderwood.com
The full announcement is now up at this link. Deadline still 31 Oct.
tedunderwood.com
Pre-announcement of a funding opportunity for "artificial intelligence humanities sandpits," supported by AHRC, EPSRC, and SSHRC. Expressions of interest by 31 Oct; up to four grants will be funded.
tgoerke.bsky.social
Very relatable to align AI research more with humanities www.turing.ac.uk/news/publica...
“Doing AI Differently calls for a fundamental shift in AI development – one that positions the humanities, arts and qualitative social sciences as integral, rather than supplemental, to technical innovation.“
Doing AI differently
Artificial Intelligence is rapidly becoming global infrastructure – shaping decisions in healthcare, education, industry and everyday life.
www.turing.ac.uk
Reposted by Torsten Goerke
freeourfeeds.com
Join #freeourfeeds and @eurosky.social on Wednesday, 19 November in Berlin as European leaders, policymakers, developers, and investors explore what's already being built and chart the path forward for Europe’s open social web. We are so pleased that @alexandrageese.bsky.social will be joining us!
eurosky.social
We're delighted to announce @alexandrageese.bsky.social, Member of the European Parliament for the Greens/EFA Group, as a speaker for Eurosky Live in Berlin this November.

#EuroskyLive #DigitalSovereignty #EuropeanTech

www.eurosky.social/eurosky-live
Banner announcing Alexandra Geese as a speaker for the Eurosky Live conference. Alexandra Geese is Member of the European Parliament for the Greens/EFA Group. Eurosky Live takes place in Berlin on 19 Noember. Further details available at eurosky.social/eurosky-live
Reposted by Torsten Goerke
hailey.at
reworked this labeler
- ingests posts from jetstream
- pays attention to replies to my posts
- calls out to gemma via LMStudio API
- determines if the reply is bad faith
- labels the reply as bad faith if it is
GitHub - haileyok/dontshowmethis
Contribute to haileyok/dontshowmethis development by creating an account on GitHub.
github.com