This is what happens when you let an LLM label clusters based on alt tags, 0.001% of the time 🧐
www.transparent.se/image-cluste...
The alt tag is almost always "IEMBot Image TBD" so.... 🤡
This is what happens when you let an LLM label clusters based on alt tags, 0.001% of the time 🧐
www.transparent.se/image-cluste...
The alt tag is almost always "IEMBot Image TBD" so.... 🤡
Took what I learned from autolabeling posts and applied that to alt tags. 4o-mini handled those.
Took what I learned from autolabeling posts and applied that to alt tags. 4o-mini handled those.
MobileCLIP is WAY better than it deserves to be, given how performant it is.
Never got Zero shot out of it but the embeddings are very good.
MobileCLIP is WAY better than it deserves to be, given how performant it is.
Never got Zero shot out of it but the embeddings are very good.
Jetstream -> Postgres -> Python (ML)
www.transparent.se/image-cluste...
Jetstream -> Postgres -> Python (ML)
www.transparent.se/image-cluste...
I think it'll take 3 hours, I never optimized the script to do any of this in parallel, I just go to dinner and come back and it's done.
I think it'll take 3 hours, I never optimized the script to do any of this in parallel, I just go to dinner and come back and it's done.
Good news is that it now actually works without any errors or bugs, so I can probably productionalize it and run it on a GPU next week instead.
Good news is that it now actually works without any errors or bugs, so I can probably productionalize it and run it on a GPU next week instead.
I don’t know why this is so funny
I think I get a sticker when he gets back 🤷
I don’t know why this is so funny
I think I get a sticker when he gets back 🤷
www.transparent.se/clusters.html
Not enough (40k) posts to say the clusters have stabilized yet, but you can view the centroids in x/y space, search, view random posts, etc.
It's kind of fun!
But, uhh, my embedding pipeline sucks. So it's only 40k posts 😅
www.transparent.se/clusters.html
Not enough (40k) posts to say the clusters have stabilized yet, but you can view the centroids in x/y space, search, view random posts, etc.
It's kind of fun!
But, uhh, my embedding pipeline sucks. So it's only 40k posts 😅