antialias (optional)
antiali.as
@antiali.as
He / Him
Reply / Wife Guy
My kids call me “Babe”
Santa Fe
I’m sorry you are still upset about this but it was not a good example. It’s also not a good interview question and I hope you’ve not really been using it for recruiting. Ciao
January 5, 2026 at 4:52 AM
How have you determined what “does worse” means? How do you typify this “thing”? You have not specified any success criteria or evaluation metric, only made vague assertions
January 4, 2026 at 10:05 PM
More power draw too right? With my redundancy & low cost objective that extra 5W pays for itself on a cluster of more than 3 nodes. I also can’t really make use of a faster network
January 4, 2026 at 9:26 PM
Mine are all i5 chips, I opted for lower cost and more redundancy vs more compute per node
January 4, 2026 at 9:13 PM
Ah I just saw this. The micros do have (2) m.2 slots though, and I’ve got a separate Asustor NAS
January 4, 2026 at 9:11 PM
I just built my homelab with (5) Lenovo m920q and (3) HP EliteDesk G4 at around $150 per unit from eBay. A couple of them even have 32GB RAM. Mine are all micro, curious why you say to go SFF instead?
January 4, 2026 at 9:09 PM
Trying to one-shot this in a chat ui is giving me “I’ve tried nothing, and I’m all out of ideas”
January 4, 2026 at 8:00 PM
And yet you expected the machine to get it right out of the gate, without iterating on even a single hypothesis, idk what to tell you but you shouldn’t be evaluating that way
January 4, 2026 at 7:59 PM
Same thought here, very interesting that someone from Google is hyping Claude in public
January 4, 2026 at 6:04 PM
What I’m telling you is that you’re under-constraining your problem specification (with no success criteria), and over-fitting your instructions (answer yes or no), and that this evaluation rubric will yield unacceptably high Type II error
January 4, 2026 at 5:37 PM
Maybe you’re not paying attention to the leading voices, but one consistent point we are all trying to make is that planning and verification are key for quality output
January 4, 2026 at 5:18 PM
You knew the answer you were looking for, and posed the question in a trick “go/no” scenario. You didn’t iteratively approach a solution. You have unspecified expectations so you should expect disappointment
January 4, 2026 at 5:16 PM
In real life, when presented with a problem, do you normally emit the real solution in one attempt? Or do you form hypotheticals and eliminate them until a valid solution remains?
January 4, 2026 at 5:14 PM
It gave you multiple testable hypotheses but you’re not satisfied because it didn’t guess the one you were thinking of
January 4, 2026 at 5:02 PM
You asked the model “go or no go” and it gave you its answer to that question, it doesn’t read minds
January 4, 2026 at 4:58 PM
It’s not really designed for hands-off long-horizon tasks, but it has some good steering for using tools and subagents effectively, separating brainstorming from planning and implementation
January 4, 2026 at 4:48 PM
It’s a nice foundation, but I’ve done a fair bit of customization on top of it
January 4, 2026 at 4:46 PM
Have you tried superpowers?
January 4, 2026 at 4:35 PM
Reposted by antialias (optional)
holy shit.
January 4, 2026 at 3:44 PM
Simply concatenating your prompts gives an answer that’s probably more what you expected. Even so, one-shot prompts in a chat ui were not a serious benchmark even in 2025
Z.ai Chat - Free AI powered by GLM-4.7 & GLM-4.6
Chat with Z.ai's free AI to build websites, create presentations, and write professionally. Fast, smart, and reliable, powered by GLM-4.7.
chat.z.ai
January 4, 2026 at 3:11 PM
However, the point that doll is making is that it’s up to you as the model operator to define the validation or acceptance criteria for the answer you need, and to give the model tools to execute those tests adversarially against its work-in-progress
January 4, 2026 at 2:52 PM
I kinda feel like you’re hiding the ball here because you haven’t told us or the model about this confounding surface you expected. And this one-shot prompting is nothing compared to the way doll is describing its workflow. Nevertheless (I think) GLM gives a better one-shot
Z.ai Chat - Free AI powered by GLM-4.7 & GLM-4.6
Chat with Z.ai's free AI to build websites, create presentations, and write professionally. Fast, smart, and reliable, powered by GLM-4.7.
chat.z.ai
January 4, 2026 at 2:50 PM
In what way exactly? It seems like you’ve primed yourself to assume the worst
January 4, 2026 at 2:35 PM