I’m sorry you are still upset about this but it was not a good example. It’s also not a good interview question and I hope you’ve not really been using it for recruiting. Ciao
January 5, 2026 at 4:52 AM
How have you determined what “does worse” means? How do you typify this “thing”? You have not specified any success criteria or evaluation metric, only made vague assertions
January 4, 2026 at 10:05 PM
More power draw too right? With my redundancy & low cost objective that extra 5W pays for itself on a cluster of more than 3 nodes. I also can’t really make use of a faster network
January 4, 2026 at 9:26 PM
I just built my homelab with (5) Lenovo m920q and (3) HP EliteDesk G4 at around $150 per unit from eBay. A couple of them even have 32GB RAM. Mine are all micro, curious why you say to go SFF instead?
January 4, 2026 at 9:09 PM
And yet you expected the machine to get it right out of the gate, without iterating on even a single hypothesis, idk what to tell you but you shouldn’t be evaluating that way
January 4, 2026 at 7:59 PM
What I’m telling you is that you’re under-constraining your problem specification (with no success criteria), and over-fitting your instructions (answer yes or no), and that this evaluation rubric will yield unacceptably high Type II error
January 4, 2026 at 5:37 PM
Maybe you’re not paying attention to the leading voices, but one consistent point we are all trying to make is that planning and verification are key for quality output
January 4, 2026 at 5:18 PM
You knew the answer you were looking for, and posed the question in a trick “go/no” scenario. You didn’t iteratively approach a solution. You have unspecified expectations so you should expect disappointment
January 4, 2026 at 5:16 PM
In real life, when presented with a problem, do you normally emit the real solution in one attempt? Or do you form hypotheticals and eliminate them until a valid solution remains?
January 4, 2026 at 5:14 PM
It’s not really designed for hands-off long-horizon tasks, but it has some good steering for using tools and subagents effectively, separating brainstorming from planning and implementation
January 4, 2026 at 4:48 PM
Simply concatenating your prompts gives an answer that’s probably more what you expected. Even so, one-shot prompts in a chat UI were not a serious benchmark even in 2025
However, the point that doll is making is that it’s up to you as the model operator to define the validation or acceptance criteria for the answer you need, and to give the model tools to execute those tests adversarially against its work-in-progress
January 4, 2026 at 2:52 PM
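For anyone wondering what "define the acceptance criteria and let the model run them against its work-in-progress" could look like, here is a minimal sketch in Python. It is not doll's actual workflow or any particular tool's API. The names `run_checks` and `refine`, the two example checks, and the stub generator standing in for a model call are all hypothetical.

```python
from typing import Callable

# Hypothetical illustration only: none of these names come from a real tool.
Check = Callable[[str], bool]


def run_checks(candidate: str, checks: dict[str, Check]) -> list[str]:
    """Return the names of the acceptance checks that the candidate fails."""
    return [name for name, check in checks.items() if not check(candidate)]


def refine(
    generate: Callable[[list[str]], str],
    checks: dict[str, Check],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Ask for a candidate, test it, and feed the failures back until it passes.

    `generate` stands in for a model call; it receives the list of checks the
    previous candidate failed (empty on the first round).
    """
    candidate = ""
    failures: list[str] = []
    for _ in range(max_rounds):
        candidate = generate(failures)
        failures = run_checks(candidate, checks)
        if not failures:
            break
    return candidate, failures


# Example acceptance criteria, stated up front instead of left implicit.
checks: dict[str, Check] = {
    "non_empty": lambda s: bool(s.strip()),
    "gives_reason": lambda s: "because" in s,
}


def stub_generate(failures: list[str]) -> str:
    """Stub 'model': a bare yes/no first, then a reasoned answer once told what failed."""
    return "yes" if not failures else "yes, because the test passed"


answer, remaining = refine(stub_generate, checks)
```

With the stub, the first bare "yes" fails the `gives_reason` check, and the second round passes. The only point is that the criteria exist before the answer is judged.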
I kinda feel like you’re hiding the ball here because you haven’t told us or the model about this confounding surface you expected. And this one-shot prompting is nothing compared to the way doll is describing its workflow. Nevertheless (I think) GLM gives a better one-shot