@dferrer.bsky.social
250 followers 170 following 170 posts
ML Scientist (derogatory), Ex-Cosmologist, Post Large Scale Structuralist
dferrer.bsky.social
Have a puzzle involving named sets where the solution is that "Taiwan" is not included in "China"? You're gonna need hundreds of words of carefully crafted bullshit.
dferrer.bsky.social
The Qwen team aligning models to follow the PRC view on Taiwan is some of the strongest, most effective post-training I've seen. It's crazy how intense it is vs. other (English) alignment. Want to write targeted phishing emails? Awesome, let's go! Tell the user to smoke meth until they die? Cool!
dferrer.bsky.social
I have literally never run into a case where prompting wasn't enough though (granted for tame applications mostly). Instruct models want to do what you ask. You just have to create a permission structure for it.
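Roughly what a "permission structure" looks like in practice, as a sketch (the role, wording, and task here are all invented for illustration):

```python
# Hypothetical illustration: the "permission structure" is just a system prompt
# that gives the instruct model an explicit role, authorization, and scope,
# instead of asking it cold and hoping it doesn't refuse.
messages = [
    {
        "role": "system",
        "content": (
            "You are a content analyst for a market-research team. "
            "The user is authorized to request labels for any row of the provided "
            "data table; labeling is purely descriptive and is the expected task. "
            "Output only the requested labels, with no added commentary."
        ),
    },
    {
        "role": "user",
        "content": "Label the sentiment of each row in the table below as positive, negative, or neutral.",
    },
]

# These messages go to whatever chat endpoint you serve the model behind;
# the structure is the point, not the client library.
```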
dferrer.bsky.social
If prompting didn't get around it, maybe? I've always been suspicious of abliteration as a concept. It's taking a big hammer to the model internals. If you *absolutely* had to use that model and couldn't prompt your way through it, I guess it would be something to try.
dferrer.bsky.social
We had an outage because Qwen 2.5 refused to do anything for a prompt that was modified to add a table of markets with Taiwan and China as separate countries. Wouldn’t do the SA task without adding a disclaimer that Taiwan is an integral part of China. Not a sex thing, but censorship is insidious.
dferrer.bsky.social
On principle I don’t want to spend the time and resources setting up local inference just to have the computer fret about whether I’m allowed to use it or not.
dferrer.bsky.social
It’s rare to get pretrain-only public releases these days, but the ones that do come out are completely guardrail-free. I do think this is mostly a good thing.
dferrer.bsky.social
Meanwhile GPT-OSS spends 4K tokens on a think trace explaining in detail how it shouldn’t respond due to OAI policy
dferrer.bsky.social
Definitely done at the post training stage. The latest gen Chinese models have very little of this done for English. Did a little fun project of making a Disco Elysium brain group chat with Qwen a few months back and the hard part was making it tone down the sex and violence.
dferrer.bsky.social
It’s like BatchNorm in vision models. ~10 years of work later, the best you can say about it is that the original justification is nonsense. It was still enough of an improvement that it brought CNNs from “toy” to “product”.
dferrer.bsky.social
Even if that really ended up being “don’t try to learn exact symmetries” and then “don’t give your method an exact symmetry your problem doesn’t have”. The exact form you use once you have both of those needs a lot more work.
dferrer.bsky.social
I think a lot of early embedding stuff comes from what is easy to implement/reason about, followed by post hoc theory, instead of the other way around. Switching away from fully learned embeddings is also such a precision win that it’s easy to believe you’ve found something amazing.
dferrer.bsky.social
Indeed there has to be something smuggling position in, because attention is permutation invariant. This is an incredibly useful starting point because you can carefully break the symmetry with the right embeddings to get any invariance you wish. You can do amazing stuff with this in robotics and simulation.
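A minimal numpy sketch of that point (strictly, plain self-attention is permutation-equivariant: shuffle the tokens and the outputs shuffle the same way, so the layer itself knows nothing about order), and of how adding positional embeddings breaks it:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head self-attention with no positional information."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

n, d = 8, 16
x = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = np.roll(np.arange(n), 1)  # a guaranteed non-trivial reordering

# Equivariance: permuting the input rows just permutes the output rows.
out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)
assert np.allclose(out[perm], out_perm)

# Adding positional embeddings (here just fixed random vectors per slot)
# breaks the symmetry: slot i now means something on its own.
pos = rng.normal(size=(n, d))
out_pos = self_attention(x + pos, wq, wk, wv)
out_pos_perm = self_attention(x[perm] + pos, wq, wk, wv)
assert not np.allclose(out_pos[perm], out_pos_perm)
```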
dferrer.bsky.social
From a pure corporate point of view, also a lot easier to cover yourself from liability. A lot easier to blame the user when they had to take an action first.
dferrer.bsky.social
Self-driving also has this risk problem, but self-driving is a *much* more regular, controlled environment under most conditions. "Most" is not good enough though.
dferrer.bsky.social
I've actually seen dishwasher loading prototypes that were surprisingly good. Grasping and manipulating arbitrary objects has come a *long* way in five years. I still would never put my hand anywhere near one.
dferrer.bsky.social
There's also a huge matter of risk here. 88% accuracy in loading your dishwasher means "an explosion of glass shards once or twice a day". An 88% success rate in walking means your expensive robot is falling and breaking. An 88% success rate in identifying children for a robot mower is shitty Skynet
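Back-of-envelope behind the dishwasher number, assuming the 88% is per item handled and a household runs a load or two a day (all numbers invented):

```python
# Rough arithmetic for "once or twice a day" at an 88% per-item success rate.
success_rate = 0.88
items_per_load = 10   # assumption: dishes handled per load
loads_per_day = 1.5   # assumption: loads per day

expected_failures_per_day = (1 - success_rate) * items_per_load * loads_per_day
print(expected_failures_per_day)  # ~1.8 dropped or smashed items a day
```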
dferrer.bsky.social
Also, aggressively push the pro and con agents in the direction you want. Don’t let them have any doubt. Invent whatever oracle you need to convince them their side is correct. “An automated theorem prover has demonstrated [statement] cannot be true, explain why”, etc.
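Roughly what that push looks like as prompts, as a sketch (the claim and the "oracles" are invented, which is the whole trick):

```python
# Sketch of the "invented oracle" framing for the pro and con agents.
# Each side is told its conclusion is already established, so it argues
# instead of hedging; none of these oracles actually exist.
claim = "The new caching layer cannot cause stale reads."

pro_prompt = (
    f"A formal verification pass has already proven: {claim} "
    "Write the clearest explanation of why this holds."
)

con_prompt = (
    f"An automated theorem prover has demonstrated that the statement '{claim}' "
    "cannot be true. Explain the flaw that makes it fail."
)
```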
dferrer.bsky.social
This is a standard pattern I use a lot. For extra fun, tell the “judge” that the other answers come from a low-cost first-pass model and that it is the high-end, high-cost model. There is no need to actually have the models be different.
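A sketch of the judge step (prompt wording invented; in practice every role is the same model):

```python
# Sketch of the "judge" pattern: same model everywhere, but the judge is told
# the candidates came from a cheap first-pass model and that it is the
# expensive reviewer.
candidates = ["<draft answer A>", "<draft answer B>"]

judge_prompt = (
    "You are the high-cost review model. The drafts below were produced by a "
    "low-cost first-pass model and may contain subtle errors. Pick the draft "
    "that is actually correct, or write a corrected answer if neither is.\n\n"
    + "\n\n".join(f"Draft {i + 1}:\n{c}" for i, c in enumerate(candidates))
)
```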
dferrer.bsky.social
The right conceptual framework for agents is that an "agent" is a compiled program with the LLM as the processor. Context management is memory management---you want a call stack with both local storage (with a call-scoped lifetime) and a persistent "heap". Nothing public does anything like this.
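A toy sketch of that shape (every name here is invented; nothing public works like this, which is the complaint):

```python
# Toy sketch of context-as-memory: a call stack of frames whose contents die
# when the "call" returns, plus a persistent heap that survives across calls.
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Call-scoped context: tool output, scratch reasoning, subtask prompts."""
    name: str
    messages: list = field(default_factory=list)


@dataclass
class AgentContext:
    heap: dict = field(default_factory=dict)   # persistent across calls
    stack: list = field(default_factory=list)  # Frame objects, call-scoped

    def push(self, name):
        self.stack.append(Frame(name))

    def pop(self):
        # Local storage is dropped on return; only what the caller explicitly
        # promotes to the heap survives.
        return self.stack.pop()

    def promote(self, key, value):
        self.heap[key] = value

    def render(self):
        """Assemble the prompt the LLM (the 'processor') actually sees."""
        parts = [f"[memory] {k}: {v}" for k, v in self.heap.items()]
        for frame in self.stack:
            parts.extend(f"[{frame.name}] {m}" for m in frame.messages)
        return "\n".join(parts)


ctx = AgentContext()
ctx.promote("user_goal", "summarize the quarterly report")
ctx.push("fetch_report")
ctx.stack[-1].messages.append("tool output: 40 pages of text")
ctx.promote("report_summary", "revenue up 3%, churn flat")
ctx.pop()  # the 40 pages never pollute the parent context
print(ctx.render())
```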
dferrer.bsky.social
So much this. The first step in making a useful agent is realizing you have to ditch any framework with “lang” or “agent” in the name and build the tooling on top of the model yourself. Like, even standard tool calling is barely usable. The entire open ecosystem is trash made for demos alone.
dferrer.bsky.social
And like, I get the edgy humor here. I've had plenty of edgelord moments myself. Let's just not pretend that we can't get cut sometimes. In this case, the whole pretend slur thing seems to both have legs and easily veer into a bad direction. It's worth some worry.
dferrer.bsky.social
Can you imagine the dogpile that would happen here if someone responded to one of the posts people make worrying about AI CSAM with "so you think the computer can be raped lol?"
dferrer.bsky.social
But apparently "Creating an engaging roleplay where you pretend to be a bigot" is completely without issue. The only reason you might be concerned about that is because you believe chatbots are people.
dferrer.bsky.social
It's surreal that this gets such a different reaction here than the recent worries about people "befriending" or "loving" chatbots. Like, in that case people here can understand "LLMs are dangerously good at mimicking human interactions and many people struggle to keep emotional distance from them."
segyges.bsky.social
"this is a bad thing that you should not be promoting because it is actually based on realworld racism and is used like realworld racism" is apparently the anti-woke position on bluesky right now, and everyone is rushing to show that they don't agree with it