gabrielchua.me
These classifiers are:
- fast ⚡
- accurate & give well-calibrated probabilities ⚖️ (so that we can have differentiated responses)
- zero-shot 🔎 (i.e., teams can use this out of the box)
huggingface.co/collections/...
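To make "differentiated responses" concrete, here is a minimal sketch, assuming the classifier returns a single calibrated off-topic probability between 0 and 1. The function name `choose_action`, both thresholds, and the action labels are hypothetical illustrations, not part of the released models:

```python
# Hypothetical thresholds, picked for illustration only; tune them per application.
CLARIFY_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.9


def choose_action(p_off_topic: float) -> str:
    """Map an off-topic probability to a tiered response.

    Assumes p_off_topic is a calibrated probability, so a higher value
    really does mean the prompt is more likely to be off-topic.
    """
    if not 0.0 <= p_off_topic <= 1.0:
        raise ValueError("p_off_topic must be between 0 and 1")
    if p_off_topic >= BLOCK_THRESHOLD:
        return "block"      # confidently off-topic: refuse
    if p_off_topic >= CLARIFY_THRESHOLD:
        return "clarify"    # uncertain: ask the user to rephrase or confirm intent
    return "allow"          # likely on-topic: pass through to the LLM


if __name__ == "__main__":
    for p in (0.05, 0.6, 0.97):
        print(p, choose_action(p))
```

Tiers like this only make sense when the probabilities are well calibrated; with a poorly calibrated score, the middle band would not correspond to genuinely uncertain cases.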
The goal is to classify whether a user prompt is irrelevant with respect to the system prompt. 🎯
LLMs are powerful, but they're prone to off-topic misuse, where users push them beyond their intended scope. Think harmful prompts, jailbreaks, and other misuse. So how do we build better guardrails?
arxiv.org/abs/2411.12946