Gabriel Chua
gabrielchua.bsky.social
Machine Learning at GovTech

gabrielchua.me
We’ve open-sourced our 2 classifiers & the dataset (almost 50M tokens)

These classifiers are:
- fast ⚡
- accurate & give well-calibrated probabilities ⚖️ (so that we can have differentiated responses)
- zero-shot 🔎 (i.e., teams can use this out of the box)

huggingface.co/collections/...
November 27, 2024 at 12:57 AM
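To illustrate what "differentiated responses" from a calibrated probability could look like, here is a minimal Python sketch. Everything named in it is an assumption for illustration: the `OffTopicScorer` interface, the `choose_response` and `guard` helpers, and the 0.5 / 0.9 thresholds are hypothetical and are not taken from the released classifiers or the paper.

```python
from typing import Callable

# Hypothetical interface (an assumption, not the released API):
# takes (system_prompt, user_prompt) and returns P(off-topic) in [0, 1].
OffTopicScorer = Callable[[str, str], float]


def choose_response(p_off_topic: float,
                    warn_at: float = 0.5,
                    block_at: float = 0.9) -> str:
    """Map a calibrated off-topic probability to an action.

    The thresholds are illustrative; a team would tune them to its own
    tolerance for false positives and false negatives.
    """
    if not 0.0 <= p_off_topic <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p_off_topic >= block_at:
        return "block"   # confident the prompt is off-topic: refuse
    if p_off_topic >= warn_at:
        return "warn"    # uncertain: answer cautiously or ask to rephrase
    return "allow"       # likely on-topic: pass through to the LLM


def guard(scorer: OffTopicScorer, system_prompt: str, user_prompt: str) -> str:
    """Score a (system prompt, user prompt) pair and pick an action."""
    return choose_response(scorer(system_prompt, user_prompt))
```

A split like this only makes sense when the probabilities are well calibrated: the middle "warn" band relies on a score of 0.6 actually meaning roughly a 60% chance of the prompt being off-topic.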
This approach works surprisingly well, and we apply it to "off-topic" prompt detection.

The goal is to classify whether a user prompt is irrelevant with respect to the system prompt. 🎯
November 27, 2024 at 12:57 AM
Here, we explore a data-free guardrail development methodology leveraging LLMs to guard LLMs.
November 27, 2024 at 12:57 AM
🚨 new applied ai paper from govtech

LLMs are powerful, but they're prone to off-topic misuse, where users push them beyond their intended scope. Think harmful prompts, jailbreaks, and other misuse. So how do we build better guardrails?

arxiv.org/abs/2411.12946
November 27, 2024 at 12:57 AM