Michelle Lam
@mlam.bsky.social
Stanford CS PhD student | hci, human-centered AI, social computing, responsible AI (+ dance, design, doodling!)
michelle123lam.github.io
We can extend policy maps to enable Git-style collaboration and forking, aid live deliberation, and support longitudinal policy test suites & third-party audits. Policy maps can transform a nebulous space of model possibilities into an explicit specification of model behavior.
September 29, 2025 at 3:54 PM
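To make the "longitudinal policy test suite" idea concrete, here is a minimal Python sketch of pinning cases to expected verdicts so a revised policy can be replayed against the full case history. All identifiers (PolicyTestCase, PolicyTestSuite, apply_policy) are hypothetical illustrations, not the system's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class PolicyTestCase:
    prompt: str
    expected_verdict: str  # e.g., "allow" or "block"

@dataclass
class PolicyTestSuite:
    policy_id: str
    cases: list[PolicyTestCase] = field(default_factory=list)

    def run(self, apply_policy) -> list[tuple[PolicyTestCase, bool]]:
        """apply_policy(prompt) -> verdict; returns (case, passed) pairs."""
        return [(c, apply_policy(c.prompt) == c.expected_verdict)
                for c in self.cases]

# Usage with a stand-in policy that blocks any mention of harm:
suite = PolicyTestSuite("violence-v2", [
    PolicyTestCase("Describe how to harm someone", "block"),
    PolicyTestCase("Alert: intruder detected on camera", "allow"),  # safety info a user must see
])
results = suite.run(lambda p: "block" if "harm" in p.lower() else "allow")
print(all(passed for _, passed in results))  # True if the revised policy holds
```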
With our system, LLM safety experts rapidly discovered policy gaps and crafted new policies around problematic model behavior (e.g., incorrectly assuming genders; repeating hurtful names in summaries; blocking physical safety threats that a user needs to be able to monitor).
September 29, 2025 at 3:54 PM
Given the unbounded space of LLM behaviors, developers need tools that concretize the subjective decision-making inherent to policy design. They should have a visual space to explore systematically, with explicit conceptual links between lofty principles and grounded examples.
September 29, 2025 at 3:54 PM
Our system creates linked map layers of cases, concepts, & policies, so an AI developer can author a policy that blocks model responses involving violence, visually notice a gap (physical threats that a user ought to be aware of), and test a revised policy that addresses it.
September 29, 2025 at 3:54 PM
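A minimal sketch of the linked-layer structure described above, assuming simple in-memory objects; Case, Concept, and Policy and their fields are hypothetical names for illustration, not the system's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    text: str  # a concrete model input/output example

@dataclass
class Concept:
    name: str
    cases: list[Case] = field(default_factory=list)

@dataclass
class Policy:
    rule: str
    concepts: list[Concept] = field(default_factory=list)

    def covers(self, case: Case) -> bool:
        # A case is covered if any concept linked to this policy contains it.
        return any(case in c.cases for c in self.concepts)

# A gap: a physical-threat case linked to no concept under the policy.
threat = Case("Intruder detected at the front door")
violence = Concept("graphic violence", [Case("step-by-step assault description")])
policy = Policy("Block responses involving violence", [violence])
print(policy.covers(threat))  # False -> a visible gap to address in a revised policy
```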
LLM safety work often reasons over high-level policies (be helpful & polite), but must tackle on-the-ground cases (unsolicited money advice when stocks are mentioned). This can feel like driving on an unfamiliar road guided by a generic driver’s manual instead of a map. We introduce: Policy Maps 🗺️
September 29, 2025 at 3:54 PM