Quotas are harmful when quality distribution is highly varied across ACs.
But IDK exactly how decisions were made.
Quotas are harmful when quality distribution is highly varied across ACs.
But IDK exactly how decisions were made.
Our GitHub has an easily adaptable pipeline for creating new agency dimensions or new AI-powered benchmarks: github.com/BenSturgeon/...
Huge thanks to colleagues from
@apartresearch.bsky.social, Google DeepMind, Berkeley CHAI, etc.
Our GitHub has an easily adaptable pipeline for creating new agency dimensions or new AI-powered benchmarks: github.com/BenSturgeon/...
Huge thanks to colleagues from
@apartresearch.bsky.social, Google DeepMind, Berkeley CHAI, etc.
We think the AI community needs a shift towards scalable, conceptually rich evals. HumanAgencyBench is an open-source scaffolding for this.
We think the AI community needs a shift towards scalable, conceptually rich evals. HumanAgencyBench is an open-source scaffolding for this.