Lightnews — Scholar-powered news

Sara Fish

@sarafish.bsky.social

30 followers 58 following 8 posts

PhD student at Harvard interested in EconCS and ML / previously Caltech undergrad in math

Sara Fish

@sarafish.bsky.social

In addition to the EconEvals benchmarks, in the EconEvals “litmus tests”, we quantify tendencies of LLMs and LLM agents when faced with tradeoffs for which there is no objectively correct choice: for example, efficiency vs. equality. 5/6

April 4, 2025 at 3:48 PM

Sara Fish

@sarafish.bsky.social

To forestall saturation, we can scale the difficulty of our benchmark questions by scaling parameters of the economic environment. Our HARD difficulty level is challenging: no LLM we test, including o3-mini, scores above 70%. (Low scores of o3-mini possibly driven by underexploration.) 3/6

April 4, 2025 at 3:48 PM

Sara Fish

@sarafish.bsky.social

In EconEvals benchmarks, LLM agents repeatedly take actions in an economic environment, and must learn optimal actions via trial and error (a capability SoTA LLMs struggle with!) 2/6

April 4, 2025 at 3:48 PM

Sara Fish

@sarafish.bsky.social

New paper: "EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments"

We construct economic environments to measure the capabilities and tendencies of LLMs and LLM agents in pricing, procurement, task allocation and more. 1/6

Screenshot of first page of "EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments"

April 4, 2025 at 3:48 PM
