Beyond the EconEvals benchmarks, our EconEvals “litmus tests” quantify the tendencies of LLMs and LLM agents when they face tradeoffs that have no objectively correct choice, for example efficiency vs. equality. 5/6
April 4, 2025 at 3:48 PM
To forestall saturation, we can increase the difficulty of our benchmark questions by scaling parameters of the economic environment. Our HARD difficulty level is challenging: no LLM we test, including o3-mini, scores above 70%. (o3-mini's low scores are possibly driven by underexploration.) 3/6
April 4, 2025 at 3:48 PM
In EconEvals benchmarks, LLM agents repeatedly take actions in an economic environment, and must learn optimal actions via trial and error (a capability SoTA LLMs struggle with!) 2/6
April 4, 2025 at 3:48 PM
New paper: "EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments"
We construct economic environments to measure the capabilities and tendencies of LLMs and LLM agents in pricing, procurement, task allocation and more. 1/6
April 4, 2025 at 3:48 PM