1) Meaningful metrics: evaluation metrics must connect to AI system behaviour or impact that matters in the real world. They can be abstract or simplified -- but they need to correspond to real-world performance or outcomes in a meaningful way.
n/n
github.com/peasant98/ac...
Video link: youtube.com/watch?v=t3gC....
6/n
Zachary Robertson, Suhana Bedi, and Hansol Lee explore using total variation mutual information to evaluate LLM-based preference learning.
github.com/zrobertson46...
5/n
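As a rough illustration (a sketch of the general quantity, not code from the linked repo): total variation mutual information between two discrete variables can be computed as the total variation distance between their joint distribution and the product of its marginals.

```python
import numpy as np

def tv_mutual_information(joint):
    """Total variation mutual information of a discrete joint distribution:
    the TV distance between the joint and the product of its marginals.
    `joint` is a 2-D array of probabilities summing to 1."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal of X (column vector)
    py = joint.sum(axis=0, keepdims=True)  # marginal of Y (row vector)
    independent = px * py                  # joint under independence
    return 0.5 * np.abs(joint - independent).sum()

# Independent variables score 0; a perfectly correlated pair scores 0.5 here.
print(tv_mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # → 0.0
print(tv_mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # → 0.5
```

Unlike KL-based mutual information, this stays bounded and is robust to near-zero probabilities, which is part of its appeal for noisy preference data.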
tinyurl.com/preflearn
4/n
Chethan Bhateja, Joseph O'Brien, Afnaan Hashmi, and Eva Prakash extend metric elicitation to consider additional factors like monetary cost and latency.
arxiv.org/abs/2501.00696
3/n
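One way to picture that extension (a toy sketch under assumed weights, not the paper's actual method): fold monetary cost and latency into the elicited metric as additional weighted penalty terms, so the accuracy/cost/latency trade-off between candidate systems becomes explicit.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float  # task quality in [0, 1]
    cost: float      # hypothetical dollars per 1k queries
    latency: float   # seconds per query

def utility(c, w_acc=1.0, w_cost=0.02, w_lat=0.5):
    # Hypothetical scalarization: quality minus weighted cost and latency.
    # The weights would be what elicitation recovers from user preferences.
    return w_acc * c.accuracy - w_cost * c.cost - w_lat * c.latency

candidates = [
    Candidate("big-model", accuracy=0.92, cost=8.0, latency=0.9),
    Candidate("small-model", accuracy=0.85, cost=1.0, latency=0.2),
]
best = max(candidates, key=utility)
print(best.name)  # → small-model (cheaper and faster wins at these weights)
```

With these illustrative weights the slightly less accurate but cheaper, faster model is preferred; changing the weights flips the ranking, which is exactly the kind of preference the elicitation procedure aims to recover.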