Sara Rosenthal
@seirasto.bsky.social
NLP Research Scientist at IBM Research
Retrievers (Elser shown here) struggle with later turns and non-standalone questions:
January 8, 2025 at 8:10 PM
SOTA LLMs struggle with later turns and unanswerable questions:
January 8, 2025 at 8:09 PM
Sample Conversation:
January 8, 2025 at 8:09 PM
MTRAG is a challenging benchmark for SOTA LLMs and a great way to evaluate across multiple domains for Retrieval and Generation! MTRAG contains 110 conversations averaging 7.7 turns each across four domains for a total of 842 tasks. We also explore synthetic data and LLM-as-a-judge.
January 8, 2025 at 8:09 PM
Please just message me on Slack
November 25, 2024 at 1:01 PM
Please add me. Thanks!
November 24, 2024 at 2:33 PM
This is great! Please add me as well!
November 19, 2024 at 2:42 AM