Sara Rosenthal
@seirasto.bsky.social
NLP Research Scientist at IBM Research
📣📣Presenting our platform used to build MTRAG!!
RAGAPHENE: A RAG Annotation Platform with Human ENhancements and Edits
Arxiv: arxiv.org/abs/2508.19272
MTRAG GitHub: github.com/IBM/mt-rag-b...
Join our MTRAGEval Task: ibm.github.io/mt-rag-bench...
RAGAPHENE: A RAG Annotation Platform with Human Enhancements and Edits
Retrieval Augmented Generation (RAG) is an important aspect of conversing with Large Language Models (LLMs) when factually correct information is important. LLMs may provide answers that appear correc...
arxiv.org
August 28, 2025 at 12:47 PM
🚀Excited to announce our MTRAGEval task at SemEval 2026!
Arxiv: arxiv.org/abs/2501.03468
Github: github.com/IBM/mt-rag-b... (please 🌟!)
MTRAGEval: ibm.github.io/mt-rag-bench...
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
Retrieval-augmented generation (RAG) has recently become a very popular task for Large Language Models (LLMs). Evaluating them on multi-turn RAG conversations, where the system is asked to generate a ...
arxiv.org
August 4, 2025 at 6:33 AM
Working on RAG? Come check out our InspectorRAGet DEMO, presented by Siva Sankalp Patel on May 2 (Friday), 11-12:30 at Demo Session 8 in Hall 3! Looking forward to attending ACL in a few months! #NAACL2025 @naaclmeeting.bsky.social
paper: arxiv.org/abs/2404.17347
github: github.com/IBM/Inspecto...
InspectorRAGet: An Introspection Platform for RAG Evaluation
Large Language Models (LLM) have become a popular approach for implementing Retrieval Augmented Generation (RAG) systems, and a significant amount of effort has been spent on building good models and ...
arxiv.org
May 1, 2025 at 1:24 AM
Excited about this collab! Come check out FeeL and help advance multilingual generation in your language! huggingface.co/spaces/feel-...
March 26, 2025 at 1:59 PM
🌟Want to know more about our MTRAG benchmark? Check out the IBM blog highlighting our work! research.ibm.com/blog/convers...
How well can your RAG agent carry out a conversation?
IBM’s new benchmark evaluates LLMs on interactive question-answering tasks using
research.ibm.com
February 4, 2025 at 8:41 PM
🌟 New Benchmark! 🌟
Do you work on RAG? Are you interested in Multi-Turn conversations? Very excited to share the new MTRAG benchmark we have released!
Data: github.com/ibm/mt-rag-b...
Paper: arxiv.org/abs/2501.03468
GitHub - IBM/mt-rag-benchmark: Multi-Turn RAG Benchmark
Multi-Turn RAG Benchmark. Contribute to IBM/mt-rag-benchmark development by creating an account on GitHub.
github.com
January 8, 2025 at 8:08 PM
Anyone else feel like Google Scholar is missing citations lately? I have a recent paper that has 8 citations on Semantic Scholar and only 3 on Google Scholar... and I have two papers that are cited in one paper, but only one has the citation 🤔
November 27, 2024 at 1:47 AM
Reposted by Sara Rosenthal
I did a starter pack of people in New York (City) working on ML/AI. Please distribute and feel free to self-nominate!
go.bsky.app/BoEtagz
November 19, 2024 at 1:38 AM
If you work on RAG, check out InspectorRAGet - an awesome tool for RAG evaluation. Available on HuggingFace! We provide the interface, you provide the experiments and metrics. Want to know more? Just reach out!
github.com/IBM/Inspecto...
huggingface.co/spaces/kpfad...
arxiv.org/abs/2404.17347
GitHub - IBM/InspectorRAGet: The repository contains generative AI analytics platform application code.
The repository contains generative AI analytics platform application code. - IBM/InspectorRAGet
github.com
November 22, 2024 at 2:22 AM
Starter pack for IBM Research! Follow awesome IBM researchers! IBMers, let me know and I will add you! go.bsky.app/2SXcRmA
November 19, 2024 at 1:13 PM
Working on RAG? Check out our ClapNQ benchmark (accepted to TACL) to test the full RAG pipeline!
github.com/primeqa/clapnq
arxiv.org/abs/2404.02103
November 19, 2024 at 2:49 AM