Dinos Papakostas
@din0s.me
Research Engineer at Zeta Alpha. Likes 🧠 neural IR, 📋 model evals, and 🏋️ lifting weights. Incredibly optimistic about the future!
𝕏: din0s_ / 🖥️: din0s.me
Reposted by Dinos Papakostas
In our latest blog post, we covered how we use RAGElo, an open-source toolkit we've developed internally at Zeta Alpha, to compare multiple RAG systems head-to-head and aggregate pairwise preferences into a robust, easy-to-interpret Elo ranking.
Check it out: www.zeta-alpha.com/post/robust-...
Robust evaluations for RAG with RAGElo
Retrieval-Augmented Generation (RAG) systems have gained strong traction because of their ability to ground generated answers in knowledge sources, allowing for higher accuracy and reliability. Howeve...
www.zeta-alpha.com
June 30, 2025 at 2:20 PM
Grok starting the month with >50% is ridiculous. Did people really think Big Tech wouldn't roll out even a single incremental update? (let alone a big push like Gemini)
March 26, 2025 at 2:46 PM
yup, it also caught my eye because they say they did this "in contrast with gecko", and although there's no ablation study on this in the paper, I'm sure they had some internally
March 22, 2025 at 8:25 PM
I wish they elaborated more on the model souping part though, it still surprises me to this day that weight merging just works 🧙🏻‍♂️
March 20, 2025 at 12:09 PM
sell me on ghostty? why should one switch over iterm2/kitty
January 6, 2025 at 11:10 AM