This project was joint work with my Amazon colleagues (led by Yumo Xu), and it's great to see it finally published. Hope this helps motivate more careful eval work in the near future!
#AI #agent #evaluation #RAG #NLP
Want to learn more? Check out
Our paper: arxiv.org/pdf/2506.01829
Open-source code: github.com/amazon-scien...
3/
But too often employers and managers forget that highly motivated and capable candidates also hold expectations parallel to these, ...
2/
#MondayReflection /fin
As with any technological evolution, tools themselves never fully replace the humans doing the work, but greatly enhance the ones that embrace them and adapt to working with them. 6/
What this 99% number glosses over is the time my colleague and I spent in numerous offline discussions, and the times I had to stop and think about what to ask the AI to code next, 2/