Jekaterina Novikova
@j-novikova-nlp.bsky.social
Principal AI research scientist @Vanguard_Group | Research in NLP, multimodal AI, LLMs, evaluation | own opinions only 🇨🇦🇪🇺🏳️🌈
Reposted by Jekaterina Novikova
🎧 Hear Dr. Hupkes discuss her work on GenBench and how consistency, generalization, and reasoning shape our understanding of LLMs.
🎬 YouTube: www.youtube.com/watch?v=CuTW...
🎙️ Apple Podcasts: podcasts.apple.com/ca/podcast/w...
🎧 Spotify: open.spotify.com/show/51RJNlZ...
#WiAIR #NLP #WomenInAI
Generalization in AI, with Dr. Dieuwke Hupkes
YouTube video by Women in AI Research WiAIR
www.youtube.com
July 18, 2025 at 4:12 PM
Reposted by Jekaterina Novikova
🎙️ New Episode Out Now!
We’re thrilled to announce that the latest episode of the @wiair.bsky.social is live!
This week, we sit down with Dr. Angelica Lim, Ph.D., to talk about "Robots with Empathy".
#AI #EthicalAI #SocialRobotics #HumanCenteredAI #WiAIR
May 14, 2025 at 3:48 PM
Read this if you're new to academic conferences or if you'd just like a bit of helpful advice on how to make friends at conferences (as opposed to formal "networking").
I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating so here's some suggestions on how to navigate them
I'm late for #ICLR2025 #NAACL2025, but in time for #AISTATS2025 #ICML2025! 1/3
kamathematics.wordpress.com/2025/05/01/t...
Tips on How to Connect at Academic Conferences
I was a kinda awkward teenager. If you are a CS researcher reading this post, then chances are, you were too. How to navigate social situations and make friends is not always intuitive, and has to …
kamathematics.wordpress.com
May 3, 2025 at 1:28 AM
Reposted by Jekaterina Novikova
It is critical for scientific integrity that we trust our measure of progress.
The @lmarena.bsky.social has become the go-to evaluation for AI progress.
Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
April 30, 2025 at 2:55 PM
Reposted by Jekaterina Novikova
SUPER thrilled that our #NAACL2025 paper got the runner-up BEST paper award 😍😍🏆🏆🏆🚀🚀
We show that people rely 30% more on LLMs when they use emphatic expressions (e.g., "Sure, happy to help") even though the answer is wrong, and 10% more when the task involves math questions 😵
📜 arxiv.org/pdf/2407.07950
April 30, 2025 at 3:16 PM
Reposted by Jekaterina Novikova
🚀 Our new episode is LIVE! 🎙️
In Episode 3, we talk with @aparnabee.bsky.social about:
🏥⚠️ Unique challenges of applying AI in medical contexts
📊🧑🏽🤝🧑🏻 Data quality and bias
👩⚕️🩺 Importance of collaboration with clinicians
Watch and subscribe!
youtu.be/DEdJltlFg4I
#MLforHealth #WiAIR #WomenInAI
Responsible AI for Health, with Aparna Balagopalan
YouTube video by Women in AI Research WiAIR
youtu.be
April 23, 2025 at 3:40 PM
Reposted by Jekaterina Novikova
The latest happenings in open models
- Eagerly awaiting Qwen 3
- Llama 4 uptake is slow
- Reasoning models seem to be saturating
- Multimodal models are being slept on
- China is still dominating
- Oh yeah, and a reminder that my RLHF book online version0 is done!
Artifacts Log #9.
buff.ly/F6lapGF
The latest open artifacts (#9): RLHF book draft, where the open reasoning race is going, and unsung heroes of open LM work
Artifacts Log 9.
www.interconnects.ai
April 21, 2025 at 4:43 PM
Glad to share that our publication was recognized as the Top Viewed Article.
Read it here alz-journals.onlinelibrary.wiley.com/doi/full/10....
April 16, 2025 at 8:45 AM
Reposted by Jekaterina Novikova
💡If AI rewrites your voice, is it still your voice?
We had the pleasure of hosting @CurriedAmanda in our latest episode, where she walked us through her impactful research on “Impoverished Language Technology: Social Class in NLP.”
#WiAIR #SocialBias #AIFairness
April 14, 2025 at 6:19 PM
Proud to be a part of this multi-cultural multi-institutional collaborative project
Kaleidoscope: the largest culturally-authentic exam benchmark for VLMs.
Most benchmarks are English-centric or rely on translations, missing linguistic & cultural nuance. Kaleidoscope expands in-language multilingual 🌎 & multimodal 👀 VLM evaluation.
arxiv.org/abs/2504.07072
April 10, 2025 at 8:42 PM
Reposted by Jekaterina Novikova
OpenAI: "Users have told us that understanding how the model reasons ... helps build trust in its answers."
Anthropic: "Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't."
www.anthropic.com/research/rea...
Reasoning models don't always say what they think
Research from Anthropic on the faithfulness of AI models' Chain-of-Thought
www.anthropic.com
April 4, 2025 at 10:41 PM
Don't miss this episode! It's going to be an interesting discussion about the social and ethical implications of biased AI, and how researchers are working to create fair and inclusive systems.
The next episode of 𝐖𝐨𝐦𝐞𝐧 𝐢𝐧 𝐀𝐈 𝐑𝐞𝐬𝐞𝐚𝐫𝐜𝐡 𝐖𝐢𝐀𝐈𝐑 is coming - it will be released on Wednesday, April 2nd! This time, we will speak with Amanda Cercas Curry.
📍 When: April 2nd at 11am EST
🌐 Where: youtube.com/@WomeninAIRe...
#WomenInAI #WiAIR
March 28, 2025 at 3:49 PM
Following up on my last post - it's time for the big reveal! 🎉
Thrilled to announce that @malikeh97.bsky.social and I are launching a podcast called Women in AI Research! We're excited to bring you inspiring stories from women in AI.
Follow @wiair.bsky.social for all the updates
#womeninai
March 5, 2025 at 4:37 PM
Big announcement coming up! My friend @malikeh97.bsky.social and I have been working on something very special. Can't wait to reveal what we have been up to. Stay tuned for more info! 🚀
#WomenInAI
March 3, 2025 at 2:29 PM
I am not into sports and not a hockey fan. But this time, I am very glad about the outcome of this game. Go Canada! 🇨🇦🇨🇦🇨🇦🏒🎉
Singer of Canadian anthem at 4 Nations Face-Off deliberately changes "O Canada" lyric from "in all of us command" to "that only us command" to protest Trump’s 51st state remarks, publicist confirms to The AP: apnews.com/article/cana...
Singer of Canadian anthem at 4 Nations Face-Off changes lyric to protest Trump's 51st state remarks
The anthem singer who performed the Canadian anthem prior to the 4 Nations Face-Off championship game changed a lyric in “O Canada” from “in all of us command” to “that only us command.”
apnews.com
February 21, 2025 at 5:53 PM
Our paper is accepted to ICLR!
INCLUDE: Evaluating Multilingual LLMs with Regional Knowledge (arxiv.org/abs/2411.19799)
A benchmark of ~200k QA pairs across 44 languages, capturing real-world cultural nuances.
A collaborative effort led by @cohereforai.bsky.social, with contributors worldwide.
/1
January 23, 2025 at 4:07 PM
Reposted by Jekaterina Novikova
Very interesting paper about unlearning for AI Safety, a subject that deserves more attention. ⬇️
🚨 New Paper Alert: Open Problems in Machine Unlearning for AI Safety 🚨
Can AI truly "forget"? While unlearning promises data removal, controlling emergent capabilities is an inherent challenge. Here's why it matters: 👇
Paper: arxiv.org/pdf/2501.04952
1/8
January 11, 2025 at 3:11 PM
Reposted by Jekaterina Novikova
Happy New Year! To kick off the year, I've finally been able to format and upload the draft of my AI Research Highlights of 2024 article.
It covers a variety of topics, from mixture-of-experts models to new LLM scaling laws for precision:
Noteworthy AI Research Papers of 2024 (Part One)
Six influential AI papers from January to June
magazine.sebastianraschka.com
January 1, 2025 at 2:12 PM
Last month I attended the #NeurIPS2024 conference in Vancouver. Now that I'm home, I'd like to reflect on all the interesting works I encountered at the conference.
Part 1 is about multimodal #LLM; the next parts are coming soon.
typhoon-mirror-155.notion.site/Multimodal-L...
Multimodal LLMs | Notion
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
typhoon-mirror-155.notion.site
January 3, 2025 at 9:40 PM
Excited to co-organize the HEAL workshop at @acm_chi 2025!
HEAL addresses the "evaluation crisis" in LLM research and brings HCI and AI experts together to develop human-centered approaches to evaluating and auditing LLMs.
🔗 heal-workshop.github.io
#NLProc #LLMeval #LLMsafety
January 3, 2025 at 2:07 AM