Jekaterina Novikova
@j-novikova-nlp.bsky.social
Principal AI research scientist @Vanguard_Group | Research in NLP, multimodal AI, LLMs, evaluation | own opinions only 🇨🇦🇪🇺🏳️‍🌈
Reposted by Jekaterina Novikova
🎧 Hear Dr. Hupkes discuss her work on GenBench and how consistency, generalization, and reasoning shape our understanding of LLMs.
🎬 YouTube: www.youtube.com/watch?v=CuTW...
🎙️ Apple Podcasts: podcasts.apple.com/ca/podcast/w...
🎧 Spotify: open.spotify.com/show/51RJNlZ...
#WiAIR #NLP #WomenInAI
Generalization in AI, with Dr. Dieuwke Hupkes
YouTube video by Women in AI Research WiAIR
www.youtube.com
July 18, 2025 at 4:12 PM
Reposted by Jekaterina Novikova
🎙️ New Episode Out Now!
We’re thrilled to announce that the latest episode of @wiair.bsky.social is live!
This week, we sit down with Dr. Angelica Lim to talk about "Robots with Empathy".
#AI #EthicalAI #SocialRobotics #HumanCenteredAI #WiAIR
May 14, 2025 at 3:48 PM
Read this if you're new to academic conferences, or if you'd just like some helpful advice on how to make friends at conferences (as opposed to formal "networking").
May 3, 2025 at 1:28 AM
Reposted by Jekaterina Novikova
It is critical for scientific integrity that we trust our measure of progress.

@lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
April 30, 2025 at 2:55 PM
Reposted by Jekaterina Novikova
SUPER thrilled that our #NAACL2025 paper got the runner-up BEST paper award 😍😍🏆🏆🏆🚀🚀
We show that people rely 30% more on LLMs that use emphatic expressions (e.g., "Sure, happy to help"), even when the answer is wrong, and 10% more when the task involves math questions 😵

📜 arxiv.org/pdf/2407.07950
April 30, 2025 at 3:16 PM
Reposted by Jekaterina Novikova
🚀 Our new episode is LIVE! 🎙️
In Episode 3, we talk with @aparnabee.bsky.social about:

🏥⚠️ Unique challenges of applying AI in medical contexts
📊🧑🏽‍🤝‍🧑🏻 Data quality and bias
👩‍⚕️🩺 Importance of collaboration with clinicians

Watch and subscribe!
youtu.be/DEdJltlFg4I

#MLforHealth #WiAIR #WomenInAI
Responsible AI for Health, with Aparna Balagopalan
YouTube video by Women in AI Research WiAIR
youtu.be
April 23, 2025 at 3:40 PM
Reposted by Jekaterina Novikova
The latest happenings in open models
- Eagerly awaiting Qwen 3
- Llama 4 uptake is slow
- Reasoning models seem to be saturating
- Multimodal models are being slept on
- China is still dominating
- Oh yeah, and a reminder that the online version 0 of my RLHF book is done!
Artifacts Log #9.
buff.ly/F6lapGF
The latest open artifacts (#9): RLHF book draft, where the open reasoning race is going, and unsung heroes of open LM work
www.interconnects.ai
April 21, 2025 at 4:43 PM
Glad to share that our publication was recognized as the Top Viewed Article.

Read it here alz-journals.onlinelibrary.wiley.com/doi/full/10....
April 16, 2025 at 8:45 AM
Reposted by Jekaterina Novikova
💡If AI rewrites your voice, is it still your voice?
We had the pleasure of hosting @CurriedAmanda in our latest episode, where she walked us through her impactful research on "Impoverished Language Technology: Social Class in NLP."

#WiAIR #SocialBias #AIFairness
April 14, 2025 at 6:19 PM
Proud to be part of this multi-cultural, multi-institutional collaborative project:
Kaleidoscope: the largest culturally-authentic exam benchmark for VLMs.

Most benchmarks are English-centric or rely on translations, missing linguistic & cultural nuance. Kaleidoscope expands in-language multilingual 🌎 & multimodal 👀 VLM evaluation.

arxiv.org/abs/2504.07072
April 10, 2025 at 8:42 PM
Reposted by Jekaterina Novikova
OpenAI: "Users have told us that understanding how the model reasons ... helps build trust in its answers."

Anthropic: "Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't."

www.anthropic.com/research/rea...
Reasoning models don't always say what they think
Research from Anthropic on the faithfulness of AI models' Chain-of-Thought
www.anthropic.com
April 4, 2025 at 10:41 PM
Don't miss this episode! It's going to be an interesting discussion about the social and ethical implications of biased AI, and how researchers are working to create fair and inclusive systems.
The next episode of 𝐖𝐨𝐦𝐞𝐧 𝐢𝐧 𝐀𝐈 𝐑𝐞𝐬𝐞𝐚𝐫𝐜𝐡 𝐖𝐢𝐀𝐈𝐑 is coming - it will be released on Wednesday, April 2nd! This time, we will speak with Amanda Cercas Curry.

📍 When: April 2nd at 11am EST
🌐 Where: youtube.com/@WomeninAIRe...

#WomenInAI #WiAIR
March 28, 2025 at 3:49 PM
Following up on my last post - it's time for the big reveal! 🎉

Thrilled to announce that @malikeh97.bsky.social and I are launching a podcast called Women in AI Research! We're excited to bring you inspiring stories from women in AI.

Follow @wiair.bsky.social for all the updates

#womeninai
March 5, 2025 at 4:37 PM
Big announcement coming up! My friend @malikeh97.bsky.social and I have been working on something very special. Can't wait to reveal what we have been up to. Stay tuned for more info! 🚀
#WomenInAI
March 3, 2025 at 2:29 PM
I am not into sports and not a hockey fan. But this time, I am very glad about the outcome of this game. Go Canada! 🇨🇦🇨🇦🇨🇦🏒🎉
February 21, 2025 at 5:53 PM
Our paper is accepted to ICLR!
INCLUDE: Evaluating Multilingual LLMs with Regional Knowledge (arxiv.org/abs/2411.19799)
A benchmark of ~200k QA pairs across 44 languages, capturing real-world cultural nuances.
A collaborative effort led by @cohereforai.bsky.social, with contributors worldwide.
/1
January 23, 2025 at 4:07 PM
Reposted by Jekaterina Novikova
Very interesting paper about unlearning for AI Safety, a subject that deserves more attention. ⬇️

🚨 New Paper Alert: Open Problem in Machine Unlearning for AI Safety 🚨

Can AI truly "forget"? While unlearning promises data removal, controlling emergent capabilities is an inherent challenge. Here's why it matters: 👇

Paper: arxiv.org/pdf/2501.04952
1/8
January 11, 2025 at 3:11 PM
Reposted by Jekaterina Novikova
Happy New Year! To kick off the year, I've finally been able to format and upload the draft of my AI Research Highlights of 2024 article.
It covers a variety of topics, from mixture-of-experts models to new LLM scaling laws for precision:
Noteworthy AI Research Papers of 2024 (Part One)
Six influential AI papers from January to June
magazine.sebastianraschka.com
January 1, 2025 at 2:12 PM
Last month I attended the #NeurIPS2024 conference in Vancouver. Now that I'm home, I'd like to reflect on all the interesting works I encountered at the conference.

Part 1 covers multimodal #LLMs; the next parts are coming soon.

typhoon-mirror-155.notion.site/Multimodal-L...
Multimodal LLMs | Notion
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
typhoon-mirror-155.notion.site
January 3, 2025 at 9:40 PM
Excited to co-organize the HEAL workshop at @acm_chi 2025!
HEAL addresses the "evaluation crisis" in LLM research and brings HCI and AI experts together to develop human-centered approaches to evaluating and auditing LLMs.
🔗 heal-workshop.github.io
#NLProc #LLMeval #LLMsafety
January 3, 2025 at 2:07 AM