Diyi Yang
@diyiyang.bsky.social
Assistant Professor @Stanford CS @StanfordNLP @StanfordAILab
Computational Social Science & NLP
Reposted by Diyi Yang
We are getting closer to having agents operating in the real physical world. However, can we trust frontier models to make embodied decisions 🎮 aligned with human norms 👩⚖️ ?
With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!
March 4, 2025 at 4:32 AM
Reposted by Diyi Yang
We (w/ @diyiyang.bsky.social, @zhuhao.me, & Bodhisattwa Prasad Majumder) are excited to present our #NAACL25 tutorial on Social Intelligence in the Age of LLMs!
It will highlight long-standing and emerging challenges of AI interacting w humans, society & the world.
⏰ May 3, 2:00pm-5:30pm Room Pecos
May 3, 2025 at 1:58 PM
Reposted by Diyi Yang
LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? 🤖➕👤
Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I've now gotten used to agents proactively seeking confirmations or my deep thinking. (🧵 with video)
January 17, 2025 at 5:44 PM
Reposted by Diyi Yang
My first bluesky post will be for my first project as a postdoc at Stanford.
Talk Arena is our first step towards building audio LMs into interactive agents. Try it out and let me know what you think. talkarena.org
Talk Arena
Interactive evaluation for audio models
talkarena.org
December 10, 2024 at 1:39 AM
Reposted by Diyi Yang
Want to add your model to the arena? Have an idea for a new feature for Talk Arena? We are open to collaboration in many forms!
Co-led with @ellaminzhili.bsky.social, in collaboration with @michaelryan207.bsky.social, Kunat Pipatanakul, Potsawee Manakul, @zhuhao.me, and @diyiyang.bsky.social (5/5)
Talk Arena
Interactive evaluation for audio models
talkarena.org
December 10, 2024 at 12:01 AM
Reposted by Diyi Yang
With an increasing number of Large *Audio* Models 🔊, which one do users like the most?
Introducing talkarena.org — an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks.
🧵 (1/5)
December 10, 2024 at 12:01 AM
Reposted by Diyi Yang
Check out our paper, code, and data to learn more!
Paper: arxiv.org/abs/2409.00138
Website: salt-nlp.github.io/PrivacyLens/
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in...
arxiv.org
December 6, 2024 at 6:20 PM
Reposted by Diyi Yang
Excited to present our PrivacyLens paper at #NeurIPS next week! We explore LM agent privacy risks when deployed as personal assistants. (Details in thread)
I am working on developing LM agents as collaborative research partners, learning aids, personal assistants, and more. Let's connect and chat!!
December 6, 2024 at 6:20 PM
Reposted by Diyi Yang
Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models 👇
November 21, 2024 at 5:15 PM
Reposted by Diyi Yang
Missed some – or all – of our papers at #EMNLP2024?
It's not too late to catch up using this handy list from the Stanford AI Lab blog:
ai.stanford.edu/blog/emnlp-2...
November 18, 2024 at 4:29 PM
Reposted by Diyi Yang
When I will respond to your email
November 20, 2024 at 6:38 PM
Reposted by Diyi Yang
🌶️(?) take: Agents are somehow hot right now because people realized that LLM output can be interpreted as a DSL which directs side effects in the world (e.g. tool calls) rather than just returning text in a chat/autocomplete sense. What are the open challenges? A 🧵... [1/11]
November 19, 2024 at 9:32 AM
Reposted by Diyi Yang
I did an unscientific, uncontrolled experiment for #EMNLP2024—details in 🧵👇. I posted my conference & workshop papers to 5 socials. Clear results: Mastodon is near dead, Threads may have users but not my people, not giving up on X/Twitter yet, but Bluesky is worth investing in.
November 18, 2024 at 6:40 PM
Had a great time doing the language agent tutorial (language-agent-tutorial.github.io) with Yu Su, Shunyu Yao and Tao Yu 😀 #EMNLP2024
Check out our slides here: tinyurl.com/language-age...
EMNLP 2024 Tutorial: Language Agents: Foundations, Prospects, and Risks
language-agent-tutorial.github.io
November 18, 2024 at 6:28 PM
Reposted by Diyi Yang
I'll be at the Google Theory and Practice of Foundation Models Workshop today and tomorrow! FOMO for EMNLP, but excited to chat more casually at a smaller non-archival workshop 😅
I am presenting at the Lightning Talks tomorrow at 1:30 PM on our Distilled Voice Assistant model if you're around!
November 14, 2024 at 9:26 PM
Reposted by Diyi Yang
Every semester, I drop into Georgia Tech's Deep Learning course to do a speed-through LLM lecture! I keep updating things to balance "history" and recent progress.
Slides for this semester are here for folks who are teaching courses on NLP/DL/LLMs in the near future: docs.google.com/presentation...
CS 4644 / 7643: Deep Learning - LLM Guest Lecture - Fall 2024
Training Large Language Models. CS 4644 / 7643: Deep Learning. William Held, School of Interactive Computing, Georgia Institute of Technology
docs.google.com
November 7, 2024 at 9:45 PM
Reposted by Diyi Yang
I wanted to contribute to "Starter Pack Season" with one for Stanford NLP+HCI: go.bsky.app/VZBhuJ5
Here are some other great starter packs:
- CSS: go.bsky.app/GoEyD7d + go.bsky.app/CYmRvcK
- NLP: go.bsky.app/SngwGeS + go.bsky.app/JgneRQk
- HCI: go.bsky.app/p3TLwt
- Women in AI: go.bsky.app/LaGDpqg
November 15, 2024 at 7:20 PM