Danny To Eun Kim
teknology.bsky.social
PhD student @CMU LTI
NLP | IR | Evaluation | RAG
https://kimdanny.github.io
Related paper here!

bsky.app/profile/841i...
If you're interested in OpenAI including shopping results, you might also be interested in @teknology.bsky.social's paper relating retrieval diversity/fairness and generation by downstream RAG models. This has implications for individuals selling products online.
arxiv.org/abs/2409.11598
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
Modern language models frequently include retrieval components to improve their outputs, giving rise to a growing number of retrieval-augmented generation (RAG) systems. Yet, most existing work in RAG...
April 29, 2025 at 9:29 PM
Reposted by Danny To Eun Kim
If you're working on a recall-oriented task or with ranking systems evaluated across varied users, content, or intents, check it out. 5/5

dl.acm.org/doi/10.1145/...
April 7, 2025 at 4:15 PM
Here's an overview of TREC 2024 TOT track runs with the test queries:
trec.nist.gov/pubs/trec33/...
March 7, 2025 at 4:29 PM
Yes! That's exactly the case of TOT retrieval for academics :)
March 5, 2025 at 10:08 PM
These approaches powered the TREC 2024 TOT track test queries and will continue into the 2025 track (trec-tot.github.io).
Joyful collaboration with Yifan He (@841io.bsky.social), Jaime Arguello, and @bmitra.bsky.social!

#SIGIR #TREC #TOT
Overview
Tip of the tongue: The phenomenon of failing to retrieve something from memory, combined with partial recall and the feeling that retrieval is imminent.
March 5, 2025 at 1:37 AM
📂 Our Code & Data

🔗LLM-Elicitation: github.com/kimdanny/llm...
🔗Human query collection interface with visual stimuli set: github.com/kimdanny/hum...
GitHub - kimdanny/llm-tot-query-elicitation
March 5, 2025 at 1:36 AM
⚡️Multi-Domain Coverage
Combining both methods enables TOT query evaluation across multiple domains. We tested simulated evaluation in the Movie, Landmark, and Person domains, and we built a broader, more inclusive TOT test collection.
March 5, 2025 at 1:36 AM
Solution2️⃣: Human-Elicitation
We designed an interface with visual prompts to induce a TOT state in human participants. Their queries closely match authentic TOT queries and capture genuine TOT experiences in a controlled setting.
March 5, 2025 at 1:35 AM
Solution1️⃣: LLM-Elicitation
We built a TOT user simulator to produce synthetic queries. The synthetic queries show high system rank correlation with real queries and are linguistically similar to them. This scalable simulated evaluation method overcomes data scarcity by generating new queries on demand.
March 5, 2025 at 1:35 AM
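For readers unfamiliar with "system rank correlation" in the LLM-Elicitation post: the idea is to check whether retrieval systems are ordered the same way when scored on synthetic queries as when scored on real ones. A minimal sketch follows, assuming Kendall's tau as the correlation measure (the post does not say which one was used). The system names and scores are hypothetical, made up purely for illustration.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system_name: score} dicts.

    Counts system pairs ordered the same way (concordant) versus the
    opposite way (discordant) under the two score sets. Tied pairs
    count as neither.
    """
    if set(scores_a) != set(scores_b):
        raise ValueError("both score sets must cover the same systems")
    systems = sorted(scores_a)
    if len(systems) < 2:
        raise ValueError("need at least two systems to compare rankings")

    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical effectiveness scores for four systems (not from the paper).
real_scores = {"bm25": 0.21, "dense": 0.35, "hybrid": 0.41, "rerank": 0.48}
synthetic_scores = {"bm25": 0.25, "dense": 0.33, "hybrid": 0.47, "rerank": 0.44}

tau = kendall_tau(real_scores, synthetic_scores)
print(f"Kendall's tau = {tau:.3f}")
```

A value near 1 means the synthetic queries rank systems almost exactly as the real queries do, which is what makes simulated queries usable as a stand-in for evaluation.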
🤔Why Does the Problem Exist?
TOT query data collection relies heavily on community question answering websites (e.g., Reddit). This causes data availability issues and domain bias (most TOT queries end up being about movies or books).
March 5, 2025 at 1:33 AM
👅Tip-of-the-Tongue (TOT) search is a complex form of known-item search, shaped by the expression of partial recall, personal context, and uncertain memories. However, TOT research has long been hindered by the scarcity of high-quality TOT queries.
March 5, 2025 at 1:33 AM