arxiv.org/abs/2409.11598
dl.acm.org/doi/10.1145/...
dl.acm.org/doi/10.1145/...
trec.nist.gov/pubs/trec33/...
trec.nist.gov/pubs/trec33/...
Joyful collaboration with Yifan He @841io.bsky.social, Jaime Arguello, and @bmitra.bsky.social!
#SIGIR #TREC #TOT
🔗LLM-Elicitation: github.com/kimdanny/llm...
🔗Human query collection interface with visual stimuli set: github.com/kimdanny/hum...
Combining both methods enables TOT query evaluation across multiple domains. We tested simulated evaluation in the Movie, Landmark, and Person domains. We also built a broader, more inclusive TOT test collection.
We designed an interface with visual prompts to induce a TOT state in human participants. Their queries closely match authentic TOT queries and capture genuine TOT experiences in a controlled setting.
We built a TOT user simulator to produce synthetic queries. Compared with real queries, the synthetic ones yield highly correlated system rankings and show high linguistic similarity. Because new queries can be simulated on demand, this scalable evaluation method overcomes data scarcity.
Collecting tip-of-the-tongue (TOT) queries relies heavily on community question answering websites (e.g., Reddit). This causes data availability issues and domain bias (most TOT queries end up being about movies or books).