ehudreiter.bsky.social
@ehudreiter.bsky.social
New blog: Let's use AI to help people manage illness

I am excited by the idea of using AI to help people manage illness and health conditions. This isn't very sexy, but I think there is real potential to improve health outcomes and quality of life.

ehudreiter.com/2026/01/19/l...
January 19, 2026 at 9:22 AM
Other CS academics I know have done very different things in retirement: remained active in academia as emeritus, joined a startup, done charitable work, moved to a remote spot in the Scottish Highlands, written novels, etc. We did similar things as academics (research and teaching), but very different things in retirement!
January 16, 2026 at 9:16 AM
AI hallucination is in the UK political news. Israeli fans were banned from a football match, and this ban was based on a report which included hallucinated material made up by MS Copilot

www.theguardian.com/uk-news/2026...
West Midlands police chief apologises after AI error used to justify Maccabi Tel Aviv ban
Craig Guildford says he gave incorrect evidence to MPs and mistake arose from ‘use of Microsoft Copilot’
www.theguardian.com
January 14, 2026 at 3:20 PM
Reposted
Health experts: Your synthetic text "AI" overviews are misleading, for example see this about liver function tests.
Google: Okay, we'll block "AI" overviews on that query.

The product is fundamentally flawed and cannot be "fixed" by patching query by query.

A short 🧵>>
‘Dangerous and alarming’: Google removes some of its AI summaries after users’ health put at risk
Guardian investigation finds AI Overviews provided inaccurate and false information when queried over blood tests
www.theguardian.com
January 11, 2026 at 2:27 PM
Nice chat with some of my soon-to-submit PhD students. They all know how to conduct and write up research, have lots of ideas for future work, and have developed networks of collaborators. So they are ready to "leave the nest", which is a good feeling for me as supervisor
January 8, 2026 at 9:54 AM
New blog (personal): Retirement Plans: Travel and some academics

I hope to retire soon, and many people are asking about my plans. Basically I want to do lots of travel, stay involved in academia, and perhaps do some writing.

ehudreiter.com/2026/01/06/r...
January 6, 2026 at 8:25 AM
One nice thing about 2025 was that the two publications I was proudest of were single-author! Also many good papers with my students, but I get a special buzz from single-author papers
January 1, 2026 at 1:46 PM
New blog: Do a sanity check on your experiments

Researchers should do a “sanity” check on experiments. That is, manually inspect some (A) test/train data, (B) model/system output, and (C) evaluation output, looking for anything that seems strange.
ehudreiter.com/2025/12/22/d...
Do a sanity check on your experiments
I strongly recommend that researchers do “sanity checks” on data, model outputs, and evaluation results, looking for anomalies. This can help detect data errors, model cheating, softwar…
ehudreiter.com
December 22, 2025 at 9:05 AM
One of my main goals for 2025-26 is to get 6 PhD students to submit before I retire in summer 2026. So very happy that Nikolay Babakov has submitted and passed his viva, and Iniakpokeikiye Thompson has submitted. Getting there...
December 16, 2025 at 10:11 AM
Colleague has discovered many bugs (eg incorrect annotations) in a respected 8-year-old dataset he is using. Nobody warned him, and it is hard for him to warn others. Maybe most people just don't care if a dataset is deeply flawed, as long as they can compute numbers and beat SOTA...
December 15, 2025 at 9:02 AM
Making a good LLM benchmark is hard. Avoid data contamination, reward hacking, saturation; ensure construct validity; rigorously test and validate, etc.

Unfortunately, the community places little value on the above. People want to beat SOTA or competitors, and don't care if the BM used means anything...
December 10, 2025 at 7:55 AM
New blog: Do LLMs cheat on benchmarks

LLMs often “cheat” on benchmarks via data contamination and reward hacking. This problem is getting worse, perhaps because of perverse incentives. Need to move beyond benchmarks and start measuring real-world impact.

ehudreiter.com/2025/12/08/d...
Do LLMs cheat on benchmarks
LLMs often “cheat” on benchmarks via data contamination and reward hacking. Unfortunately, this problem seems to be getting worse, perhaps because of perverse incentives. If we want to …
ehudreiter.com
December 8, 2025 at 6:50 AM
Interesting chat about hallucination in patient information dialogues. When we ask domain experts to check statements such as "X increases likelihood of Y", the response is often "depends on context" or "we don't know, need more experiments". Does this make the statement a hallucination?
November 27, 2025 at 9:42 AM
New blog: Hard to Change Poor Research Culture

Research culture is very important but also very hard to change. I suspect this is one reason why it is so difficult to get people to do more rigorous and meaningful experiments.

ehudreiter.com/2025/11/24/h...
November 24, 2025 at 9:11 AM
Aberdeen CS is hiring a new lecturer for its "Joint Institute" with South China Normal University. Basically you would be based and do research in Aberdeen, but would be expected to go to China a few times a year and teach at SCNU.

Closing 28 Nov

www.abdn.ac.uk/jobs/vacanci...
Lecturer in Computing Science, Natural & Computing Sciences (NCS253A) | The University of Aberdeen
University of Aberdeen Research Jobs
www.abdn.ac.uk
November 12, 2025 at 9:14 AM
I'm disturbed by reports about chatbots encouraging children to kill themselves, such as www.bbc.co.uk/news/article... . Shame that the AI Safety community in general, and the @AISecurityInst in particular, seem to have little interest in this, very disappointing...
Mothers say AI chatbots encouraged their sons to kill themselves
In her first UK interview Megan Garcia speaks to Laura Kuenssberg about the death of her teenage son.
www.bbc.co.uk
November 10, 2025 at 8:51 AM
New blog: Understanding what users want from NLG

When building an NLG system, it really helps to understand what users want; this came up several times at the recent INLG conference. I discuss some of our work in this space, and give a few suggestions.

ehudreiter.com/2025/11/06/u...
November 6, 2025 at 7:26 AM
I'm trying to understand OpenAI's HealthBench. "HealthBench: Evaluating Large Language Models Towards Improved Human Health" doesn't say much about the BM (eg, very few examples). Are there other papers? I don't care how well model X performs, I want to judge if I can trust the BM
November 5, 2025 at 2:27 PM
Just back from INLG. Nice event as always, but I am concerned that it is losing its uniqueness. Maybe for 2026 I'll suggest some special tracks which are interesting to the INLG community but not ARR types (eg, user requirements/eval, non-LLM techniques).
November 5, 2025 at 9:15 AM
New blog: Most common uses of AI in Healthcare

Data on usage of AI in healthcare suggests that most common uses in 2025 are probably (A) giving personalised health information to patients and (B) helping clinicians write documents.

ehudreiter.com/2025/10/21/m...
Most common uses of AI in Healthcare
I review some data on usage of AI in healthcare, and conclude that the most common uses in 2025 are probably (A) giving personalised health information to patients and (B) helping clinicians write …
ehudreiter.com
October 21, 2025 at 6:21 AM
One of my main goals for 2025-26 is to help my 6 senior PhD students submit their PhDs before I retire. Glad to say that Nikolay Babakov has now done so, with viva scheduled for Dec. Other five students seem to be on track, which is encouraging.
October 15, 2025 at 9:13 AM
Somewhat frustrated yesterday to once again read an ACL paper which did all sorts of complex things (including the usual results tables showing best approach) on garbage data, with minimal ack of this in limitations. Most fundamental rule of CS is Garbage In, Garbage Out
October 9, 2025 at 8:46 AM
New blog: Good diagrams for research papers

I've seen a number of diagrams recently which are too complicated and difficult to understand. I explain some of the problems I see and give advice.

ehudreiter.com/2025/10/08/g...
October 8, 2025 at 8:27 AM
Really interesting paper on real-world evaluation in IR. I should learn more about eval in IR, it's not something I've ever properly looked at
dl.acm.org/doi/10.1145/...
What Matters in a Measure? A Perspective from Large-Scale Search Evaluation | Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
dl.acm.org
September 30, 2025 at 8:27 AM
Several people have asked me recently if I will still be able to contribute to research projects after I retire in summer 2026. Absolutely! I will have emeritus status, and am very happy to remain involved in research projects at Aberdeen and elsewhere.
September 26, 2025 at 10:21 AM