Lightnews — Scholar-powered news

Mark Dredze

@mdredze.bsky.social

2.7K followers 380 following 66 posts

John C Malone Professor at Johns Hopkins Computer Science, Center for Language and Speech Processing, Malone Center for Engineering in Healthcare.
Part-time: Bloomberg LP #nlproc


Mark Dredze

@mdredze.bsky.social

I know I can improve my ARR reviews, but there really is no need for name calling. 😁

February 5, 2025 at 2:13 PM

Mark Dredze

@mdredze.bsky.social

ARR: Reviews are due today.

Me:

January 20, 2025 at 1:29 PM

Mark Dredze

@mdredze.bsky.social

I feel seen. This is why I always access my API keys from my laptop.

January 17, 2025 at 7:50 PM

Mark Dredze

@mdredze.bsky.social

Do you have any of those fortune cookies that mock academics?

Sure!

January 14, 2025 at 10:19 PM

Mark Dredze

@mdredze.bsky.social

Starting a new year and reflecting on how lucky I am to work at @hopkinsengineer.bsky.social with amazing people @jhucompsci.bsky.social @jhuclsp.bsky.social.

I was promoted to full professor in 2023, and my students presented me with this amazing poster of current and former PhD students.

January 2, 2025 at 5:40 PM

Mark Dredze

@mdredze.bsky.social

Examining the generated QA pairs, you can really see the difference. Our generations (bottom) look harder and more interesting.

Try our strategy for your synthetic generation task? Check out our paper, being presented at #ML4H2024.
arxiv.org/abs/2412.04573

December 22, 2024 at 4:01 PM

Mark Dredze

@mdredze.bsky.social

Training a Clinical QA system on our data gives big improvements, whether we generate the data from Llama or GPT-4o. The improvements show up both in F1 and in any overlap between the extracted and true answers.

December 22, 2024 at 4:01 PM
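
For readers unfamiliar with the two metrics named in the post above, here is a minimal sketch of how token-level F1 and an "any overlap" check are commonly computed for extractive QA. The post does not give the paper's exact definitions, so this assumes SQuAD-style whitespace tokenization; the function names `tokens`, `token_f1`, and `any_overlap` are hypothetical and the example strings are made up.

```python
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace. Real evaluations
    # often also strip punctuation and articles.
    return text.lower().split()


def token_f1(predicted: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred, true = tokens(predicted), tokens(gold)
    if not pred or not true:
        # Two empty answers count as a match; one empty answer does not.
        return float(pred == true)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    shared = sum((Counter(pred) & Counter(true)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(pred)
    recall = shared / len(true)
    return 2 * precision * recall / (precision + recall)


def any_overlap(predicted: str, gold: str) -> bool:
    """True if the extracted answer shares at least one token with the gold answer."""
    return bool(set(tokens(predicted)) & set(tokens(gold)))


if __name__ == "__main__":
    print(token_f1("severe chest pain", "chest pain"))
    print(any_overlap("severe chest pain", "chest pain"))
```

F1 rewards answers whose span boundaries closely match the gold answer, while the any-overlap check is more lenient and only asks whether the system pointed at roughly the right place in the record.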

Mark Dredze

@mdredze.bsky.social

Paper at #ML4H2024!

Clinical QA can help doctors find critical information in patient records. But where do we get training data for these systems? Generating this data from an LLM is hard. 🧵

December 22, 2024 at 4:01 PM

Mark Dredze

@mdredze.bsky.social

It turns out that when you have just a little supervised data, the models trained on more data and tasks, even when out of domain, do BETTER on the new clinical domain.

December 22, 2024 at 3:59 PM

Mark Dredze

@mdredze.bsky.social

We try a new clinical task and dataset/domain. In this case, the clinical T5 benefits disappear.

December 22, 2024 at 3:59 PM

Mark Dredze

@mdredze.bsky.social

Comparing 2 clinical with 3 general models on 6 clinical datasets, we find that some clinical models improve. However, these clinical test sets come from the same domain as the clinical training data. Maybe the clinical models are better on THIS clinical data, but not in general?

December 22, 2024 at 3:59 PM

Mark Dredze

@mdredze.bsky.social

Are Clinical T5 Models Better for Clinical Text? That's the question we asked in our #ML4H2024 paper.

Turns out clinical models may not be worth it. 🧵

arxiv.org/abs/2412.05845