@anmorgan.bsky.social
🚀 How can we detect LLM hallucinations—without external tools or model intrinsics?

SelfCheckGPT is a zero-resource, reference-free evaluation approach that detects hallucinations by analyzing the consistency of multiple sampled responses from the same model. Let's break it down. 🧵👇 (1/11)
March 27, 2025 at 4:15 PM
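The core idea can be sketched in a few lines. Note this is a minimal illustration, not the paper's implementation: the unigram-overlap scorer below is a stand-in for SelfCheckGPT's actual consistency measures (BERTScore, QA, NLI, or n-gram LM variants), and the sampled answers are hard-coded rather than drawn from a model.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z']+", text.lower()))

def overlap(sentence: str, sample: str) -> float:
    """Fraction of the sentence's tokens that also appear in a sample."""
    sent = tokens(sentence)
    return len(sent & tokens(sample)) / len(sent) if sent else 0.0

def selfcheck(sentence: str, samples: list[str]) -> float:
    """Inconsistency score in [0, 1]: higher = more likely hallucinated."""
    support = sum(overlap(sentence, s) for s in samples) / len(samples)
    return 1.0 - support

# Stand-ins for stochastically re-sampled answers from the same model:
samples = [
    "Paris is the capital of France.",
    "France's capital city is Paris.",
    "The capital of France is Paris.",
]
# A claim the samples support scores lower (more consistent) than one
# they contradict:
print(selfcheck("Paris is the capital of France.", samples))
print(selfcheck("Lyon is the capital of France.", samples))
```

The appeal of this setup is that it needs no reference answer and no access to model internals: only the ability to sample the model several times.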
Single-model evaluations can be biased and inconsistent. LLM juries, which use multiple models for assessment, offer a more reliable alternative, reducing bias and improving robustness. 🧵 (1/6)
February 24, 2025 at 6:31 PM
Borrowing from traditional #ML ensembling methods, an LLM jury consists of multiple LLM judges that independently score a given output, then aggregates their scores through a voting function such as max or average pooling. (2/6)
February 24, 2025 at 6:31 PM
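The aggregation step is simple to sketch. The judge scores below are hypothetical hard-coded values standing in for real LLM judge calls:

```python
from statistics import mean

def aggregate(scores: list[float], pooling: str = "mean") -> float:
    """Pool independent judge scores into a single jury verdict."""
    if pooling == "mean":
        return mean(scores)  # average pooling
    if pooling == "max":
        return max(scores)   # max pooling
    raise ValueError(f"unknown pooling: {pooling}")

# Scores (1-5 scale) from three hypothetical judge models for one output:
judge_scores = [4.0, 3.0, 5.0]
print(aggregate(judge_scores, "mean"))  # 4.0
print(aggregate(judge_scores, "max"))   # 5.0
```

Average pooling dampens any single judge's bias; max pooling is useful when any one judge flagging high quality (or high risk) should dominate the verdict.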
G-Eval consolidates evaluations into a single metric, effectively providing the model with a unified scorecard.

Its secret to success is combining chain-of-thought (CoT) reasoning with a form-filling paradigm and a scoring function. (2/4)
January 31, 2025 at 4:31 PM
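G-Eval's scoring function weights each candidate score by the probability the judge model assigns to it, yielding a continuous expected score rather than a single integer. A minimal sketch, with made-up probabilities standing in for the judge LLM's token logprobs:

```python
def weighted_score(score_probs: dict[int, float]) -> float:
    """Expected score: sum of p(s_i) * s_i over candidate scores."""
    return sum(s * p for s, p in score_probs.items())

# Hypothetical probabilities the judge assigns to score tokens 1-5:
probs = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.25}
print(weighted_score(probs))  # 3.7
```

This probability weighting is what lets G-Eval produce fine-grained scores (e.g. 3.7) instead of the coarse integer ratings a plain form-filling prompt would give.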