Lightnews — Scholar-powered news

@aymeric-roucher.bsky.social

810 followers 9 following 13 posts

aymeric-roucher.bsky.social

🎓 Training pipeline:
‣ Continued pre-training on Meta's internal docs and wikis
‣ Supervised fine-tuning on past incident investigations
‣ Training data mimicked real-world constraints (2-20 potential changes per incident)
Read it in full 👉 www.tryparity.com/blog/how-met...

How Meta Uses LLMs to Improve Incident Response (and how you can too) - Parity

Meta used LLMs to root cause incidents with 42% accuracy. Here's how they did it and how you can do it too.

www.tryparity.com

November 20, 2024 at 1:50 PM
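
To make the supervised fine-tuning step in the post above more concrete, here is a minimal sketch of how one training example could be assembled so that each incident comes with 2 to 20 candidate changes. The function name `build_sft_example`, the prompt wording, and the prompt/completion record layout are all assumptions for illustration; the post does not describe Meta's actual data format.

```python
import random

# Bounds taken from the post: 2-20 potential changes per incident.
MIN_CANDIDATES, MAX_CANDIDATES = 2, 20


def build_sft_example(incident_summary, root_cause, distractors, rng):
    """Build one hypothetical prompt/completion pair for fine-tuning.

    The prompt lists the true root-cause change mixed with unrelated
    changes; the completion is the 1-based position of the root cause.
    """
    if not distractors:
        raise ValueError("need at least one distractor change")
    # Leave one slot for the root cause, so the total stays within 2-20.
    max_distractors = min(MAX_CANDIDATES - 1, len(distractors))
    n_distractors = rng.randint(MIN_CANDIDATES - 1, max_distractors)
    candidates = rng.sample(distractors, n_distractors) + [root_cause]
    rng.shuffle(candidates)

    numbered = [f"{i + 1}. {c}" for i, c in enumerate(candidates)]
    prompt = (
        f"Incident: {incident_summary}\n"
        "Candidate changes:\n"
        + "\n".join(numbered)
        + "\nWhich change most likely caused the incident?"
    )
    completion = str(candidates.index(root_cause) + 1)
    return {"prompt": prompt, "completion": completion}


# Illustrative usage with made-up data.
rng = random.Random(0)
distractors = [f"D{i}: unrelated change" for i in range(30)]
example = build_sft_example(
    "checkout latency spike", "D_root: raise cache TTL", distractors, rng
)
print(example["prompt"])
print("label:", example["completion"])
```

Shuffling the candidates keeps the label position from leaking the answer, which is a design choice of this sketch rather than something stated in the post.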

aymeric-roucher.bsky.social

How did they do it?
🔄 Two-step approach:
‣ Heuristics (code ownership, directory structure, runtime graphs) reduce thousands of potential changes to a manageable set
‣ Fine-tuned Llama 2 7B ranks the most likely culprits

November 20, 2024 at 1:50 PM
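
The two-step approach in the post above can be sketched as a filter followed by a ranker. This is a toy illustration under stated assumptions: the `Change` fields, `heuristic_filter`, `rank_candidates`, and `toy_score` are hypothetical names, only the code-ownership and directory heuristics are shown (runtime graphs are left out), and `toy_score` is a placeholder for the fine-tuned model rather than a real scoring method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Change:
    change_id: str
    owner_team: str
    directory: str
    lines_changed: int


def heuristic_filter(
    changes: List[Change], incident_team: str, incident_dir: str
) -> List[Change]:
    """Step 1: keep changes owned by the affected team or under its directory."""
    return [
        c
        for c in changes
        if c.owner_team == incident_team or c.directory.startswith(incident_dir)
    ]


def rank_candidates(
    candidates: List[Change], score_fn: Callable[[Change], float], top_k: int = 5
) -> List[Change]:
    """Step 2: order the shortlist by score, highest first, and keep top_k."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]


def toy_score(change: Change) -> float:
    """Placeholder for the fine-tuned LLM: larger diffs score higher."""
    return float(change.lines_changed)


# Illustrative usage with made-up changes.
changes = [
    Change("c1", "payments", "services/payments/api", 10),
    Change("c2", "search", "services/search", 500),
    Change("c3", "infra", "services/payments/db", 80),
]
shortlist = heuristic_filter(changes, "payments", "services/payments")
ranked = rank_candidates(shortlist, toy_score, top_k=2)
print([c.change_id for c in ranked])  # ['c3', 'c1']
```

In a real system the `score_fn` argument would wrap a call to the ranking model, while the cheap filter keeps the number of model calls small.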

aymeric-roucher.bsky.social

🤔 42%, isn't that high?
➡️ When there's an issue in prod, engineers dive into recent code changes to find the offending commit. At Meta (thousands of daily changes), this is like finding a needle in a haystack.
💡 So the LLM-based suggestion can cut incident resolution time from hours to seconds!

November 20, 2024 at 1:50 PM

aymeric-roucher.bsky.social

@huggingface.bsky.social

November 14, 2024 at 5:50 PM
