aymeric-roucher.bsky.social
🎓 Training pipeline:
‣ Continued pre-training on Meta's internal docs and wikis
‣ Supervised fine-tuning on past incident investigations
‣ Training data mimicked real-world constraints (2-20 potential changes per incident)
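The training constraint above can be illustrated with a minimal Python sketch that turns one past investigation into a fine-tuning example showing between 2 and 20 candidate changes, one of which is the confirmed culprit. The `Change` and `Incident` types, the prompt wording, and the sampling scheme are all hypothetical; the thread does not say how Meta actually formatted its training data.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Change:
    change_id: str  # hypothetical identifier of a code change
    title: str


@dataclass
class Incident:
    description: str
    culprit: Change  # change confirmed as the root cause during the investigation
    other_changes: List[Change] = field(default_factory=list)  # nearby, unrelated changes


def make_sft_example(incident: Incident, rng: random.Random,
                     min_k: int = 2, max_k: int = 20) -> Tuple[str, str]:
    """Build one (prompt, target) pair with between min_k and max_k candidate changes."""
    if len(incident.other_changes) < min_k - 1:
        raise ValueError("not enough distractor changes for this incident")
    # Pick how many candidates this example shows (the culprit is always one of them).
    k = rng.randint(min_k, min(max_k, len(incident.other_changes) + 1))
    candidates = rng.sample(incident.other_changes, k - 1) + [incident.culprit]
    rng.shuffle(candidates)  # avoid teaching the model a positional shortcut
    listing = "\n".join(f"{c.change_id}: {c.title}" for c in candidates)
    prompt = (f"Incident: {incident.description}\n"
              f"Candidate changes:\n{listing}\n"
              "Which change most likely caused the incident?")
    return prompt, incident.culprit.change_id


if __name__ == "__main__":
    rng = random.Random(0)
    others = [Change(f"D{i}", f"change {i}") for i in range(30)]
    incident = Incident("checkout page returns HTTP 500s",
                        Change("D999", "refactor payment client"), others)
    prompt, target = make_sft_example(incident, rng)
    print(prompt)
    print("target:", target)
```

Varying the number of candidates per example, rather than always showing 20, is one way to expose the model to the same range of list sizes it would meet at inference time.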
Read it in full 👉 www.tryparity.com/blog/how-met...
How Meta Uses LLMs to Improve Incident Response (and how you can too) - Meta used LLMs to root cause incidents with 42% accuracy. Here's how they did it and how you can do it too.
November 20, 2024 at 1:50 PM
How did they do it?
🔄 Two-step approach:
‣ Heuristics (code ownership, directory structure, runtime graphs) reduce thousands of potential changes to a manageable set
‣ Fine-tuned Llama 2 7B ranks the most likely culprits
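The two steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Meta's implementation: the `Change`/`Incident` fields, the ownership and directory scoring weights, the `limit` of 200, and `score_fn` (a hypothetical stand-in for the fine-tuned Llama 2 7B ranker) are all invented for the example. Because the ranker was trained on 2-20 candidates at a time, the sketch assumes it only sees up to 20 changes per call and narrows the list in rounds; that batching scheme is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Change:
    change_id: str
    files: List[str]  # paths touched by the change
    owner_team: str   # team owning the touched code (hypothetical field)


@dataclass
class Incident:
    affected_paths: List[str]  # directories implicated by the alert (hypothetical)
    owner_team: str            # team that owns the failing service (hypothetical)


# Hypothetical stand-in for the fine-tuned model: one relevance score per change.
ScoreFn = Callable[[Incident, Sequence[Change]], List[float]]


def heuristic_filter(incident: Incident, changes: List[Change],
                     limit: int = 200) -> List[Change]:
    """Step 1: cheap heuristics shrink a large pool of changes to a short list."""
    def score(c: Change) -> int:
        s = 0
        if c.owner_team == incident.owner_team:
            s += 2  # code-ownership match
        if any(f.startswith(p) for f in c.files for p in incident.affected_paths):
            s += 1  # directory-structure overlap
        return s

    scored = sorted(((score(c), c) for c in changes), key=lambda t: -t[0])
    return [c for s, c in scored if s > 0][:limit]


def _top(incident: Incident, batch: List[Change], score_fn: ScoreFn, k: int) -> List[Change]:
    """Ask the (stand-in) model to score one batch and keep its k best changes."""
    scores = score_fn(incident, batch)
    order = sorted(range(len(batch)), key=lambda j: -scores[j])
    return [batch[j] for j in order[:k]]


def rank_candidates(incident: Incident, candidates: List[Change], score_fn: ScoreFn,
                    batch_size: int = 20, top_k: int = 5) -> List[Change]:
    """Step 2: rank with a model that sees at most batch_size changes per call.

    Each round keeps the top_k changes of every batch until top_k or fewer remain."""
    if batch_size <= top_k:
        raise ValueError("batch_size must exceed top_k so every round shrinks the pool")
    pool = list(candidates)
    while len(pool) > top_k:
        survivors: List[Change] = []
        for i in range(0, len(pool), batch_size):
            survivors.extend(_top(incident, pool[i:i + batch_size], score_fn, top_k))
        pool = survivors
    return _top(incident, pool, score_fn, top_k) if pool else []
```

Keeping the top `top_k` of each batch means that, with a consistent scorer, the overall top `top_k` survive every round: a change can only be outranked within its batch by changes that also outrank it globally.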
🤔 42%, isn't that low?
➡️ When there's an issue in prod, engineers dive into recent code changes to find the offending commit. At Meta (thousands of daily changes), this is like finding a needle in a haystack.
💡 So even if the LLM's suggestion is right only 42% of the time, each correct hit can cut incident resolution time from hours to seconds!
November 14, 2024 at 5:50 PM