Martin Eastwood
martineastwood.co.uk
Somewhere in the middle of a Venn diagram of machine learning and football / soccer. http://www.pena.lt/y/blog.html
Pinned
⚽ penaltyblog v1.6.0 is now available!

📊 MatchFlow updates:

- SQL-style joins for nested football data (left, right, outer, inner, anti)
- Cloud storage support: read/write directly to AWS S3, Google Cloud Storage, and Azure Blob
- Automatic type inference for join keys

pip install penaltyblog
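To make the join types concrete, here is a minimal sketch of inner, left, and anti joins over lists of dict records. It is plain Python rather than the MatchFlow API: `join_records`, `fixtures`, and `shots` are hypothetical names used only for illustration, and right and outer joins are left out for brevity.

```python
def join_records(left, right, on, how="inner"):
    """Join two lists of dicts on a shared key (hypothetical helper)."""
    if how not in {"inner", "left", "anti"}:
        raise ValueError(f"unsupported join type: {how}")

    # Index the right-hand records by key so each lookup is cheap.
    index = {}
    for row in right:
        index.setdefault(row[on], []).append(row)

    joined = []
    for row in left:
        partners = index.get(row[on], [])
        if how == "anti":
            # Anti join: keep left rows that have no partner on the right.
            if not partners:
                joined.append(dict(row))
        elif partners:
            # Left-hand values win if both sides share a field name.
            joined.extend({**partner, **row} for partner in partners)
        elif how == "left":
            # Left join: keep unmatched left rows unchanged.
            joined.append(dict(row))
    return joined


# Illustrative data only.
fixtures = [{"match_id": 1, "home": "Arsenal"}, {"match_id": 2, "home": "Leeds"}]
shots = [{"match_id": 1, "xg": 0.3}, {"match_id": 1, "xg": 0.1}]

inner = join_records(fixtures, shots, on="match_id")
unmatched = join_records(fixtures, shots, on="match_id", how="anti")
```

The anti join is handy for data checks, such as finding fixtures that have no event data attached.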
Thanks Nils, it’s a good point, and it’s on my to-do list to dig further into how long it takes for something like FSAA to stabilise into something useful early in a player’s career. Even with a fairly wide HDI at that stage, there are still potential benefits to using it.
New article: "Shrinkage, Uncertainty, and Son Heung-min: Using Bayesian Methods to Identify Finishing Ability"

which discusses using a Bayesian hierarchical approach to quantifying player finishing ability, with credible intervals to express uncertainty.

pena.lt/y/2025/10/01...
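As a rough illustration of what shrinkage does, the sketch below uses a simple conjugate Beta-Binomial update on raw shot conversion. This is a swapped-in simplification, not the hierarchical model from the article: the prior (10 goals from 100 shots) and the function name `shrunk_conversion` are assumptions chosen for the example.

```python
def shrunk_conversion(goals, shots, prior_goals=10, prior_shots=100):
    """Posterior mean conversion rate under an assumed Beta prior.

    The prior acts like `prior_shots` imaginary shots with `prior_goals`
    goals, so small samples are pulled strongly towards the prior rate.
    """
    alpha = prior_goals + goals
    beta = (prior_shots - prior_goals) + (shots - goals)
    return alpha / (alpha + beta)


# A hot streak over 10 shots is shrunk hard towards the prior rate of 10%...
small_sample = shrunk_conversion(goals=5, shots=10)
# ...while 500 shots of evidence mostly speak for themselves.
large_sample = shrunk_conversion(goals=100, shots=500)
```

The raw rates here are 50% and 20%, yet the larger sample ends up with the higher estimate once sample size is taken into account.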
Why most finishing metrics are flawed and how a Bayesian approach gives us a truer picture of a player's finishing ability...
🎉 penaltyblog v1.6.1 is out!

✨ What's new:

- Python 3.14 support
- scipy 1.16+ compatibility
- Better numerical stability for Negative Binomial model
- New Colab notebook for implied probabilities example

pip install --upgrade penaltyblog
Absolutely. Also, people started sharing big (for then anyway) pre-trained networks that made it much easier to get started. I built many models in my day job back then by fine-tuning BERT and ImageNet-pretrained networks that I would have struggled to train from scratch without investing massively in compute.
We can also split Massey Ratings into attack & defence:

🔴 LFC: best attack in the league paired with a mid-table defence
⚪️ ARS: Elite at both ends. They have the #1 defence and the #3 attack
🌳 Forest: A disaster at both ends of the pitch
Here's how the Premier League table really looks according to Massey Ratings, which account for strength of schedule.

📈 Arsenal (+1.5) & Man City (+1.3) are clearly the strongest overall
😬 Man Utd (-0.2) rank in the bottom half
📉 West Ham & Forest (-1.2) are the worst teams by far
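For anyone unfamiliar with the method, here is a minimal sketch of basic Massey Ratings in plain Python. It builds the standard Massey system from goal differences, replaces the last equation with a "ratings sum to zero" constraint, and solves by Gauss-Jordan elimination. It is not the code behind the ratings above; `massey_ratings` and the toy results are hypothetical, and it assumes every team is linked to every other through the schedule.

```python
def massey_ratings(games, teams):
    """games: list of (home, away, home_goals, away_goals) tuples."""
    n = len(teams)
    idx = {team: i for i, team in enumerate(teams)}
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for home, away, home_goals, away_goals in games:
        i, j = idx[home], idx[away]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += home_goals - away_goals
        p[j] += away_goals - home_goals

    # M is singular, so swap the last equation for "ratings sum to zero".
    M[-1] = [1.0] * n
    p[-1] = 0.0

    # Gauss-Jordan elimination with partial pivoting on [M | p].
    A = [row + [rhs] for row, rhs in zip(M, p)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return {team: A[idx[team]][n] / A[idx[team]][idx[team]] for team in teams}


# Toy results, for illustration only.
toy_games = [("A", "B", 2, 0), ("B", "C", 1, 0), ("A", "C", 3, 0)]
toy_ratings = massey_ratings(toy_games, ["A", "B", "C"])
```

The ratings are in goals: the difference between two teams' ratings is the expected goal margin between them, which is why strength of schedule is accounted for automatically.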
Thanks to everyone who suggested features and reported issues. Your input shapes the package's development.

Questions or feedback welcome at pena.lt/y/contact

Install: pip install penaltyblog
GitHub: github.com/martineastwo...
📚 Interactive Colab notebooks are available in the docs - experiment with real examples without any local setup.

I'll be steadily expanding these over the coming weeks to cover all functionality in the package.

Docs: penaltyblog.readthedocs.io
penaltyblog: Football Data & Modelling Made Easy — penaltyblog documentation
🔧 Improved implied odds module:

- New logarithmic overround removal method for better accuracy
- Structured results instead of raw arrays
- Better handling of edge cases

Making it easier to work with bookmaker probabilities in your analyses.
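As background, the sketch below shows the simplest form of overround removal (multiplicative normalisation) and returns a small structured result rather than a bare list. It is not the logarithmic method mentioned above, nor the penaltyblog API: `ImpliedResult` and `implied_probabilities` are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass
class ImpliedResult:
    """Hypothetical structured result for this example."""
    probabilities: list
    margin: float


def implied_probabilities(decimal_odds):
    """Remove the bookmaker's overround by proportional rescaling."""
    raw = [1.0 / odds for odds in decimal_odds]
    total = sum(raw)
    # The amount by which the raw probabilities exceed 1 is the overround.
    return ImpliedResult(
        probabilities=[p / total for p in raw],
        margin=total - 1.0,
    )


# Illustrative home / draw / away odds.
result = implied_probabilities([1.8, 3.6, 4.5])
```

Other methods differ in how they spread the margin across outcomes, which matters most for longshots.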
💰 Expanded betting utilities:

- Kelly Criterion for multiple outcomes
- Arbitrage opportunity detection
- Value bet identification
- Hedge bet calculations
- Odds format conversion (decimal/fractional/American)

All functions now return structured outputs for easier integration.
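For reference, here is a minimal sketch of three of these ideas for a single outcome: the Kelly fraction, a value-bet check, and decimal-to-American odds conversion. These are the textbook formulas written as standalone functions, not the penaltyblog functions themselves, and the function names are assumptions. The multiple-outcome Kelly case is not covered.

```python
def kelly_fraction(probability, decimal_odds):
    """Fraction of bankroll to stake on one outcome (0 if there is no edge)."""
    edge = probability * decimal_odds - 1.0
    return max(edge / (decimal_odds - 1.0), 0.0)


def is_value_bet(probability, decimal_odds):
    """A bet has value when its expected return per unit staked exceeds 1."""
    return probability * decimal_odds > 1.0


def decimal_to_american(decimal_odds):
    """Convert decimal odds to American (moneyline) odds."""
    if decimal_odds >= 2.0:
        return (decimal_odds - 1.0) * 100.0
    return -100.0 / (decimal_odds - 1.0)


# Illustrative numbers: a 55% chance priced at evens.
stake = kelly_fraction(0.55, 2.0)
```

In practice many bettors stake only a fraction of the Kelly amount, since the formula assumes the probability estimate is exactly right.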
🚀 New article on my blog walking through the latest updates to the penaltyblog Python package for football (soccer) analytics & betting

✅ New interactive pitch plots
✅ 5-10× faster goal models
✅ New Flow query DSL

👉 pena.lt/y/2025/08/14...
Penaltyblog v1.5.0: Faster Models, Smarter Queries, and a Sharper Edge
v1.5.0 delivers interactive charts, faster models, upgraded football probability grid, and a powerful Flow query language - all designed to make your analysis sharper and quicker...
🔍 New: Flow Query DSL

Filter datasets with safe, Pythonic expressions:
- AST-parsed (no eval)
- Variables, regex, dates
- Access nested fields
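To show why AST parsing is safer than eval, here is a minimal sketch of an expression filter built on Python's standard `ast` module. It is not the Flow query DSL itself: `matches_query` and the supported syntax (comparisons, and/or, dotted field access) are assumptions for this example, and anything else, including function calls, is rejected.

```python
import ast
import operator

_COMPARISONS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
}


def _field_path(node):
    """Turn `player.name` into ["player", "name"]."""
    if isinstance(node, ast.Name):
        return [node.id]
    if isinstance(node, ast.Attribute):
        return _field_path(node.value) + [node.attr]
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def _evaluate(node, record):
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, record)
    if isinstance(node, ast.BoolOp):
        values = [_evaluate(value, record) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        compare = _COMPARISONS.get(type(node.ops[0]))
        if compare is None:
            raise ValueError("unsupported comparison")
        left = _evaluate(node.left, record)
        right = _evaluate(node.comparators[0], record)
        return compare(left, right)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, (ast.Name, ast.Attribute)):
        # Names are looked up in the record, never in Python's namespace.
        value = record
        for key in _field_path(node):
            value = value[key]
        return value
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def matches_query(expression, record):
    """Return True if the record satisfies the expression."""
    return bool(_evaluate(ast.parse(expression, mode="eval"), record))


# Illustrative event record.
event = {"type": "Shot", "xg": 0.4, "player": {"name": "Son"}}
is_good_shot = matches_query("type == 'Shot' and xg > 0.3", event)
```

Because the expression is only ever walked as a tree, a string such as `__import__('os')` raises a `ValueError` instead of running.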
📈 Goal models are now 5-10× faster

- Cython-powered analytical gradients for speed + stability
- Fine-tune with minimizer_options:
⚽ New: Pitch Plotting API

Build interactive football visualisations with:
- Multiple layouts & themes
- Scatter, heatmaps, arrows, comets
- Custom hover tooltips
🚀 penaltyblog v1.5.0 is here!

✅ Interactive pitch plots
✅ 5-10× faster goal models
✅ New Flow query DSL

What’s new 👇
📊 New article on my blog: How Accurate Are Soccer Odds? A Data Dive into 250 Million Betting Lines

🔍 How sharp are different bookmakers?
📈 How accurate are bookmakers' odds?
🎯 Are the odds well-calibrated?

➡️ pena.lt/y/2025/07/16...
A data-driven deep dive into how accurately bookmakers price global soccer markets...
At the heart of matchflow is the Flow class - a lazy, composable way to work with nested football data.

You can filter, group, and summarize JSON-like data without flattening or loading everything into memory.

👇 Example:
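The original example has not survived in this copy. As a stand-in, the sketch below illustrates the general idea of a lazy, composable pipeline over nested records using generators. It is not the matchflow `Flow` API: the class `LazyRecords`, its methods, and the sample events are hypothetical.

```python
class LazyRecords:
    """Hypothetical stand-in showing lazy, chainable steps over records."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Nothing runs yet: the step is only recorded.
        step = lambda rows: (row for row in rows if predicate(row))
        return LazyRecords(self._source, self._steps + (step,))

    def map(self, func):
        step = lambda rows: (func(row) for row in rows)
        return LazyRecords(self._source, self._steps + (step,))

    def __iter__(self):
        rows = iter(self._source)
        for step in self._steps:
            rows = step(rows)
        return rows

    def collect(self):
        # Work happens here, one record at a time through every step.
        return list(self)


# Illustrative nested event data.
events = [
    {"type": "Shot", "player": {"name": "Son"}, "shot": {"xg": 0.4}},
    {"type": "Pass", "player": {"name": "Saka"}},
    {"type": "Shot", "player": {"name": "Saka"}, "shot": {"xg": 0.1}},
]

shots = (
    LazyRecords(events)
    .filter(lambda e: e["type"] == "Shot")
    .map(lambda e: {"name": e["player"]["name"], "xg": e["shot"]["xg"]})
)
shot_rows = shots.collect()
```

Each call returns a new pipeline object, so intermediate pipelines can be reused, and the nested fields are read in place without flattening the records first.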
Thanks! It’s just to remove another variable: by simulating the result, I know for certain which forecast is better, because the simulated result is sampled from that particular forecast. With actual results, you could argue that the true probabilities they are sampled from are unknown.
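A minimal sketch of that idea, assuming log loss as the scoring rule (the rule used in the original discussion is not stated here): outcomes are sampled from one forecast, then both forecasts are scored against them. The probabilities and names are illustrative.

```python
import math
import random


def mean_log_loss(forecast, outcomes):
    """Average negative log probability assigned to what actually happened."""
    return -sum(math.log(forecast[o]) for o in outcomes) / len(outcomes)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Home / draw / away probabilities; the first is the "true" forecast.
true_forecast = [0.5, 0.3, 0.2]
rival_forecast = [0.4, 0.3, 0.3]

# Simulate results from the true forecast, so the better model is known.
outcomes = rng.choices(range(3), weights=true_forecast, k=10_000)

loss_true = mean_log_loss(true_forecast, outcomes)
loss_rival = mean_log_loss(rival_forecast, outcomes)
```

With enough simulated matches, the forecast the results were drawn from should score the lower (better) log loss, which gives a known ground truth for comparing scoring rules.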