Our methods
Every modeled stat on this site leans on expected goals (xG), so here is how accurate the model actually is. Each headline number below is scored only on the 251,523 held-out shots the model never saw during training (20% of all shots since 2010-11). That is true validation, not the model grading its own homework. Of the shots we call ~10%, do ~10% really go in? Yes: in the 8-10% bucket the model predicts 8.9% and 9.6% actually go in.
Holdout scope · shots unseen in training
| Metric | Value | Note |
|---|---|---|
| Predicted vs actual | 1.003 | 24,540 xG vs 24,466 goals |
| Brier score | 0.083 | lower is better |
| ROC-AUC | 0.723 | 0.5 = coin flip |
| Held-out shots | 251,523 | 2010-11 to present |
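All three headline metrics are standard formulas. Below is a minimal sketch in plain Python. It assumes each held-out shot has been reduced to its xG probability and a 0/1 goal flag. The function names are illustrative, not the site's actual code. ROC-AUC is computed with the rank-sum (Mann-Whitney) identity rather than by tracing the curve.

```python
from typing import Sequence


def calibration_ratio(xg: Sequence[float], goals: Sequence[int]) -> float:
    """Predicted / actual: total expected goals over total actual goals."""
    return sum(xg) / sum(goals)


def brier_score(xg: Sequence[float], goals: Sequence[int]) -> float:
    """Mean squared error between each shot's xG and its 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(xg, goals)) / len(xg)


def roc_auc(xg: Sequence[float], goals: Sequence[int]) -> float:
    """Probability that a random goal was priced above a random non-goal.

    Uses the rank-sum (Mann-Whitney) formula; tied xG values share the
    average of the ranks they span, so a tie counts as half a win.
    """
    n_pos = sum(goals)
    n_neg = len(goals) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one goal and one non-goal")
    order = sorted(range(len(xg)), key=lambda i: xg[i])
    ranks = [0.0] * len(xg)
    i = 0
    while i < len(order):
        # Extend j across every shot tied with the one at position i.
        j = i
        while j + 1 < len(order) and xg[order[j + 1]] == xg[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, goals) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Take five toy shots: `xg = [0.05, 0.10, 0.30, 0.02, 0.60]` with `goals = [0, 0, 1, 0, 1]`. Every goal is priced above every miss, so `roc_auc` returns 1.0. The Brier score (about 0.133) still penalizes the 0.30 shot for sitting far from its outcome.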
| xG bucket | Shots | Predicted (mean xG) | Actual (goal rate) |
|---|---|---|---|
| 0-2% | 8,567 | 1.4% | 3.6% |
| 2-4% | 48,262 | 3.1% | 2.6% |
| 4-6% | 44,543 | 4.9% | 3.9% |
| 6-8% | 32,994 | 7.0% | 6.9% |
| 8-10% | 24,494 | 8.9% | 9.6% |
| 10-15% | 40,645 | 12.3% | 13.5% |
| 15-20% | 26,078 | 17.3% | 18.5% |
| 20-30% | 20,691 | 23.6% | 23.1% |
| 30-50% | 5,225 | 34.8% | 27.9% |
| 50-100% | 24 | 51.4% | 37.5% |
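The bucket table is a reliability (calibration) table. Shots are grouped by predicted xG, and each group's mean prediction is compared with its observed goal rate. The sketch below assumes lower-inclusive buckets, so a shot at exactly 2% counts in 2-4%. The source does not say which side of each edge is inclusive. `EDGES` and `calibration_table` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Bucket edges as fractions, matching the table. The top edge sits just above
# 1.0 so a shot priced at exactly 100% still lands in the last bucket.
EDGES = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10, 0.15, 0.20, 0.30, 0.50, 1.01]


def calibration_table(xg: Sequence[float], goals: Sequence[int],
                      edges: Sequence[float] = EDGES
                      ) -> List[Tuple[float, float, int, float, float]]:
    """Return (low, high, shots, mean_xg, goal_rate) for each non-empty bucket."""
    n_buckets = len(edges) - 1
    counts = [0] * n_buckets
    xg_sums = [0.0] * n_buckets
    goal_sums = [0] * n_buckets
    for p, y in zip(xg, goals):
        if not edges[0] <= p < edges[-1]:
            raise ValueError(f"xG {p} is outside the bucket range")
        b = bisect_right(edges, p) - 1  # bucket b satisfies edges[b] <= p < edges[b + 1]
        counts[b] += 1
        xg_sums[b] += p
        goal_sums[b] += y
    return [
        (edges[b], edges[b + 1], counts[b],
         xg_sums[b] / counts[b], goal_sums[b] / counts[b])
        for b in range(n_buckets)
        if counts[b]
    ]
```

A bucket is well calibrated when its `mean_xg` and `goal_rate` columns agree. The gap in the 30-50% row is the over-prediction the notes at the end of this page describe.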
Year-by-year stability
Predicted goals (sum of xG) vs actual goals each season, still held-out shots only - the model should track reality every year, not just on average.
| Season | Shots | Predicted goals | Actual goals | Ratio (predicted ÷ actual) |
|---|---|---|---|---|
| 2010-11 | 16,255 | 1,577 | 1,536 | 1.027 |
| 2011-12 | 15,745 | 1,532 | 1,370 | 1.118 |
| 2012-13 | 9,561 | 918 | 937 | 0.979 |
| 2013-14 | 15,905 | 1,532 | 1,500 | 1.021 |
| 2014-15 | 15,971 | 1,529 | 1,526 | 1.002 |
| 2015-16 | 15,920 | 1,523 | 1,404 | 1.084 |
| 2016-17 | 16,158 | 1,545 | 1,457 | 1.060 |
| 2017-18 | 17,742 | 1,698 | 1,660 | 1.023 |
| 2018-19 | 17,198 | 1,660 | 1,665 | 0.997 |
| 2019-20 | 15,259 | 1,478 | 1,472 | 1.004 |
| 2020-21 | 11,666 | 1,144 | 1,150 | 0.995 |
| 2021-22 | 17,945 | 1,780 | 1,817 | 0.980 |
| 2022-23 | 17,637 | 1,808 | 1,759 | 1.028 |
| 2023-24 | 16,983 | 1,667 | 1,709 | 0.975 |
| 2024-25 | 16,051 | 1,561 | 1,741 | 0.897 |
| 2025-26 | 15,527 | 1,590 | 1,763 | 0.902 |
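Each row needs only two sums per season. The sketch below assumes shots arrive as hypothetical `(season, xg, goal)` tuples. `season_ratios` is an illustrative name, not the site's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def season_ratios(
    shots: Iterable[Tuple[str, float, int]]
) -> Dict[str, Tuple[int, float, int, float]]:
    """Return {season: (shots, predicted_goals, actual_goals, ratio)}, sorted by season."""
    totals = defaultdict(lambda: [0, 0.0, 0])  # [shot count, xG sum, goal count]
    for season, xg, goal in shots:
        t = totals[season]
        t[0] += 1
        t[1] += xg
        t[2] += goal
    return {
        season: (n, xg_sum, n_goals, xg_sum / n_goals if n_goals else float("nan"))
        for season, (n, xg_sum, n_goals) in sorted(totals.items())
    }
```

For the 2024-25 row, 1,561 ÷ 1,741 ≈ 0.897. The model priced roughly 10% fewer goals than were actually scored that season.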
In deployment. Across every scored shot the site actually serves (1,256,487, including the 80% the model trained on), predicted ÷ actual is 1.000, Brier 0.083 and ROC-AUC 0.724. That is essentially identical to the holdout. A five-feature logistic model is too simple to memorize individual shots. That is why its scores on training shots and held-out shots agree so closely, and why we can trust it out of sample.
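An 80/20 split only has to be unrelated to shot outcomes and stable across reruns. One common way to get both is to hash each shot's identifier. This is offered as an assumption, not the site's documented method. `is_holdout`, the shot-ID format, and the salt are all hypothetical.

```python
import hashlib


def is_holdout(shot_id: str, holdout_pct: int = 20, salt: str = "xg-holdout") -> bool:
    """Deterministically place about holdout_pct% of shots in the holdout set.

    Hashing the shot ID with a fixed salt gives the same answer on every run
    and in any row order, so the split never has to be stored.
    """
    digest = hashlib.sha256(f"{salt}:{shot_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # roughly uniform over 0..99
    return bucket < holdout_pct
```

The overfitting check described above works by scoring each side of the split with the same metrics and comparing the results. A large gap between the 80% and the 20% would mean the model had memorized its training shots.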
Candidate · not cut over
xG v2 temporal holdout
A richer gradient-boosted candidate has been trained on earlier seasons and scored against a temporal holdout of the latest seasons. The site still serves v1 xG until the rescore and regression gate is complete.
157,558 holdout shots

| Metric | v2 | v1 | Change vs v1 |
|---|---|---|---|
| Brier | 0.087 | 0.095 | -0.00845 |
| Log loss | 0.298 | 0.328 | -0.02970 |
| ROC-AUC | 0.755 | 0.704 | +0.0510 |
| Predicted vs actual | 0.987 | 0.892 | |
Headline numbers are scored on a temporal holdout, meaning seasons withheld from training. The v1 comparison figures come from `baseline`: the current five-feature logistic model, retrained on the same split. `deployed` covers the v2 candidate refit on every eligible shot. Downstream xG surfaces still use v1 until v2 passes the full rescore/cutover gate.
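A temporal holdout differs from the random 20% split in one important way. Every scored shot comes from a season later than anything the model trained on, so the score also absorbs season-to-season drift in scoring. The sketch below assumes shots are dicts with a hypothetical `season` key in the site's `YYYY-YY` format. The cutoff season is a parameter because the exact withheld seasons are not listed here.

```python
from typing import Iterable, List, Tuple


def temporal_split(shots: Iterable[dict],
                   first_holdout_season: str) -> Tuple[List[dict], List[dict]]:
    """Split by season: earlier seasons train, the cutoff season and later are held out.

    Seasons use the "YYYY-YY" format (e.g. "2024-25"), which sorts
    chronologically as plain strings.
    """
    train: List[dict] = []
    holdout: List[dict] = []
    for shot in shots:
        if shot["season"] >= first_holdout_season:
            holdout.append(shot)
        else:
            train.append(shot)
    return train, holdout
```

For example, passing `"2024-25"` would hold out the last two seasons of the year-by-year table. The cutoff the v2 run actually used is not stated here.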
Parallel rescore gate
v2 has been scored against the full shot table for drift testing. Production surfaces still read v1.
1,243,738 regular reliable shots
| Measure | Value | Note |
|---|---|---|
| Total xG drift | -524.15 | -0.4% vs v1 |
| v2 total xG | 121,482.4 | v1: 122,006.5 |
| Brier drift | -0.00657 | lower is better |
| Log loss drift | -0.02596 | lower is better |
| Empty-net correction | +5,308.35 | v1 treated empty nets like ordinary shots |
| 2025-26 goalie-only drift | +179.86 | non-empty-net shots only |
| Snap-shot correction | +3,465.88 | largest shot-type movement |
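Here, drift means v2 minus v1 on the same shots. xG totals measure the overall shift, loss metrics measure accuracy, and per-category sums capture corrections such as empty nets or snap shots. The sketch below follows those assumptions. It gives each shot one hypothetical label, so a real report would run it once per grouping (the empty-net flag, then shot type). The function names and labels (`"empty_net"`, `"snap"`) are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def brier(xg: Sequence[float], goals: Sequence[int]) -> float:
    """Mean squared error between xG and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(xg, goals)) / len(xg)


def log_loss(xg: Sequence[float], goals: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the outcomes (lower is better).

    Probabilities are clipped away from 0 and 1 so a confident miss
    cannot produce log(0).
    """
    total = 0.0
    for p, y in zip(xg, goals):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xg)


def drift_report(v1: Sequence[float], v2: Sequence[float], goals: Sequence[int],
                 groups: Sequence[str]) -> dict:
    """Compare two models scored on the same shots (v2 minus v1 throughout).

    Negative Brier or log-loss drift means v2 fits the outcomes better;
    per-group drift is the change in summed xG for each label.
    """
    total_v1, total_v2 = sum(v1), sum(v2)
    by_group: Dict[str, float] = defaultdict(float)
    for label, p1, p2 in zip(groups, v1, v2):
        by_group[label] += p2 - p1
    return {
        "total_xg_drift": total_v2 - total_v1,
        "total_xg_drift_pct": 100.0 * (total_v2 - total_v1) / total_v1,
        "brier_drift": brier(v2, goals) - brier(v1, goals),
        "log_loss_drift": log_loss(v2, goals) - log_loss(v1, goals),
        "drift_by_group": dict(by_group),
    }
```

With the totals above, -524.15 ÷ 122,006.5 ≈ -0.43%, which rounds to the -0.4% shown.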
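Honest notes. The model is unbiased in aggregate (predicted ≈ actual goals) and well calibrated through the middle of the range: every bucket from 2% to 30% lands within about 1.2 percentage points of its actual scoring rate. On held-out shots it over-predicts the rarest high-danger looks (30%+, where samples are thin: 5,249 shots combined). It under-predicts the very lowest bucket, because long-range shots score more than it expects. Season totals also wander: 2011-12 runs about 12% high, and 2024-25 and 2025-26 both run about 10% low. xG covers unblocked shots with coordinates, 2010-11 onward; it is a model, not the official record.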