Our methods

Every modeled stat on this site leans on expected goals (xG), so here is how accurate the model actually is. Every headline number below is scored only on 251,523 held-out shots the model never saw during training (20% of all shots since 2010-11) - true validation, not the model grading its own homework. Of the shots we call ~10%, do ~10% really go in? Yes, to within about a percentage point: the 8-10% bucket averages 8.9% predicted against 9.6% actual.

Holdout scope · shots unseen in training

Predicted vs actual

1.003

24,540 xG vs 24,466 goals

Brier score

0.083

lower is better

ROC-AUC

0.723

0.5 = coin flip

Held-out shots

251,523

2010-11 to present
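
For readers who want to reproduce the three headline metrics, here is a minimal sketch in plain Python. It assumes each shot is reduced to a predicted probability and a 0/1 goal flag; the function names and the toy data are illustrative and are not the site's actual code.

```python
from typing import Sequence


def goal_ratio(xg: Sequence[float], goals: Sequence[int]) -> float:
    """Predicted goals (sum of xG) divided by actual goals; 1.0 is unbiased."""
    return sum(xg) / sum(goals)


def brier_score(xg: Sequence[float], goals: Sequence[int]) -> float:
    """Mean squared gap between each probability and its 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(xg, goals)) / len(xg)


def roc_auc(xg: Sequence[float], goals: Sequence[int]) -> float:
    """Chance that a random goal was rated above a random non-goal (ties count half)."""
    order = sorted(range(len(xg)), key=lambda i: xg[i])
    ranks = [0.0] * len(xg)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next shot has the same xG (a tie).
        while end + 1 < len(order) and xg[order[end + 1]] == xg[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based rank shared by the tie group
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    n_goals = sum(goals)
    n_misses = len(goals) - n_goals
    goal_rank_sum = sum(r for r, y in zip(ranks, goals) if y == 1)
    # Mann-Whitney U statistic, normalized to the 0-1 range.
    return (goal_rank_sum - n_goals * (n_goals + 1) / 2) / (n_goals * n_misses)


# Toy data for illustration only, not real shots.
toy_xg = [0.1, 0.4, 0.35, 0.8]
toy_goals = [0, 0, 1, 1]
print(goal_ratio(toy_xg, toy_goals), brier_score(toy_xg, toy_goals), roc_auc(toy_xg, toy_goals))
```

The same ratio applied to the holdout totals quoted above, 24,540 / 24,466, rounds to 1.003.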

xG bucket	Shots	Mean predicted xG	Actual goal rate
0-2%	8,567	1.4%	3.6%
2-4%	48,262	3.1%	2.6%
4-6%	44,543	4.9%	3.9%
6-8%	32,994	7.0%	6.9%
8-10%	24,494	8.9%	9.6%
10-15%	40,645	12.3%	13.5%
15-20%	26,078	17.3%	18.5%
20-30%	20,691	23.6%	23.1%
30-50%	5,225	34.8%	27.9%
50-100%	24	51.4%	37.5%
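
A calibration table like the one above can be built by sorting shots into xG buckets and comparing the mean prediction with the observed goal rate in each. The sketch below assumes the bucket edges shown in the table, with lower edges inclusive and the top bucket open-ended at 50% and above; whether the site treats edges exactly this way is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_right

# Lower edges of each bucket, as fractions. The last bucket is open-ended.
EDGES = [0.00, 0.02, 0.04, 0.06, 0.08, 0.10, 0.15, 0.20, 0.30, 0.50]


def calibration_table(xg, goals, edges=EDGES):
    """Return one row per non-empty bucket: (lower edge, shots, mean xG, goal rate)."""
    shots = [0] * len(edges)
    xg_sum = [0.0] * len(edges)
    goal_sum = [0] * len(edges)
    for p, y in zip(xg, goals):
        bucket = bisect_right(edges, p) - 1  # index of the last edge <= p
        shots[bucket] += 1
        xg_sum[bucket] += p
        goal_sum[bucket] += y
    return [
        (edges[b], shots[b], xg_sum[b] / shots[b], goal_sum[b] / shots[b])
        for b in range(len(edges))
        if shots[b] > 0
    ]


# Toy data for illustration only, not real shots.
for row in calibration_table([0.01, 0.03, 0.03, 0.6], [0, 0, 1, 1]):
    print(row)
```

A well-calibrated model shows the third and fourth values close together in every bucket that holds enough shots to be meaningful.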

Year-by-year stability

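Predicted goals (sum of xG) vs actual goals each season, still on held-out shots only - the model should track reality every year, not just on average. Ratio is predicted ÷ actual, so values above 1 mean the model expected more goals than were scored.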

Season	Shots	Predicted goals	Actual goals	Ratio
2010-11	16,255	1,577	1,536	1.027
2011-12	15,745	1,532	1,370	1.118
2012-13	9,561	918	937	0.979
2013-14	15,905	1,532	1,500	1.021
2014-15	15,971	1,529	1,526	1.002
2015-16	15,920	1,523	1,404	1.084
2016-17	16,158	1,545	1,457	1.060
2017-18	17,742	1,698	1,660	1.023
2018-19	17,198	1,660	1,665	0.997
2019-20	15,259	1,478	1,472	1.004
2020-21	11,666	1,144	1,150	0.995
2021-22	17,945	1,780	1,817	0.980
2022-23	17,637	1,808	1,759	1.028
2023-24	16,983	1,667	1,709	0.975
2024-25	16,051	1,561	1,741	0.897
2025-26	15,527	1,590	1,763	0.902
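
One way to read the season table is to flag any year whose ratio strays from 1 by more than a chosen tolerance. The sketch below uses the rounded totals from the table; the 5% tolerance and the function name are illustrative assumptions, not a threshold the site is known to use.

```python
# (season, predicted goals, actual goals), copied from the table above.
SEASONS = [
    ("2010-11", 1577, 1536), ("2011-12", 1532, 1370), ("2012-13", 918, 937),
    ("2013-14", 1532, 1500), ("2014-15", 1529, 1526), ("2015-16", 1523, 1404),
    ("2016-17", 1545, 1457), ("2017-18", 1698, 1660), ("2018-19", 1660, 1665),
    ("2019-20", 1478, 1472), ("2020-21", 1144, 1150), ("2021-22", 1780, 1817),
    ("2022-23", 1808, 1759), ("2023-24", 1667, 1709), ("2024-25", 1561, 1741),
    ("2025-26", 1590, 1763),
]


def flag_seasons(rows, tolerance=0.05):
    """Return (season, ratio) for seasons where |predicted / actual - 1| > tolerance."""
    flagged = []
    for season, predicted, actual in rows:
        ratio = predicted / actual
        if abs(ratio - 1.0) > tolerance:
            flagged.append((season, round(ratio, 3)))
    return flagged


for season, ratio in flag_seasons(SEASONS):
    print(season, ratio)
```

With a 5% tolerance this flags 2011-12, 2015-16, 2016-17, 2024-25 and 2025-26. Because the inputs are rounded to whole goals, a recomputed ratio can differ from the table in the last digit.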

In deployment. Across every scored shot the site actually serves (1,256,487, including the 80% the model trained on): predicted ÷ actual 1.000, Brier 0.083, ROC-AUC 0.724 - essentially identical to the holdout. A five-feature logistic model has too few parameters to memorize individual shots, which is why its in-sample and held-out numbers match so closely.

Candidate · not cut over

xG v2 temporal holdout

A richer gradient-boosted candidate is now trained against a latest-season temporal holdout. The site still serves v1 xG until the rescore and regression gate is complete.

157,558 holdout shots

Brier

0.087

v1 0.095 · -0.00845

Log loss

0.298

v1 0.328 · -0.02970

ROC-AUC

0.755

v1 0.704 · +0.0510

Predicted vs actual

0.987

v1 0.892

Headline numbers are scored on a temporal holdout - seasons withheld from training. `baseline` is the current five-feature logistic model retrained on the same split, and `deployed` is the v2 candidate refit on every eligible shot. Downstream xG surfaces still use v1 until the full rescore/cutover gate is passed.
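
A temporal holdout differs from the random 20% split used for v1: whole seasons are set aside, so the model is always scored on hockey played after its training data. A minimal sketch of the split and of the log loss metric follows. It assumes each shot is a dict with a `season` key, which is a hypothetical layout; which seasons are actually withheld is not stated here, so the values below are placeholders.

```python
import math


def temporal_split(shots, holdout_seasons):
    """Split shot records into (train, holdout) by season instead of at random."""
    holdout_seasons = set(holdout_seasons)
    train = [s for s in shots if s["season"] not in holdout_seasons]
    holdout = [s for s in shots if s["season"] in holdout_seasons]
    return train, holdout


def log_loss(probs, goals, eps=1e-15):
    """Mean negative log-likelihood of the 0/1 outcomes; lower is better."""
    total = 0.0
    for p, y in zip(probs, goals):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) can never occur
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


# Toy records for illustration only; the season labels are placeholders.
toy_shots = [
    {"season": "2022-23", "goal": 0},
    {"season": "2023-24", "goal": 1},
    {"season": "2024-25", "goal": 0},
]
train, holdout = temporal_split(toy_shots, {"2024-25"})
print(len(train), len(holdout))
print(log_loss([0.5, 0.5], [0, 1]))
```

Log loss punishes confident misses much harder than the Brier score does, which is one reason to report both when comparing a candidate against a baseline.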

Parallel rescore gate

v2 has been scored against the full shot table for drift testing. Production surfaces still read v1.

1,243,738 regular reliable shots

Total xG drift

-524.15

-0.4% vs v1

v2 total xG

121,482.4

v1 122,006.5

Brier drift

-0.00657

lower is better

Log loss drift

-0.02596

lower is better

Empty-net correction

+5,308.35

v1 treated empty nets like ordinary shots

2025-26 goalie-only drift

+179.86

non-empty-net shots only

Snap-shot correction

+3,465.88

largest shot-type movement
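
The drift figures above are differences between v2 and v1 scored on the same shots. As a rough sketch of the arithmetic, assuming each shot carries both scores under the hypothetical keys `xg_v1` and `xg_v2`:

```python
from collections import defaultdict


def total_drift(v2_total, v1_total):
    """Return (absolute drift, drift as a percentage of the v1 total)."""
    drift = v2_total - v1_total
    return drift, 100.0 * drift / v1_total


def drift_by_segment(shots, key):
    """Sum (v2 - v1) xG per segment, for example per shot type."""
    drift = defaultdict(float)
    for shot in shots:
        drift[shot[key]] += shot["xg_v2"] - shot["xg_v1"]
    return dict(drift)


# Totals quoted in the cards above.
drift, pct = total_drift(121482.4, 122006.5)
print(round(drift, 1), round(pct, 1))

# Toy shots for illustration only; field names and values are made up.
toy_shots = [
    {"shot_type": "snap", "xg_v1": 0.05, "xg_v2": 0.08},
    {"shot_type": "snap", "xg_v1": 0.10, "xg_v2": 0.12},
    {"shot_type": "wrist", "xg_v1": 0.07, "xg_v2": 0.06},
]
print(drift_by_segment(toy_shots, "shot_type"))
```

From the rounded totals the drift comes out at about -524.1, or -0.4% of v1, in line with the card; the card's extra digit presumably comes from unrounded totals.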

Honest notes. The model is unbiased in aggregate (predicted ≈ actual goals) and well-calibrated across most buckets. On held-out shots it over-predicts the rarest high-danger looks (30%+, where samples are thin) and under-predicts the very lowest bucket (long-range shots score more than it expects). It has also under-predicted goals by about 10% in each of the two most recent seasons (ratios of 0.897 and 0.902). xG covers unblocked shots with coordinates, 2010-11 onward; it is a model, not the official record.