ML for a World Cup 2026 Office Pool

I built a Python system to optimise a bracket entry for a World Cup 2026 prediction pool. The pool scores 104 match score predictions, stage advancement for all 48 teams, and five joker picks at 2× points. Below is a summary of the components and findings.

Elo Ratings

Ratings are trained on 49,000+ international matches (1872–2026) from the martj42/international_results dataset. The K-factor scales with tournament importance (20–60), the home side gets +100 rating points in non-neutral matches, and goal-difference weighting follows the World Football Elo Ratings methodology.
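
A minimal sketch of one rating update under these rules. The function names, the default K of 40, and the argument layout are illustrative assumptions, not the contents of elo.py; the goal-difference multiplier follows the published World Football Elo Ratings scheme.

```python
def goal_diff_multiplier(goal_diff: int) -> float:
    """Goal-difference weight as published by World Football Elo Ratings."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    return 1.75 + (n - 3) / 8.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation for side A against side B."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / 400.0))


def update_elo(home, away, home_goals, away_goals,
               k=40.0, neutral=False, home_adv=100.0):
    """Return the new (home, away) ratings after one match.

    k is assumed to be chosen by the caller from the 20-60 range
    according to tournament importance; 40 is only a placeholder.
    """
    adv = 0.0 if neutral else home_adv
    exp_home = expected_score(home + adv, away)
    if home_goals > away_goals:
        result = 1.0
    elif home_goals == away_goals:
        result = 0.5
    else:
        result = 0.0
    delta = k * goal_diff_multiplier(home_goals - away_goals) * (result - exp_home)
    return home + delta, away - delta
```

For two 1500-rated teams on neutral ground, a 1–0 win with k=40 moves the ratings to 1520 and 1480; the update is zero-sum by construction.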

Poisson Match Prediction

Expected goals per team are derived from the Elo difference $\Delta$ (home rating minus away rating):

$\lambda_{\text{home}} = 1.35 \cdot e^{\,0.65 \cdot \Delta / 400}$

$\lambda_{\text{away}} = 1.35 \cdot e^{\,-0.65 \cdot \Delta / 400}$

Win/draw/loss probabilities are computed from independent Poisson distributions truncated at 8 goals, with PMF vectors cached per $\lambda$. The stack uses only the Python standard library (no numpy, scipy, or pandas); all math is hand-rolled.
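
The two formulas and the truncated outcome calculation can be sketched in standard-library Python as follows. The constant and function names are hypothetical, and renormalising the truncated PMF so it sums to 1 is an assumption on my part; the original may handle the tail differently.

```python
import math
from functools import lru_cache

BASE_RATE = 1.35   # goals per team when ratings are equal
ELO_SCALE = 0.65   # sensitivity of the goal rate to the Elo difference
MAX_GOALS = 8      # truncation point for the Poisson PMF


def expected_goals(delta: float):
    """Return (lambda_home, lambda_away) for Elo difference delta."""
    lam_home = BASE_RATE * math.exp(ELO_SCALE * delta / 400.0)
    lam_away = BASE_RATE * math.exp(-ELO_SCALE * delta / 400.0)
    return lam_home, lam_away


@lru_cache(maxsize=None)
def poisson_pmf(lam: float) -> tuple:
    """Cached PMF vector for 0..MAX_GOALS, renormalised (assumption)."""
    raw = [math.exp(-lam) * lam ** k / math.factorial(k)
           for k in range(MAX_GOALS + 1)]
    total = sum(raw)
    return tuple(p / total for p in raw)


def outcome_probs(delta: float):
    """Return (home win, draw, away win) from independent Poissons."""
    lam_home, lam_away = expected_goals(delta)
    pmf_home, pmf_away = poisson_pmf(lam_home), poisson_pmf(lam_away)
    win = draw = loss = 0.0
    for h, p_h in enumerate(pmf_home):
        for a, p_a in enumerate(pmf_away):
            joint = p_h * p_a
            if h > a:
                win += joint
            elif h == a:
                draw += joint
            else:
                loss += joint
    return win, draw, loss
```

At $\Delta = 0$ both rates equal 1.35, so the home-win and away-win probabilities coincide; a positive $\Delta$ shifts mass toward the home side.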

Backtest

Evaluated across all 22 World Cups (1930–2022), 964 matches. Elo built from prior data only.

Metric	Value
Overall accuracy	54.8%
Winner accuracy (decisive only)	70.4%
Brier score	0.196
Log loss	0.991
Champions correct	3/22

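The 54.8% overall accuracy matches the rate at which favourites actually win, which is capped by how often teams are closely matched: 44% of matches pair teams within 100 Elo points of each other. The Brier score of 0.196 sits within the published academic range (0.19–0.22; Groll, Ley, Zeileis), which suggests the probabilities are reasonably well calibrated.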
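
For reference, here is a sketch of how the three headline metrics could be computed from per-match probabilities. It assumes the Brier score is averaged over the three outcomes (which is the convention that puts football models near 0.2) and that log loss uses the natural logarithm; backtest.py may define them differently, and all names are illustrative.

```python
import math

OUTCOMES = ("home", "draw", "away")


def brier_score(probs, actual):
    """Mean squared error over the three outcomes for one match."""
    return sum((p - (1.0 if o == actual else 0.0)) ** 2
               for o, p in zip(OUTCOMES, probs)) / len(OUTCOMES)


def log_loss(probs, actual, eps=1e-12):
    """Negative natural log of the probability given to the true outcome."""
    return -math.log(max(probs[OUTCOMES.index(actual)], eps))


def evaluate(predictions):
    """predictions: list of ((p_home, p_draw, p_away), actual_outcome)."""
    n = len(predictions)
    hits = sum(1 for probs, actual in predictions
               if OUTCOMES[probs.index(max(probs))] == actual)
    return {
        "accuracy": hits / n,
        "brier": sum(brier_score(p, a) for p, a in predictions) / n,
        "log_loss": sum(log_loss(p, a) for p, a in predictions) / n,
    }
```

Under these definitions a uniform 1/3–1/3–1/3 forecast scores a Brier of about 0.222 and a log loss of about 1.099, which gives a baseline for reading the table above.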

A 225-combination parameter grid search (K-factor, home advantage, Gaussian vs Poisson, recency weighting) improved Brier by 0.4% — negligible. Recency weighting worsened accuracy by 3–5pp.

Monte Carlo Simulation

Seeded tournament runner for the 2026 format (12 groups of 4, top 2 + 8 best third-place advance).

Deterministic per-matchup randomness via SHA-256 of match ID and team names.
Truncated Poisson sampling (0–5 goals).
Three-stage knockout resolution: 90 mins → extra time (0.35× goal rate) → penalties (empirical shootout distribution).
Tournament modifiers: Germany +60, Croatia +55, England −20, Mexico +55. Multiplicative factors: knockout pedigree ×1.03, home continent ×1.02, generation decay ×0.96.
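
The seeding, sampling, and knockout steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in wl/simulation.py: the function names, the way the tournament seed is mixed into the hash, and the 50/50 shootout are all placeholders (the original uses an empirical shootout distribution).

```python
import hashlib
import math
import random


def match_rng(match_id, team_a, team_b, seed=0):
    """Derive a per-matchup RNG from SHA-256 of seed, match ID and teams."""
    key = f"{seed}|{match_id}|{team_a}|{team_b}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def sample_goals(rng, lam, max_goals=5):
    """Draw a goal count from a Poisson truncated to 0..max_goals."""
    weights = [math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(max_goals + 1)]
    return rng.choices(range(max_goals + 1), weights=weights)[0]


def play_knockout(match_id, team_a, team_b, lam_a, lam_b, seed=0):
    """Resolve a knockout tie: 90 minutes, then extra time, then penalties."""
    rng = match_rng(match_id, team_a, team_b, seed)
    goals_a, goals_b = sample_goals(rng, lam_a), sample_goals(rng, lam_b)
    if goals_a != goals_b:
        return (team_a if goals_a > goals_b else team_b), "90"
    # Extra time at 0.35x the 90-minute goal rate.
    goals_a += sample_goals(rng, 0.35 * lam_a)
    goals_b += sample_goals(rng, 0.35 * lam_b)
    if goals_a != goals_b:
        return (team_a if goals_a > goals_b else team_b), "ET"
    # Placeholder coin flip standing in for the empirical shootout model.
    return (team_a if rng.random() < 0.5 else team_b), "PEN"
```

Because the RNG depends only on the seed, match ID, and team names, replaying the same matchup under the same seed gives the same result regardless of the order in which matches are simulated.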

Bracket Optimizer

Simulated annealing over 104 match scores, 5 joker placements, and 32 knockout team predictions. Six mutation strategies:

Score shift: ±1 on one team
Score swap: exchange two match scores
Joker swap: move a joker
KO team flip: swap home/away in one knockout match
Macro-swap: swap group winner/runner-up, rebuild entire knockout bracket
Multi-flip: shift 2–5 random scores simultaneously

Cooling starts at $T=5.0$ and decays by a factor of $0.9998$ per step over 10,000 iterations. Candidates are evaluated in parallel via ProcessPoolExecutor across CPU cores, with 2,500 tournament seeds per candidate. The contrarian objective is $\mathbb{E}[\text{points}] + \lambda \cdot \text{novelty}(\text{bracket}, \text{consensus})$, where $\lambda$ here is a novelty weight, not a Poisson rate. Spine Lock prevents macro-swaps on groups where #1 leads #2 by $> 50$ Elo.
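
A stripped-down sketch of the annealing loop with this cooling schedule, using only the score-shift mutation and a toy objective. The Metropolis acceptance rule, the function names, and the toy target are assumptions for illustration; the real optimizer scores candidates by simulated pool points and uses all six mutations.

```python
import math
import random

T0, DECAY, STEPS = 5.0, 0.9998, 10_000


def anneal(initial, score_fn, mutate_fn, rng, t0=T0, decay=DECAY, steps=STEPS):
    """Maximise score_fn by simulated annealing with geometric cooling."""
    current, current_score = initial, score_fn(initial)
    best, best_score = current, current_score
    temp = t0
    for _ in range(steps):
        candidate = mutate_fn(current, rng)
        cand_score = score_fn(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with prob e^(delta/T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= decay
    return best, best_score, temp


def score_shift(bracket, rng):
    """Score shift mutation: +/-1 on one team's goals in one match."""
    new = [list(match) for match in bracket]
    m, side = rng.randrange(len(new)), rng.randrange(2)
    new[m][side] = max(0, new[m][side] + rng.choice((-1, 1)))
    return [tuple(match) for match in new]


# Toy objective (hypothetical): closeness to a hidden set of scorelines.
TARGET = [(2, 1), (0, 0), (1, 3), (1, 1)]


def toy_score(bracket):
    return -sum(abs(h - th) + abs(a - ta)
                for (h, a), (th, ta) in zip(bracket, TARGET))
```

With these parameters the temperature ends near 0.68 rather than close to zero, so the search still accepts some worsening moves at the final step; tracking the best bracket seen separately from the current one matters for that reason.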

Metric	Value
Random baseline EV	~57 pts
Optimizer EV	~113 pts
2022 deterministic backtest	154 pts (24% of max 640)
Main bottleneck	53% of KO matches have 0 correct teams

Synthetic Opponent Field

Generates opponent brackets with cognitive biases to estimate win probability against the pool:

Chalk bias (0.6): convert close matches to favourite wins, avoid draws
Local bias (0.25): boost CONCACAF host teams one round
Recency bias (0.15): boost recent champions
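
Once opponent brackets have been generated and scored across simulated tournaments, the win probability against the pool could be estimated along these lines. The function name, the input layout, and the choice to split ties evenly are assumptions, not necessarily what wl/field.py does.

```python
def win_probability(our_points, field_points):
    """Estimate the chance of finishing first in the pool.

    our_points[s]   -- our bracket's points in simulation s
    field_points[s] -- list of opponent points in simulation s
    Ties for first place are split evenly (assumption).
    """
    wins = 0.0
    for ours, opponents in zip(our_points, field_points):
        best = max(opponents)
        if ours > best:
            wins += 1.0
        elif ours == best:
            wins += 1.0 / (opponents.count(best) + 1)
    return wins / len(our_points)
```

This is the quantity the contrarian objective is aiming at: a bracket with slightly lower expected points can still win more often if it differs from what most of the field picks.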

Empirical Score Distributions

Frequency tables from all 964 World Cup matches, split by group (727) and knockout (237). Used as sampling weights instead of the Poisson model for deterministic bracket generation.
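
A sketch of building and using such a frequency table with the standard library. The helper names and the tiny sample of results are made up for illustration; the real tables come from the 964 historical matches.

```python
import random
from collections import Counter


def build_table(results):
    """Count how often each (goals_a, goals_b) scoreline occurred."""
    return Counter(results)


def sample_score(table, rng):
    """Draw a scoreline with probability proportional to its frequency."""
    scores = list(table)
    weights = [table[s] for s in scores]
    return rng.choices(scores, weights=weights)[0]


def modal_score(table):
    """Most frequent scoreline, useful as a fully deterministic pick."""
    return table.most_common(1)[0][0]


# Illustrative data only, not real World Cup frequencies.
group_results = [(1, 0), (1, 0), (2, 1), (1, 1), (0, 0), (1, 0)]
group_table = build_table(group_results)
```

Keeping separate tables for group and knockout matches lets the sampler reflect the different scoring patterns of the two stages, and a seeded `random.Random` keeps the generated bracket reproducible.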

Improvement Paths

Dixon-Coles corrections: low-score adjustment + time-decayed parameter estimation.
Player-level data: aggregated FIFA ratings or market values per squad.
Bookmaker odds: blend model probabilities with market-implied odds.
Form metrics: rolling goal difference, xG differentials, streak indicators.
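
As an indication of what the first item would involve, here is a sketch of the Dixon-Coles low-score correction applied on top of independent Poisson probabilities. The function names and the example value of the dependence parameter are assumptions; in practice $\rho$ would be estimated from data, and the time-decayed fitting is not shown.

```python
import math


def poisson(k, lam):
    """Poisson probability of exactly k goals at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment factor for scoreline x-y.

    lam and mu are the two teams' goal rates; only the four
    lowest scorelines are adjusted, everything else is left at 1.
    """
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_prob(x, y, lam, mu, rho):
    """Adjusted probability of scoreline x-y."""
    return dc_tau(x, y, lam, mu, rho) * poisson(x, lam) * poisson(y, mu)
```

With a negative $\rho$ (for example -0.1, chosen here only for illustration) the correction raises the probability of 0–0 and 1–1 and lowers 1–0 and 0–1, while leaving the total probability unchanged.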

Code

~3,500 lines of Python, zero external dependencies, MIT-licensed.

elo.py — Elo system + Poisson prediction
wl/simulation.py — Monte Carlo tournament runner
wl/optimizer.py — Simulated annealing optimizer
wl/field.py — Synthetic opponent generation
wl/empirical.py — Empirical score distributions
backtest.py — Historical validation (22 World Cups)
score_pool.py — Pool scoring function

CLI commands: ratings, predict, simulate, backtest, optimize, entry.