ML for a World Cup 2026 Office Pool

June 2026 · Sven Geboers

I built a Python system to optimise a bracket entry for a World Cup 2026 prediction pool. The pool awards points for score predictions on all 104 matches, for how far each of the 48 teams advances, and for five joker picks worth double points. Below is a summary of the components and findings.


Elo Ratings

Ratings are computed from 49,000+ international matches (1872–2026) in the martj42/international_results dataset. The K-factor scales with tournament importance (20–60), the home side gets a +100 rating-point advantage in non-neutral matches, and goal-difference weighting follows the World Football Elo Ratings methodology.
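A minimal sketch of one rating update under the World Football Elo Ratings scheme as published on eloratings.net: the change is K × G × (W − W_e), where G is the goal-difference multiplier and W_e the expected result. The tier names, and the assumption that this project maps tournaments to K exactly as that published scheme does (60 for World Cup finals down to 20 for friendlies), are illustrative; `elo_update` and `K_BY_TIER` are hypothetical names.

```python
# Hypothetical tier names; K values follow the published eloratings.net scheme (20-60).
K_BY_TIER = {
    "world_cup_finals": 60,
    "continental_finals": 50,
    "qualifier_or_major": 40,
    "other_tournament": 30,
    "friendly": 20,
}
HOME_ADVANTAGE = 100.0  # added to the home side's rating on non-neutral grounds


def goal_diff_multiplier(goal_diff: int) -> float:
    """G: 1 for a margin of 0-1, 1.5 for 2, (11 + N) / 8 for N >= 3."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    return (11 + n) / 8


def expected_result(rating: float, opponent: float) -> float:
    """W_e: expected score of `rating` against `opponent` (a draw counts as 0.5)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def elo_update(home: float, away: float, home_goals: int, away_goals: int,
               tier: str, neutral: bool) -> tuple[float, float]:
    """Return the (home, away) ratings after one match."""
    bonus = 0.0 if neutral else HOME_ADVANTAGE
    w_e = expected_result(home + bonus, away)
    if home_goals > away_goals:
        w = 1.0
    elif home_goals == away_goals:
        w = 0.5
    else:
        w = 0.0
    change = K_BY_TIER[tier] * goal_diff_multiplier(home_goals - away_goals) * (w - w_e)
    # Zero-sum: the away side loses exactly what the home side gains.
    return home + change, away - change
```

For example, `elo_update(1500, 1500, 1, 0, "friendly", neutral=True)` moves the two sides to 1510 and 1490, while the same 1–0 in a World Cup match would move them by 30 points each.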


Poisson Match Prediction

Expected goals per team are derived from the Elo difference $\Delta$ (home minus away):

$$\lambda_{\text{home}} = 1.35 \cdot e^{\,0.65 \cdot \Delta / 400}$$

$$\lambda_{\text{away}} = 1.35 \cdot e^{\,-0.65 \cdot \Delta / 400}$$

Win/draw/loss probabilities are computed from two independent Poisson distributions truncated at 8 goals, with the PMF vector cached for each $\lambda$. The stack uses only the Python standard library (no numpy, scipy, or pandas); all math is hand-rolled.
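The path from Elo difference to win/draw/loss probabilities can be sketched with the standard library alone. This assumes Δ is home minus away rating and that probabilities are renormalised over the truncated 0–8 grid (the source does not say whether it renormalises); the function names are hypothetical. As a worked number, Δ = 200 gives about 1.87 expected home goals against 0.98 away.

```python
import math
from functools import lru_cache

MAX_GOALS = 8  # scorelines are truncated at 8 goals per side


def expected_goals(delta: float) -> tuple[float, float]:
    """Expected home and away goals for Elo difference delta (home minus away)."""
    return (1.35 * math.exp(0.65 * delta / 400),
            1.35 * math.exp(-0.65 * delta / 400))


@lru_cache(maxsize=None)
def poisson_pmf(lam: float) -> tuple[float, ...]:
    """P(k goals) for k = 0..MAX_GOALS, cached per lambda."""
    return tuple(math.exp(-lam) * lam ** k / math.factorial(k)
                 for k in range(MAX_GOALS + 1))


def win_draw_loss(delta: float) -> tuple[float, float, float]:
    """Home win, draw and away win probabilities from two independent Poissons."""
    lam_home, lam_away = expected_goals(delta)
    p_home, p_away = poisson_pmf(lam_home), poisson_pmf(lam_away)
    win = draw = loss = 0.0
    for i, p_i in enumerate(p_home):
        for j, p_j in enumerate(p_away):
            if i > j:
                win += p_i * p_j
            elif i == j:
                draw += p_i * p_j
            else:
                loss += p_i * p_j
    # Assumption: renormalise so the truncated grid sums to exactly 1.
    total = win + draw + loss
    return win / total, draw / total, loss / total
```

`lru_cache` keys on the exact float λ, so rounding λ before the lookup would raise the hit rate if many nearly identical ratings occur.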


Backtest

Evaluated on all 22 World Cups (1930–2022), 964 matches in total. For each tournament, Elo ratings were built only from matches played before it.

| Metric | Value |
|---|---|
| Overall accuracy | 54.8% |
| Winner accuracy (decisive matches only) | 70.4% |
| Brier score | 0.196 |
| Log loss | 0.991 |
| Champions correctly predicted | 3/22 |

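The 54.8% accuracy is in line with how often favourites actually win, and 44% of matches are between teams within 100 Elo points of each other, so a large share of games are close matchups. The Brier score falls within the range reported in the academic literature (0.19–0.22; Groll, Ley, Zeileis), which is consistent with a well-calibrated model.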
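For reference, one common way to compute the table's metrics over (home win, draw, away win) probability triples is sketched below. Brier score conventions differ (summed vs averaged over the three outcomes); this sketch averages, which is an assumption, as is reading "winner accuracy" as: among non-drawn matches, did the side with the higher win probability win. Log loss uses the natural logarithm, so a uniform 1/3 forecast scores ln 3 ≈ 1.099.

```python
import math

# Outcome indices: 0 = home win, 1 = draw, 2 = away win.
Prediction = tuple[tuple[float, float, float], int]


def evaluate(predictions: list[Prediction]) -> dict[str, float]:
    n = len(predictions)
    correct = 0
    brier = log_loss = 0.0
    for probs, outcome in predictions:
        # Overall accuracy: the most probable outcome (first one on ties) is the pick.
        if max(range(3), key=lambda k: probs[k]) == outcome:
            correct += 1
        # Brier score averaged over the three outcomes (one convention of several).
        brier += sum((p - (1.0 if k == outcome else 0.0)) ** 2
                     for k, p in enumerate(probs)) / 3
        # Natural-log loss, clipped to avoid log(0).
        log_loss -= math.log(max(probs[outcome], 1e-15))
    # Winner accuracy: non-drawn matches only, comparing home vs away win probability.
    decisive = [(p, o) for p, o in predictions if o != 1]
    winner_hits = sum((p[0] > p[2]) == (o == 0) for p, o in decisive)
    return {
        "accuracy": correct / n,
        "winner_accuracy": winner_hits / len(decisive) if decisive else float("nan"),
        "brier": brier / n,
        "log_loss": log_loss / n,
    }
```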

A 225-combination parameter grid search (K-factor, home advantage, Gaussian vs Poisson, recency weighting) improved the Brier score by only 0.4%, a negligible gain. Recency weighting worsened accuracy by 3–5 percentage points.
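The grid search amounts to an exhaustive loop over parameter combinations, keeping the one with the lowest backtest Brier score. In this sketch `evaluate` stands in for a hypothetical function that rebuilds ratings with the given parameters and reruns the backtest; the source does not list the grid values, so none are reproduced here.

```python
from itertools import product
from typing import Callable


def grid_search(grid: dict[str, list],
                evaluate: Callable[[dict], float]) -> tuple[dict, float]:
    """Try every combination in `grid`; return the parameters with the lowest score."""
    names = list(grid)
    best_params: dict = {}
    best_score = float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. backtest Brier score (lower is better)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```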


Monte Carlo Simulation

Seeded tournament runner for the 2026 format: 12 groups of four, with the top two in each group plus the eight best third-placed teams (32 in total) advancing to the knockout rounds.
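A seeded group-stage pass for the 2026 format might look like this. It samples goals from the same Poisson model (via Knuth's method, since `random` has no Poisson sampler), ranks each group by points, goal difference and goals scored, and keeps the eight best third-placed teams. Simplifications that are assumptions: every match is treated as neutral, FIFA's remaining tiebreakers (head-to-head, fair play, drawing of lots) are replaced by a seeded random draw, and the knockout bracket is not shown.

```python
import math
import random
from itertools import combinations


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small goal rates."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate_group(teams, ratings, rng):
    """Play a single round robin; return teams in finishing order plus the table."""
    table = {t: [0, 0, 0] for t in teams}  # points, goal difference, goals for
    for home, away in combinations(teams, 2):
        delta = ratings[home] - ratings[away]
        goals_home = sample_poisson(1.35 * math.exp(0.65 * delta / 400), rng)
        goals_away = sample_poisson(1.35 * math.exp(-0.65 * delta / 400), rng)
        for team, scored, conceded in ((home, goals_home, goals_away),
                                       (away, goals_away, goals_home)):
            row = table[team]
            row[0] += 3 if scored > conceded else 1 if scored == conceded else 0
            row[1] += scored - conceded
            row[2] += scored
    # Simplified tiebreak: points, goal difference, goals scored, then a seeded draw.
    order = sorted(teams, key=lambda t: (*table[t], rng.random()), reverse=True)
    return order, table


def group_stage(groups, ratings, seed):
    """Return the 32 qualifiers: 12 winners, 12 runners-up, 8 best third-placed teams."""
    rng = random.Random(seed)
    qualified, thirds = [], []
    for teams in groups.values():
        order, table = simulate_group(teams, ratings, rng)
        qualified += order[:2]
        thirds.append((table[order[2]], order[2]))
    thirds.sort(key=lambda item: (*item[0], rng.random()), reverse=True)
    qualified += [team for _, team in thirds[:8]]
    return qualified
```

Because all randomness flows through one `random.Random(seed)`, the same seed always reproduces the same tournament, which is what lets candidates be compared on identical seed sets.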


Bracket Optimizer

Simulated annealing over 104 match scores, 5 joker placements, and 32 knockout team predictions. Six mutation strategies:

  1. Score shift: ±1 on one team
  2. Score swap: exchange two match scores
  3. Joker swap: move a joker
  4. KO team flip: swap home/away in one knockout match
  5. Macro-swap: swap group winner/runner-up, rebuild entire knockout bracket
  6. Multi-flip: shift 2–5 random scores simultaneously
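Four of the six mutation moves can be sketched on a hypothetical `Bracket` type that holds one (home, away) score per match and a set of joker match indices. The knockout team flip and macro-swap depend on bracket-building logic the post does not describe, so they are omitted; all names here are illustrative.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bracket:
    # Hypothetical representation: one (home, away) score per match plus joker indices.
    scores: tuple[tuple[int, int], ...]
    jokers: frozenset[int]


def _shift(scores: list[tuple[int, int]], i: int, rng: random.Random) -> None:
    """Move one side's goal count in match i by +/-1, never below zero."""
    goals = list(scores[i])
    side = rng.randrange(2)
    goals[side] = max(0, goals[side] + rng.choice((-1, 1)))
    scores[i] = (goals[0], goals[1])


def score_shift(b: Bracket, rng: random.Random) -> Bracket:
    scores = list(b.scores)
    _shift(scores, rng.randrange(len(scores)), rng)
    return replace(b, scores=tuple(scores))


def score_swap(b: Bracket, rng: random.Random) -> Bracket:
    scores = list(b.scores)
    i, j = rng.sample(range(len(scores)), 2)
    scores[i], scores[j] = scores[j], scores[i]
    return replace(b, scores=tuple(scores))


def joker_swap(b: Bracket, rng: random.Random) -> Bracket:
    old = rng.choice(sorted(b.jokers))
    free = [i for i in range(len(b.scores)) if i not in b.jokers]
    return replace(b, jokers=(b.jokers - {old}) | {rng.choice(free)})


def multi_flip(b: Bracket, rng: random.Random) -> Bracket:
    """Shift 2-5 distinct matches at once."""
    scores = list(b.scores)
    for i in rng.sample(range(len(scores)), rng.randint(2, 5)):
        _shift(scores, i, rng)
    return replace(b, scores=tuple(scores))


MUTATIONS = (score_shift, score_swap, joker_swap, multi_flip)


def mutate(b: Bracket, rng: random.Random) -> Bracket:
    """Pick one move uniformly at random (the real weighting is not stated)."""
    return rng.choice(MUTATIONS)(b, rng)
```

Keeping `Bracket` immutable means a rejected candidate never corrupts the current state, at the cost of copying the score tuple on each move.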

The temperature starts at $T = 5.0$ and decays by a factor of $0.9998$ per step over 10,000 iterations. Each candidate is evaluated on 2,500 tournament seeds, run in parallel across CPU cores via ProcessPoolExecutor. The objective is contrarian: $\mathbb{E}[\text{points}] + \lambda \cdot \text{novelty}(\text{bracket}, \text{consensus})$. A Spine Lock blocks macro-swaps in groups where the #1 team leads #2 by more than 50 Elo points.
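The annealing loop can be sketched with a Metropolis acceptance rule for a maximisation objective and the stated schedule: T starts at 5.0 and is multiplied by 0.9998 each step, so after 10,000 steps it ends near 5.0 × 0.9998^10000 ≈ 0.68. The `novelty` measure (share of picks that differ from a consensus bracket) and `contrarian_objective` are hypothetical stand-ins; in the real system, expected points would come from the 2,500-seed parallel evaluation.

```python
import math
import random
from typing import Callable, TypeVar

B = TypeVar("B")


def anneal(initial: B,
           objective: Callable[[B], float],
           mutate: Callable[[B, random.Random], B],
           t0: float = 5.0, decay: float = 0.9998,
           iterations: int = 10_000, seed: int = 0) -> tuple[B, float]:
    """Maximise `objective` by simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    current, current_score = initial, objective(initial)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(iterations):
        candidate = mutate(current, rng)
        score = objective(candidate)
        diff = score - current_score
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(diff / T), which shrinks as T cools.
        if diff >= 0 or rng.random() < math.exp(diff / temperature):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = candidate, score
        temperature *= decay
    return best, best_score


def novelty(picks, consensus) -> float:
    """Hypothetical novelty: share of picks that differ from the consensus bracket."""
    return sum(a != b for a, b in zip(picks, consensus)) / len(consensus)


def contrarian_objective(expected_points, consensus, weight):
    """E[points] + weight * novelty(bracket, consensus); `weight` plays the role of lambda."""
    return lambda bracket: expected_points(bracket) + weight * novelty(bracket, consensus)
```

Tracking `best` separately from `current` matters here: late in the run the walk can still step downhill, and the returned bracket should be the best one seen rather than wherever the walk stopped.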

| Metric | Value |
|---|---|
| Random baseline EV | ~57 pts |
| Optimizer EV | ~113 pts |
| 2022 deterministic backtest | 154 pts (38% of max 640) |
| Main bottleneck | 53% of knockout matches have 0 correct teams |

Synthetic Opponent Field

Generates opponent brackets with built-in cognitive biases, then uses them to estimate the entry's probability of winning the pool.
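Win probability against a synthetic field can be estimated by scoring every bracket in each simulated tournament and counting how often the entry finishes first, splitting ties evenly. The specific biases used are not listed, so `biased_winner_pick` shows one hypothetical example (entrants over-picking favourites); all names are illustrative.

```python
import random


def biased_winner_pick(favourite, underdog, p_favourite, overconfidence, rng):
    """Hypothetical bias: an entrant backs the favourite more often than its true chance."""
    return favourite if rng.random() < min(1.0, p_favourite + overconfidence) else underdog


def pool_win_probability(my_points, field_points):
    """Share of simulated tournaments in which the entry finishes first.

    my_points[s]   : the entry's points in simulated tournament s
    field_points[s]: every opponent's points in that same tournament
    """
    share = 0.0
    for mine, theirs in zip(my_points, field_points):
        top = max(theirs)
        if mine > top:
            share += 1.0
        elif mine == top:
            share += 1.0 / (1 + theirs.count(top))  # split first place evenly
    return share / len(my_points)
```

For example, `pool_win_probability([10, 5, 7], [[8, 9], [6, 1], [7, 3]])` returns 0.5: one outright win, one loss, and one two-way tie counted as half a win.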


Empirical Score Distributions

Scoreline frequency tables built from all 964 World Cup matches, split into group-stage (727) and knockout (237) matches. For deterministic bracket generation, these frequencies replace the Poisson model as sampling weights.
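A sketch of how a scoreline frequency table can drive score picks once the match outcome has been decided. Whether the deterministic path samples with a fixed seed or simply takes the most frequent scoreline is not stated, so both variants are shown; separate tables would be built for the group and knockout matches. Function names are hypothetical.

```python
import random
from collections import Counter


def _fits(score: tuple[int, int], outcome: str) -> bool:
    """outcome: "H" (home win), "D" (draw) or "A" (away win)."""
    home, away = score
    return {"H": home > away, "D": home == away, "A": home < away}[outcome]


def sample_scoreline(table: Counter, outcome: str, rng: random.Random) -> tuple[int, int]:
    """Draw a scoreline consistent with `outcome`, weighted by historical frequency."""
    options = [s for s in table if _fits(s, outcome)]
    return rng.choices(options, weights=[table[s] for s in options], k=1)[0]


def most_common_scoreline(table: Counter, outcome: str) -> tuple[int, int]:
    """Deterministic alternative: the most frequent scoreline for that outcome."""
    return max((s for s in table if _fits(s, outcome)), key=table.__getitem__)
```

A table is just `Counter(scorelines)` over (home, away) tuples, e.g. `Counter([(1, 0), (1, 0), (2, 1)])`.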


Improvement Paths

  1. Dixon-Coles corrections: low-score adjustment + time-decayed parameter estimation.
  2. Player-level data: aggregated FIFA ratings or market values per squad.
  3. Bookmaker odds: blend model probabilities with market-implied odds.
  4. Form metrics: rolling goal difference, xG differentials, streak indicators.
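Two of these paths have standard, compact formulations. Dixon and Coles (1997) multiply the independent Poisson probabilities by a factor τ that adjusts only the 0–0, 0–1, 1–0 and 1–1 scorelines, and weight each past match by e^(−ξt) for time decay; bookmaker odds can be turned into probabilities by inverting decimal odds and normalising away the margin (the simplest, proportional method). The sketch below illustrates both; ρ, ξ and the blend weight are free parameters, not values from this project.

```python
import math


def poisson(k: int, mean: float) -> float:
    return math.exp(-mean) * mean ** k / math.factorial(k)


def dc_tau(h: int, a: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles low-score correction; lam/mu are home/away expected goals."""
    if h == 0 and a == 0:
        return 1 - lam * mu * rho
    if h == 0 and a == 1:
        return 1 + lam * rho
    if h == 1 and a == 0:
        return 1 + mu * rho
    if h == 1 and a == 1:
        return 1 - rho
    return 1.0


def dc_probability(h: int, a: int, lam: float, mu: float, rho: float) -> float:
    """P(score h-a) with the correction applied; the full grid still sums to 1."""
    return dc_tau(h, a, lam, mu, rho) * poisson(h, lam) * poisson(a, mu)


def time_weight(days_ago: float, xi: float) -> float:
    """Exponential down-weighting of older matches in the likelihood."""
    return math.exp(-xi * days_ago)


def implied_probabilities(decimal_odds: list[float]) -> list[float]:
    """Invert decimal odds and rescale to remove the bookmaker margin proportionally."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]


def blend(model: list[float], market: list[float], weight: float) -> list[float]:
    """Linear pool: `weight` on the model, (1 - weight) on the market."""
    return [weight * m + (1 - weight) * k for m, k in zip(model, market)]
```

With a negative ρ, the correction raises the 0–0 and 1–1 probabilities and lowers 1–0 and 0–1, which is the low-score draw effect plain independent Poissons tend to underestimate.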

Code

~3,500 lines of Python, zero external dependencies, MIT-licensed.

CLI commands: ratings, predict, simulate, backtest, optimize, entry.
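The six commands map naturally onto argparse subcommands. A skeleton, assuming nothing about each command's options (none are documented here) and using a hypothetical dispatch that only reports which step would run:

```python
import argparse

COMMANDS = ("ratings", "predict", "simulate", "backtest", "optimize", "entry")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="World Cup 2026 pool tools (sketch)")
    subcommands = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        subcommands.add_parser(name, help=f"run the {name} step")
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    # Hypothetical dispatch: each command would call into its own module here.
    print(f"would run: {args.command}")
    return args.command
```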