
7 Limitations and Future Work
7.1 Probability Prediction Focus
Despite strong offline accuracy and calibration, our strategy failed to deliver steady profits, which points to structural limitations in the pipeline. A key issue is that Classwise ECE, while central to model selection, is non-differentiable and therefore cannot be optimized directly during training; tree-based models such as LightGBM and XGBoost can only verify calibration through post-hoc checks. Embedding differentiable surrogates, such as soft-binning approximations of ECE or the Maximum Mean Calibration Error, into neural architectures could enable true end-to-end calibration.
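As a rough illustration of what such a surrogate could look like, the sketch below computes a soft-binned ECE for binary outcomes in PyTorch. The function name, the squared-distance soft assignment, and the temperature value are illustrative assumptions rather than a specific published formulation; a classwise variant would average this quantity over classes.

import torch

def soft_binned_ece(probs, labels, n_bins=10, temperature=0.01):
    """Differentiable soft-binned ECE for binary outcomes (illustrative sketch).

    probs:  tensor of shape (N,) with predicted positive-class probabilities
    labels: tensor of shape (N,) with outcomes in {0, 1}
    """
    labels = labels.float()
    centers = torch.linspace(0.5 / n_bins, 1.0 - 0.5 / n_bins, n_bins)
    # Soft assignment of each prediction to each bin, shape (N, n_bins)
    logits = -((probs.unsqueeze(1) - centers.unsqueeze(0)) ** 2) / temperature
    weights = torch.softmax(logits, dim=1)

    bin_mass = weights.sum(dim=0) + 1e-8                              # soft count per bin
    bin_conf = (weights * probs.unsqueeze(1)).sum(dim=0) / bin_mass   # mean confidence per bin
    bin_acc = (weights * labels.unsqueeze(1)).sum(dim=0) / bin_mass   # empirical accuracy per bin

    return ((bin_mass / probs.numel()) * (bin_acc - bin_conf).abs()).sum()

Because every operation is differentiable, such a term could be added to the cross-entropy loss with a small weight, so that calibration is penalized during training rather than only checked afterwards.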
A second limitation is the absence of dynamic updating. The current pipeline evaluates on a
fixed snapshot, ignoring the continual flow of new matches that shapes real betting markets.
Incorporating rolling or walk‑forward testing, together with time‑sensitive features (e.g.,
exponential‑decay weights, momentum embeddings), would better capture evolving team form
and yield more realistic performance estimates (Constantinou & Fenton, 2013; Groll et al., 2019).
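A minimal sketch of such a scheme is given below, assuming a pandas DataFrame of matches with a date column; the window lengths, the half_life parameter, and the function name walk_forward_splits are hypothetical placeholders rather than settings used in this study. The decayed weights could then be passed to LightGBM or XGBoost through their sample_weight fit argument so that recent matches dominate each refit.

import pandas as pd

def walk_forward_splits(matches, train_days=730, test_days=30, half_life=180):
    """Rolling walk-forward splits with exponential-decay recency weights (sketch).

    matches: DataFrame with a 'date' column, one row per match.
    Yields (train, test, sample_weight) for each step forward in time.
    """
    matches = matches.sort_values("date")
    cursor = matches["date"].min() + pd.Timedelta(days=train_days)
    end = matches["date"].max()
    while cursor < end:
        train = matches[(matches["date"] >= cursor - pd.Timedelta(days=train_days))
                        & (matches["date"] < cursor)]
        test = matches[(matches["date"] >= cursor)
                       & (matches["date"] < cursor + pd.Timedelta(days=test_days))]
        age_days = (cursor - train["date"]).dt.days.to_numpy()
        weights = 0.5 ** (age_days / half_life)   # halve a match's influence every half_life days
        yield train, test, weights
        cursor += pd.Timedelta(days=test_days)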
Finally, the disconnect between classification accuracy and return suggests that our loss
functions and metrics remain value‑agnostic. Overconfidence, calibration drift, and the lack of
profit‑weighted objectives can all erode expected gains. Future work should couple probability
forecasts with return optimization through betting‑specific and profit‑aware objectives.
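One simple way to make evaluation profit-aware is to score probabilities by the return they would have generated rather than by accuracy alone. The sketch below does this with a positive-expected-value staking rule; the rule, the clipping of stakes, and the function name are illustrative assumptions, not the strategy evaluated in this paper.

import numpy as np

def expected_profit_loss(probs, outcomes, decimal_odds):
    """Profit-weighted objective (illustrative sketch, not the paper's method).

    Returns the negative mean realized return of staking in proportion to the
    model's edge, so minimizing it rewards profitable probabilities rather
    than raw accuracy. outcomes are 0/1, decimal_odds are bookmaker odds.
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    decimal_odds = np.asarray(decimal_odds, dtype=float)

    edge = probs * decimal_odds - 1.0   # model-implied expected value per unit stake
    stake = np.clip(edge, 0.0, 1.0)     # back only positive-EV outcomes
    profit = stake * (outcomes * (decimal_odds - 1.0) - (1.0 - outcomes))
    return -profit.mean()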
7.2 Betting Optimization Focus
First, the Kelly strategy is highly sensitive to errors in predicted probabilities, especially near 0 or 1.
Its logarithmic utility function amplifies the impact of overestimations, often resulting in
disproportionate bets and potential losses. This risk is well-known in finance and gambling,
where even well-calibrated models can perform poorly under full Kelly allocation (MacLean,
Thorp, & Ziemba, 2010). To mitigate this, many apply fractional Kelly betting—e.g., 50% of the
recommended stake—to reduce volatility and guard against estimation noise. Although we did
not explore this approach, future work could evaluate fractional strategies to assess the trade-off
between growth and stability.
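For concreteness, a fractional Kelly stake for a single binary bet could be computed as in the sketch below; the 0.5 default fraction and the clipping of negative-edge bets to zero are illustrative choices, not settings we evaluated.

def fractional_kelly_stake(p, decimal_odds, fraction=0.5):
    """Fractional Kelly stake as a share of bankroll (illustrative sketch).

    p: estimated win probability; decimal_odds: bookmaker decimal odds;
    fraction: e.g. 0.5 for half-Kelly to dampen estimation noise.
    """
    b = decimal_odds - 1.0                  # net odds received per unit staked
    full_kelly = (b * p - (1.0 - p)) / b    # classic Kelly fraction
    return max(0.0, fraction * full_kelly)  # never stake on negative-edge bets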
Second, our approach treats predicted probabilities as fixed point estimates, ignoring uncertainty
from model variance or misspecification. In reality, these probabilities stem from machine
learning models and are inherently uncertain. A more robust alternative is the Bayesian Kelly
criterion, which treats predictions as random variables and adjusts bet sizing based on posterior
distributions (Browne & Whitt, 1996). This probabilistic method offers greater resilience in
noisy or data-scarce settings, reducing the risk of overbetting and potentially improving
long-term performance.
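A full Bayesian treatment is beyond the scope of this sketch, but one simple heuristic in the same spirit is shown below: compute the Kelly stake implied by each posterior (or bootstrap) draw of the win probability and bet a conservative quantile of those stakes, so that disagreement among draws shrinks the bet. The quantile choice and the function name are our own illustrative assumptions, not the formulation of Browne and Whitt (1996).

import numpy as np

def conservative_kelly_stake(prob_samples, decimal_odds, quantile=0.2):
    """Uncertainty-aware Kelly sizing via a low quantile (illustrative sketch).

    prob_samples: draws of the win probability from a posterior or bootstrap.
    """
    prob_samples = np.asarray(prob_samples, dtype=float)
    b = decimal_odds - 1.0
    stakes = np.clip((b * prob_samples - (1.0 - prob_samples)) / b, 0.0, 1.0)
    return float(np.quantile(stakes, quantile))   # low quantile = cautious stake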
Third, our model uses single-match optimization, treating each bet independently. While suitable
for evaluating individual outcomes, this approach overlooks capital allocation across multiple
concurrent bets—a more realistic scenario for active bettors. Portfolio-based extensions of the
Kelly criterion (Bell & Cover, 1988) show that diversifying across correlated bets can improve
capital growth and reduce drawdown. Integrating portfolio-level optimization would better
reflect real-world strategies and enable more effective budget management across full game
slates.
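As a sketch of what such an extension could look like, the snippet below jointly sizes stakes for a few simultaneous independent bets by maximizing expected log bankroll over every win/lose combination. The independence assumption, the brute-force enumeration (which only scales to a handful of concurrent bets), and the function name are illustrative simplifications; Bell and Cover (1988) treat the more general setting, including correlated outcomes.

import numpy as np
from itertools import product
from scipy.optimize import minimize

def portfolio_kelly(probs, decimal_odds):
    """Joint Kelly sizing for simultaneous independent bets (illustrative sketch).

    probs, decimal_odds: per-bet win probabilities and bookmaker decimal odds.
    Maximizes expected log bankroll over all 2^n win/lose patterns.
    """
    probs = np.asarray(probs, dtype=float)
    decimal_odds = np.asarray(decimal_odds, dtype=float)
    n = len(probs)

    patterns = np.array(list(product([0, 1], repeat=n)))            # all win/lose combinations
    pattern_probs = np.prod(np.where(patterns, probs, 1.0 - probs), axis=1)

    def neg_expected_log_wealth(f):
        wealth = 1.0 - f.sum() + patterns @ (f * decimal_odds)      # bankroll after settlement
        return -(pattern_probs * np.log(np.maximum(wealth, 1e-12))).sum()

    res = minimize(
        neg_expected_log_wealth,
        x0=np.full(n, 0.01),
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "ineq", "fun": lambda f: 1.0 - f.sum()}],  # never stake more than the bankroll
    )
    return res.x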