College Football ML App
College Football Prediction App
Links
- Mobile version is a bit janky, so desktop use is recommended. A more optimized mobile variant will be in the works soon.
Intro
So, over the past couple of years I’ve been on-and-off developing a college football prediction model and application. I have an ungodly obsession with the sport, so this ended up becoming a passion project without an apparent endpoint. So, what’s in it so far?
Game Predictions
The app uses an XGBoost model trained on the most recent season’s worth of college football data, thanks to sportsdataverse. Features include each team’s season-averaged metrics up to a given game, as well as the results from the game immediately prior to a given matchup. Teams facing off then have these measures subtracted from one another, creating a scale of ‘who does what better’ for each variable. Predictions are then made with respect to the final score differential, i.e. the margin of victory. Vegas metrics are also used as features and, as you might have guessed, are hugely influential in the model.
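To make the ‘who does what better’ idea concrete, here is a minimal sketch of the differencing step. It is written in Python purely for illustration (the app itself is built in R), and the metric names and numbers are made up, not the app’s actual features.

```python
# Hypothetical season-to-date averages for two teams.
# Metric names and values are illustrative assumptions, not real app features.
home = {"yards_per_rush": 4.8, "points_per_game": 31.0, "turnovers_per_game": 1.2}
away = {"yards_per_rush": 4.1, "points_per_game": 27.5, "turnovers_per_game": 1.6}


def matchup_features(team_a, team_b):
    """Subtract team_b's metrics from team_a's, one feature per shared metric."""
    shared = sorted(team_a.keys() & team_b.keys())
    return {f"diff_{name}": team_a[name] - team_b[name] for name in shared}


features = matchup_features(home, away)
for name, value in features.items():
    # Positive means team_a is higher on that metric; negative means team_b is.
    print(name, round(value, 2))
```

A row like this, one per matchup, is the kind of input a margin-of-victory model could be trained on. Swapping the two teams simply flips the sign of every feature.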
Chatbot
I put a chatbot in the app … that’s really it. You can ask it questions about the teams you selected, but it will not always give you the most up-to-date information pertaining to the football program. For example, asking it ‘What is USC’s average yards per rush attempt?’ will give you a near up-to-date answer, but asking ‘Who is USC’s head coach?’ will probably result in an outdated “Clay Helton” response. In order to get team-specific answers, ensure you select the team in the left panel and hit “predict” first. Duct tape engineering to the rescue!
Plots & Tables
There are some plots to play around with, such as season trends, as well as a direct table of data you can query by team (download option coming soon).
So… should I use it for betting?
no.
With that being said, throughout the two years I’ve been using it, the model has predicted the correct outcome (win/loss) of a given game with about 70% accuracy. However, you’re better off listening to Vegas or your lucky uncle.
When it comes to the spread, its accuracy is 50/50, so, completely unreliable. Predicting the spread is insanely difficult, and I have yet to simulate fruitful betting strategies around the spread. However, there have been some cool papers discussing how to optimally bet the spread that are worth experimenting with. Most recently, I’ve read this one:
- Dmochowski, 2023: (shows that the median margin of victory is tied to maximizing expected profit in spread betting)
Speaking of…
I’ll be doing another write-up at some point about using the Kelly Criterion in tandem with this application for moneyline bets, where I’ve found better, although very modest, results. Again, I do not recommend you use this app as a betting tool, but rather for exploratory fun. I’m not responsible if you lose the Mercedes. Watching your team hit < .600 for the 5th consecutive year will already give you all the pain you need.
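For reference, the Kelly Criterion sizes a bet as a fraction of your bankroll, f = (b·p − q) / b, where p is your estimated win probability, q = 1 − p, and b is the net payout per unit staked. Below is a minimal Python sketch of that formula for American moneyline odds. The win probability is an assumed input here, since the model described above predicts margins, and the function names are hypothetical.

```python
def decimal_odds(american):
    """Convert American moneyline odds (e.g. -150 or +130) to decimal odds."""
    if american > 0:
        return 1 + american / 100
    return 1 + 100 / abs(american)


def kelly_fraction(win_prob, american):
    """Kelly stake as a fraction of bankroll; 0.0 when there is no positive edge."""
    b = decimal_odds(american) - 1  # net profit per unit staked
    f = (b * win_prob - (1 - win_prob)) / b
    return max(f, 0.0)


# Assumed inputs for illustration: a 60% win probability at +110, and a coin flip at -110.
print(round(kelly_fraction(0.60, 110), 4))
print(kelly_fraction(0.50, -110))
```

The second call returns zero because a 50% chance at -110 has negative expected value, so Kelly says not to bet at all.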
How did you make this?
The data gets pulled weekly using the sportsdataverse API along with a scrape of CBS’ website. From here, a multitude of data transformations occur and an XGBoost model is fit using the tidymodels and xgboost libraries in R. The subsequent application was also developed in R, namely using RShiny.
App Embed:
Resources/References:
This app would not be possible without the amazing work of everyone who is a part of the sportsdataverse project. See more information on the cfbfastR sub-package here: https://cfbfastr.sportsdataverse.org/
In creating the chatbot, guidance was borrowed from James Wade’s YouTube video found here: https://youtube.com/watch?v=d7l4EZYlZE0
