Predicting Upcoming Board Games

Predictive Models for BoardGameGeek Ratings

Author

Phil Henrickson

Published

11/20/24

Pipeline

I use historical data from BoardGameGeek (BGG) to train a number of predictive models for board games. I first classify games based on their probability of achieving a minimum number of ratings on BGG. I then estimate each game’s complexity (average weight) in order to predict its number of user ratings and average rating. Finally, I use these estimates to compute the expected Geek Rating.
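
The last step, turning a predicted average rating and a predicted number of user ratings into an expected Geek Rating, can be sketched as a Bayesian average. This is only a rough illustration: BGG does not publish its exact formula, and the padding of 2000 dummy votes at a prior mean of 5.5 is an assumption, as is the function name `expected_geek_rating`.

```python
def expected_geek_rating(pred_average, pred_usersrated,
                         prior_votes=2000, prior_mean=5.5):
    """Approximate a Geek Rating as a Bayesian average.

    prior_votes and prior_mean are assumed values, not taken from the
    report: the predicted average is shrunk toward prior_mean, and the
    shrinkage weakens as the predicted number of user ratings grows.
    """
    weighted_sum = pred_average * pred_usersrated + prior_mean * prior_votes
    return weighted_sum / (pred_usersrated + prior_votes)


# A game predicted to average 7.5 across 2000 ratings sits halfway
# between its predicted average and the assumed prior mean.
print(expected_geek_rating(7.5, 2000))
```

Under this approximation, a game with few predicted ratings stays close to the prior mean regardless of its predicted average, which is why the number of user ratings matters as much as the average itself.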

The following (somewhat messy) visualization displays the status of the current pipeline used to train models and predict new games.

Assessment

How did the models perform in predicting games?

I used a training-validation approach based on the year in which games were published. I created a training set of games published prior to 2021 and evaluated the models’ performance in predicting games published from 2021 to 2022.
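
A year-based split of this kind might look like the following minimal sketch. The record layout and the `yearpublished` key are assumptions made for illustration; the report does not show its data structures.

```python
def split_by_year(games, valid_start=2021, valid_end=2022):
    """Split games into training and validation sets by publication year.

    Games published before valid_start go to training; games published
    from valid_start through valid_end (inclusive) go to validation.
    Anything later is left out of both sets.
    """
    train = [g for g in games if g["yearpublished"] < valid_start]
    valid = [g for g in games
             if valid_start <= g["yearpublished"] <= valid_end]
    return train, valid


# Hypothetical records, for illustration only.
games = [
    {"name": "A", "yearpublished": 2015},
    {"name": "B", "yearpublished": 2020},
    {"name": "C", "yearpublished": 2021},
    {"name": "D", "yearpublished": 2022},
    {"name": "E", "yearpublished": 2024},
]
train, valid = split_by_year(games)
print([g["name"] for g in train], [g["name"] for g in valid])
```

Splitting on time rather than at random keeps the validation setting close to the real task, where models trained on past games must predict games that have not yet been released.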

BGG Ratings

minratings model outcome rmse mae mape rsq ccc
25 glmnet average 0.685 0.504 7.580 0.282 0.462
25 lightgbm averageweight 0.446 0.341 18.854 0.680 0.813
25 glmnet+glmnet bayesaverage 0.291 0.170 2.828 0.439 0.653
25 glmnet usersrated 1757.459 450.490 164.621 0.135 0.367
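
For readers unfamiliar with the column names, the sketch below computes each metric under one common definition: rsq as the squared Pearson correlation and ccc as Lin's concordance correlation coefficient with population (1/n) moments. The exact variants used to produce the table are not stated in the report, so small numerical differences are possible.

```python
import math


def regression_metrics(truth, estimate):
    """Compute rmse, mae, mape (in percent), rsq, and ccc for two sequences."""
    n = len(truth)
    errors = [e - t for t, e in zip(truth, estimate)]

    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mae = sum(abs(err) for err in errors) / n
    # mape is undefined when any true value is zero.
    mape = 100 * sum(abs(err / t) for err, t in zip(errors, truth)) / n

    mean_t = sum(truth) / n
    mean_e = sum(estimate) / n
    var_t = sum((t - mean_t) ** 2 for t in truth) / n
    var_e = sum((e - mean_e) ** 2 for e in estimate) / n
    cov = sum((t - mean_t) * (e - mean_e)
              for t, e in zip(truth, estimate)) / n

    # Squared correlation: how well predictions track the truth linearly.
    rsq = cov ** 2 / (var_t * var_e)
    # Concordance: also penalizes shifts in location and scale.
    ccc = 2 * cov / (var_t + var_e + (mean_t - mean_e) ** 2)

    return {"rmse": rmse, "mae": mae, "mape": mape, "rsq": rsq, "ccc": ccc}


# Predictions that are perfectly correlated with the truth but shifted by 1.
print(regression_metrics([1, 2, 3, 4], [2, 3, 4, 5]))
```

The example shows why ccc is reported alongside rsq: a constant offset leaves rsq at 1 but pulls ccc below 1, since ccc measures agreement with the truth rather than correlation alone.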

Hurdle

I first predict whether games are expected to receive enough ratings (25) to be assigned a Geek Rating. This is a classification model that assigns a probability to each game; in order to classify games, I need to determine the appropriate probability threshold.

I select this threshold by examining performance across a variety of classification metrics. I choose the threshold that maximizes the F2 measure, which weights recall more heavily than precision, in order to minimize false negatives. I’m interested in using the hurdle model to filter out games that are very unlikely to receive ratings, where missing a game is worse than including a game that falls short.
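
Threshold selection by F2 can be sketched as a simple grid search. This is a minimal illustration, not the code behind the report: the function names, the grid of candidate thresholds, and the toy data are all assumptions.

```python
def f_beta_from_pr(precision, recall, beta=2.0):
    """F-beta score from precision and recall; beta > 1 favors recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def f_beta(truth, predicted, beta=2.0):
    """F-beta score from boolean labels and boolean predictions."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold(truth, probs, beta=2.0, grid=None):
    """Return the (threshold, score) pair that maximizes F-beta on a grid.

    Ties are resolved in favor of the lowest threshold.
    """
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # assumed grid: 0.01 to 0.99
    best_t, best_score = None, -1.0
    for t in grid:
        preds = [p >= t for p in probs]
        score = f_beta(truth, preds, beta)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy data, for illustration only.
truth = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.4, 0.1]
print(best_threshold(truth, probs))

# Consistency check against the precision and recall reported in the table.
print(round(f_beta_from_pr(0.472, 0.877, beta=2.0), 3))
```

With beta set to 2, a false negative costs four times as much as a false positive in the denominator, so the chosen threshold tends to sit lower than one chosen by F1 and lets more borderline games through the hurdle.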

outcome .metric .estimator .estimate
hurdle bal_accuracy binary 0.754
hurdle kap binary 0.401
hurdle mcc binary 0.453
hurdle f1_meas binary 0.614
hurdle f2_meas binary 0.748
hurdle precision binary 0.472
hurdle recall binary 0.877
hurdle j_index binary 0.509
hurdle roc_auc binary 0.857
hurdle pr_auc binary 0.727

Features

Which features were influential for predicting each outcome?

Predictions

Upcoming Games

The following table displays predicted BGG outcomes for games that are expected to achieve at least 25 user ratings.

Hurdle

This table displays predicted probabilities for whether games will achieve enough ratings (25) to be assigned a Geek Rating.