Predicting Upcoming Board Games

Predictive Models for BoardGameGeek Ratings

Author

Phil Henrickson

Published

March 4, 2025

1 Pipeline

I use historical data from BoardGameGeek (BGG) to train a number of predictive models for community ratings. I first classify games based on their probability of achieving a minimum number of ratings on BGG. I then estimate each game's complexity (average weight) in order to predict its number of user ratings and average rating. Finally, I use these estimates to predict the expected Geek Rating.
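The Geek Rating itself (BGG's bayesaverage) is commonly understood to be a Bayesian average: a game's average rating is shrunk toward a prior of roughly 5.5 by padding it with dummy ratings. BGG does not publish the exact number of dummy votes, so the values in this minimal sketch are assumptions:

# sketch of a Geek-style Bayesian average; n_dummy = 1500 and prior = 5.5
# are assumptions, as BGG does not publish the exact values it uses
geek_style_average = function(average, usersrated, n_dummy = 1500, prior = 5.5) {
    (prior * n_dummy + average * usersrated) / (n_dummy + usersrated)
}

geek_style_average(average = 8, usersrated = 500)    # shrinks hard toward 5.5
geek_style_average(average = 8, usersrated = 50000)  # stays close to 8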

The following (somewhat messy) visualization displays the status of the current pipeline used to train models and predict new games.

Show the code
targets::tar_visnetwork(targets_only = T)
Show the code
# connect to the model board on Google Cloud Storage
model_board = gcs_model_board(bucket = config$bucket, prefix = config$board)

# load the pinned model names and the hurdle classification threshold
tar_load(averageweight_vetiver)
tar_load(average_vetiver)
tar_load(usersrated_vetiver)
tar_load(hurdle_vetiver)
tar_load(hurdle_threshold)

# read each trained vetiver model from the board
averageweight_fit =
    pin_read_model(model_board,
                   averageweight_vetiver)

average_fit =
    pin_read_model(model_board,
                   average_vetiver)

usersrated_fit =
    pin_read_model(model_board,
                   usersrated_vetiver)

hurdle_fit =
    pin_read_model(model_board,
                   hurdle_vetiver)
Show the code
# games published after the end of the validation period are "upcoming"
end_valid_year = valid_years$max_year

upcoming_games =
    active_games |>
    filter(yearpublished > end_valid_year)

1.1 Assessment

How did the models perform in predicting games?

I used a training-validation approach based on the year in which games were published: I created a training set of games published before 2022 and evaluated the models' performance in predicting games published from 2022 to 2023.
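As a sketch, assuming a games table with a yearpublished column, the split looks like this:

# time-based split: train on games published before 2022,
# validate on games published from 2022 to 2023
train_games = games |> filter(yearpublished < 2022)
valid_games = games |> filter(between(yearpublished, 2022, 2023))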

1.1.1 BGG Ratings

How did the models perform in predicting new games? I evaluate them primarily on games that achieved at least 25 ratings.

Show the code
plot_hurdle_yes

Show the code
plot_hurdle_no

Show the code
targets_tracking_details(metrics = valid_metrics,
                         details = details) |>
    select(model, minratings, outcome, any_of(c("rmse", "mae", "mape", "rsq", "ccc"))) |>
    filter(minratings == 25) |>
    select(minratings, everything()) |>
    gt::gt() |>
    gt::tab_options(quarto.disable_processing = T) |>
    gtExtras::gt_theme_espn()
minratings  model          outcome            rmse      mae     mape    rsq    ccc
25          glmnet         average           0.675    0.498    7.374  0.294  0.487
25          lightgbm       averageweight     0.437    0.336   18.019  0.706  0.827
25          glmnet+glmnet  bayesaverage      0.285    0.159    2.647  0.430  0.649
25          glmnet         usersrated     1941.387  446.031  154.763  0.122  0.335

What were the top predictions in the validation set?

Show the code
# top 150 games by predicted Geek Rating among those predicted to clear the hurdle
valid_predictions |>
    filter(.pred_hurdle_class == 'yes') |>
    select(-starts_with(".pred_hurdle")) |>
    slice_max(.pred_bayesaverage, n = 150, with_ties = F) |>
    predictions_dt(games = games) |>
    add_colors()

1.1.2 Hurdle

I use a hurdle model to predict whether games are expected to receive enough ratings (25) to be assigned a Geek Rating. This is a classification model that assigns each game a probability; in order to classify games, I need to determine an appropriate threshold.

I set the threshold at 0.16, the point that maximizes the F2 measure, which weights recall more heavily than precision and so penalizes false negatives. For the purposes of this model, missing a game that did receive a Geek Rating is much worse than including a game that did not.
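As a sketch of how such a threshold can be chosen (this is not the pipeline's actual tuning code; it assumes the valid_predictions columns used below, with 'yes' as the second factor level), scan candidate thresholds and keep the one with the highest F2:

# create an F2 metric (beta = 2 weights recall twice as heavily as precision)
f2 = yardstick::metric_tweak("f2_meas", yardstick::f_meas, beta = 2)

# score every candidate threshold with F2
f2_by_threshold =
    purrr::map_dfr(
        seq(0.05, 0.95, by = 0.01),
        \(t) {
            valid_predictions |>
                mutate(.pred_class = factor(ifelse(.pred_hurdle_yes >= t, 'yes', 'no'),
                                            levels = levels(hurdle))) |>
                f2(truth = hurdle, estimate = .pred_class, event_level = 'second') |>
                mutate(threshold = t)
        }
    )

# keep the threshold that maximizes F2
f2_by_threshold |>
    slice_max(.estimate, n = 1)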

Show the code
# density of predicted probabilities by whether games actually cleared the
# 25-rating hurdle; the dotted line marks the classification threshold
valid_predictions |>
    ggplot(aes(x = .pred_hurdle_yes, fill = hurdle)) +
    geom_density(alpha = 0.5) +
    scale_fill_manual(values = c("coral", "navy")) +
    guides(fill = guide_legend(title = 'User Ratings >= 25')) +
    theme(legend.title = element_text()) +
    xlab("Pr(User Ratings >= 25)") +
    geom_vline(xintercept = hurdle_threshold,
               linetype = 'dotted')

Show the code
# probability-based metrics for the hurdle classifier
prob_metrics = metric_set(yardstick::roc_auc,
                          yardstick::pr_auc)

prob_hurdle_metrics =
    valid_predictions |>
    group_by(outcome = 'hurdle') |>
    prob_metrics(truth = hurdle,
                 .pred_hurdle_yes,
                 event_level = 'second')

# combine class-based and probability-based metrics into one table
valid_hurdle_metrics |>
    bind_rows(prob_hurdle_metrics) |>
    gt::gt() |>
    gt::tab_options(quarto.disable_processing = T) |>
    gt::fmt_number(columns = c(".estimate"),
                   decimals = 3) |>
    gtExtras::gt_theme_espn()
outcome  .metric       .estimator  .estimate
hurdle   bal_accuracy  binary      0.763
hurdle   kap           binary      0.424
hurdle   mcc           binary      0.476
hurdle   f1_meas       binary      0.636
hurdle   f2_meas       binary      0.768
hurdle   precision     binary      0.494
hurdle   recall        binary      0.891
hurdle   j_index       binary      0.526
hurdle   roc_auc       binary      0.861
hurdle   pr_auc        binary      0.737
Show the code
# confusion matrix at the chosen threshold
valid_predictions |>
    conf_mat(hurdle, .pred_hurdle_class) |>
    autoplot(type = 'heatmap')

1.2 Features

Which features were influential for predicting each BGG outcome?

Show the code
# extract and plot variable importance from each vetiver model
average_plot =
    average_fit |>
    extract_vetiver_features() |>
    plot_model_features() +
    labs(title = 'Average Rating')

averageweight_plot =
    averageweight_fit |>
    extract_vetiver_features() |>
    plot_model_features() +
    labs(title = 'Average Weight')

usersrated_plot =
    usersrated_fit |>
    extract_vetiver_features() |>
    plot_model_features() +
    labs(title = 'Users Rated')
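These plots can then be displayed together; a minimal sketch using patchwork (the vertical layout here is an assumption):

library(patchwork)

# stack the three variable importance plots
averageweight_plot / average_plot / usersrated_plot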

2 Predictions

Show the code
# predict upcoming games in three stages:
# 1. impute complexity (average weight)
# 2. classify the probability of clearing the ratings hurdle
# 3. predict average rating and user ratings, then the expected Geek Rating
predictions =
    upcoming_games |>
    impute_averageweight(
        model = averageweight_fit
    ) |>
    predict_hurdle(
        model = hurdle_fit,
        threshold = hurdle_threshold
    ) |>
    predict_bayesaverage(
        average_model = average_fit,
        usersrated_model = usersrated_fit
    )

2.1 Upcoming Games

The following table displays predicted BGG outcomes for games that are expected to achieve at least 25 user ratings.
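A sketch of how this table is assembled, mirroring the validation-set code above:

predictions |>
    filter(.pred_hurdle_class == 'yes') |>
    select(-starts_with(".pred_hurdle")) |>
    slice_max(.pred_bayesaverage, n = 150, with_ties = F) |>
    predictions_dt(games = games) |>
    add_colors()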

2.2 Hurdle

This table displays predicted probabilities for whether games will achieve enough ratings (25) to be assigned a Geek Rating.
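A minimal sketch of such a table (the game_id and name columns are assumptions about the predictions data):

# rank games by their probability of clearing the 25-rating hurdle
predictions |>
    select(any_of(c("game_id", "name")), .pred_hurdle_yes, .pred_hurdle_class) |>
    arrange(desc(.pred_hurdle_yes))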