Predictive Models for BoardGameGeek Ratings
I use historical data from BoardGameGeek (BGG) to train a number of predictive models for community ratings. I first classify games based on their probability of achieving a minimum number of ratings on BGG. I then estimate each game's complexity (average weight) in order to predict its number of user ratings and average rating. Finally, I use these estimates to predict the expected Geek Rating.
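For context, the Geek Rating is commonly described as a Bayesian average: a game's observed average rating is padded with dummy ratings at a prior mean, shrinking lightly-rated games toward the middle of the scale. Below is a minimal sketch of that idea, assuming a prior mean of 5.5 and 100 dummy ratings (BGG does not publish the exact values, so these numbers are illustrative only).

# Illustrative Bayesian average, not BGG's exact formula.
# Assumed values: 100 dummy ratings at a prior mean of 5.5.
bayes_average = function(avg_rating, n_ratings, prior_mean = 5.5, n_dummy = 100) {
  (avg_rating * n_ratings + prior_mean * n_dummy) / (n_ratings + n_dummy)
}

bayes_average(avg_rating = 8.2, n_ratings = 250)
# ~7.43 with these assumed values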
The following (somewhat messy) visualization displays the status of the current pipeline used to train models and predict new games.
targets::tar_visnetwork(targets_only = T)

model_board = gcs_model_board(bucket = config$bucket, prefix = config$board)
tar_load(averageweight_vetiver)
tar_load(average_vetiver)
tar_load(usersrated_vetiver)
tar_load(hurdle_vetiver)
tar_load(hurdle_threshold)
averageweight_fit = pin_read_model(model_board, averageweight_vetiver)
average_fit = pin_read_model(model_board, average_vetiver)
usersrated_fit = pin_read_model(model_board, usersrated_vetiver)
hurdle_fit = pin_read_model(model_board, hurdle_vetiver)
end_valid_year = valid_years$max_year

upcoming_games = active_games |>
  filter(yearpublished > end_valid_year)
How did the models perform in predicting games?
I used a training-validation approach based on the year in which games were published: models were trained on games published prior to 2022 and evaluated on games published from 2022 to 2023.
I evaluate the models primarily on games that achieved at least 25 ratings.
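A minimal sketch of that temporal split, assuming a games_raw table with a yearpublished column (the actual split lives in the targets pipeline):

library(dplyr)

# Hypothetical temporal split; games_raw is an assumed input table
train_games = games_raw |>
  filter(yearpublished < 2022)

valid_games = games_raw |>
  filter(yearpublished >= 2022, yearpublished <= 2023)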
plot_hurdle_yes
plot_hurdle_no
targets_tracking_details(metrics = valid_metrics,
                         details = details) |>
  select(model, minratings, outcome, any_of(c("rmse", "mae", "mape", "rsq", "ccc"))) |>
  filter(minratings == 25) |>
  select(minratings, everything()) |>
  gt::gt() |>
  gt::tab_options(quarto.disable_processing = T) |>
  gtExtras::gt_theme_espn()
minratings | model | outcome | rmse | mae | mape | rsq | ccc |
---|---|---|---|---|---|---|---|
25 | glmnet | average | 0.675 | 0.498 | 7.374 | 0.294 | 0.487 |
25 | lightgbm | averageweight | 0.437 | 0.336 | 18.019 | 0.706 | 0.827 |
25 | glmnet+glmnet | bayesaverage | 0.285 | 0.159 | 2.647 | 0.430 | 0.649 |
25 | glmnet | usersrated | 1941.387 | 446.031 | 154.763 | 0.122 | 0.335 |
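The table above comes from tracked targets, but an equivalent set of metrics could be computed directly with yardstick. A sketch, assuming the validation predictions hold the observed outcome (e.g. average) alongside its prediction (.pred_average) and a usersrated column:

library(dplyr)
library(yardstick)

# rmse, mae, mape, rsq, and ccc bundled into one metric set
reg_metrics = metric_set(rmse, mae, mape, rsq, ccc)

# restricted to games clearing the 25-rating hurdle; column names are assumptions
valid_predictions |>
  filter(usersrated >= 25) |>
  reg_metrics(truth = average, estimate = .pred_average)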
What were the top predictions in the validation set?
valid_predictions |>
  filter(.pred_hurdle_class == 'yes') |>
  select(-starts_with(".pred_hurdle")) |>
  slice_max(.pred_bayesaverage, n = 150, with_ties = F) |>
  predictions_dt(games = games) |>
  add_colors()
I use a hurdle model to predict whether a game is expected to receive enough ratings (25) to be assigned a Geek Rating. This is a classification model that assigns each game a probability; in order to classify games, I need to determine the appropriate threshold.
I set the threshold at 0.16: this is the point that maximizes the F2 measure, which weights recall more heavily and keeps false negatives low. For the purposes of this model, missing a game that goes on to receive a Geek Rating is much worse than including one that does not.
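A sketch of how an F2-optimal threshold could be found on the validation predictions (the pipeline's actual tuning code may differ; the threshold grid here is an assumption):

library(dplyr)
library(purrr)
library(yardstick)

# Sweep candidate thresholds and score each with F2 (beta = 2 weights recall
# over precision); 'yes' is assumed to be the second level of the hurdle factor.
f2_by_threshold = map_dfr(seq(0.05, 0.50, by = 0.01), \(t) {
  valid_predictions |>
    mutate(.pred_class = factor(if_else(.pred_hurdle_yes >= t, "yes", "no"),
                                levels = levels(hurdle))) |>
    f_meas(truth = hurdle, estimate = .pred_class,
           beta = 2, event_level = "second") |>
    mutate(threshold = t)
})

# threshold with the highest F2
f2_by_threshold |>
  slice_max(.estimate, n = 1)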
valid_predictions |>
  ggplot(aes(x = .pred_hurdle_yes, fill = hurdle)) +
  geom_density(alpha = 0.5) +
  scale_color_manual() +
  theme(legend.title = element_text()) +
  xlab("Pr(User Ratings >= 25)") +
  scale_fill_manual(values = c("coral", "navy")) +
  guides(fill = guide_legend(title = 'User Ratings >=25')) +
  geom_vline(xintercept = hurdle_threshold,
             linetype = 'dotted')
prob_metrics = metric_set(yardstick::roc_auc,
                          yardstick::pr_auc)

prob_hurdle_metrics = valid_predictions |>
  group_by(outcome = 'hurdle') |>
  prob_metrics(truth = hurdle,
               .pred_hurdle_yes,
               event_level = 'second')
valid_hurdle_metrics |>
  bind_rows(prob_hurdle_metrics) |>
  gt::gt() |>
  gt::tab_options(quarto.disable_processing = T) |>
  gt::fmt_number(columns = c(".estimate"),
                 decimals = 3) |>
  gtExtras::gt_theme_espn()
outcome | .metric | .estimator | .estimate |
---|---|---|---|
hurdle | bal_accuracy | binary | 0.763 |
hurdle | kap | binary | 0.424 |
hurdle | mcc | binary | 0.476 |
hurdle | f1_meas | binary | 0.636 |
hurdle | f2_meas | binary | 0.768 |
hurdle | precision | binary | 0.494 |
hurdle | recall | binary | 0.891 |
hurdle | j_index | binary | 0.526 |
hurdle | roc_auc | binary | 0.861 |
hurdle | pr_auc | binary | 0.737 |
valid_predictions |>
  conf_mat(hurdle, .pred_hurdle_class) |>
  autoplot(type = 'heatmap')
Which features were influential for predicting each BGG outcome?
average_plot = average_fit |>
  extract_vetiver_features() |>
  plot_model_features() +
  labs(title = 'Average Rating')

averageweight_plot = averageweight_fit |>
  extract_vetiver_features() |>
  plot_model_features() +
  labs(title = 'Average Weight')

usersrated_plot = usersrated_fit |>
  extract_vetiver_features() |>
  plot_model_features() +
  labs(title = 'Users Rated')
# predict games
predictions = upcoming_games |>
  impute_averageweight(
    model = averageweight_fit
  ) |>
  predict_hurdle(
    model = hurdle_fit,
    threshold = hurdle_threshold
  ) |>
  predict_bayesaverage(
    average_model = average_fit,
    usersrated_model = usersrated_fit
  )
The following table displays predicted BGG outcomes for games that are expected to achieve at least 25 user ratings.
This table displays predicted probabilities for whether games will achieve enough ratings (25) to be assigned a Geek Rating.
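A rough sketch of how these two tables could be assembled from the predictions object (the selected columns are assumptions; predictions_dt, add_colors, and games are the same helpers and data used for the validation table above):

# Predicted outcomes for games expected to clear the 25-rating hurdle
predictions |>
  filter(.pred_hurdle_class == 'yes') |>
  select(-starts_with(".pred_hurdle")) |>
  predictions_dt(games = games) |>
  add_colors()

# Hurdle probabilities for all upcoming games; identifier columns are assumed
predictions |>
  select(any_of(c("game_id", "name", "yearpublished")),
         .pred_hurdle_yes, .pred_hurdle_class) |>
  arrange(desc(.pred_hurdle_yes))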