Predicting Upcoming Board Games

Predictive Models for BoardGameGeek Ratings

Author: Phil Henrickson

Published: 6/18/24

Pipeline

Show the code
# visualize the targets pipeline, showing only the targets themselves
targets::tar_visnetwork(targets_only = TRUE)

Models

Show the code
# load the trained models from the model board via vetiver
averageweight_fit = 
    vetiver_pin_read(
        model_board,
        "bgg_averageweight_"
    )

average_fit = 
    vetiver_pin_read(
        model_board,
        "bgg_average_"
    )

usersrated_fit = 
    vetiver_pin_read(
        model_board,
        "bgg_usersrated_"
    )

hurdle_fit = 
    vetiver_pin_read(
        model_board,
        "bgg_hurdle_"
    )

Assessment

BGG Ratings

I assess each model's predictions for the different BGG outcomes across all games in the validation set.

Show the code
# compare predictions with observed outcomes on the validation set,
# coloring games by whether they cleared the 25-ratings hurdle
valid_predictions |>
    pivot_outcomes() |>
    left_join(
        games |>
            bggUtils:::unnest_outcomes() |>
            select(game_id, usersrated),
        by = join_by(game_id)
    ) |>
    left_join(
        valid_predictions |>
            select(game_id, hurdle, .pred_hurdle_yes),
        by = join_by(game_id)
    ) |>
    mutate(hurdle = case_when(hurdle == 'yes' ~ '>25 ratings',
                              hurdle == 'no' ~ '<25 ratings')) |>
    plot_predictions(color = hurdle,
                     alpha = 0.25) +
    theme(legend.title = element_text()) +
    scale_color_manual(values = c("grey60", "navy")) +
    guides(colour = guide_legend(override.aes = list(alpha = 1)))

Show the code
# validation metrics for each BGG outcome at the 25-ratings threshold
targets_tracking_details(metrics = valid_metrics,
                         details = details) |>
    select(model, minratings, outcome, any_of(c("rmse", "mae", "mape", "rsq", "ccc"))) |>
    filter(minratings == 25) |>
    select(minratings, everything()) |>
    gt::gt() |>
    gt::tab_options(quarto.disable_processing = TRUE) |>
    gtExtras::gt_theme_espn()
minratings  model          outcome            rmse      mae     mape    rsq    ccc
25          glmnet         average           0.683    0.502    7.557  0.287  0.471
25          lightgbm       averageweight     0.457    0.347   19.210  0.665  0.804
25          glmnet+glmnet  bayesaverage      0.301    0.174    2.878  0.413  0.635
25          glmnet         usersrated     1863.498  464.973  165.513  0.146  0.380
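
All five of these regression metrics are available in yardstick. As a minimal sketch of how they could be computed, assuming a hypothetical long table `valid_long` with one row per game and outcome and columns `outcome`, `actual`, and `.pred`:

library(dplyr)
library(yardstick)

# rmse, mae, mape, rsq, and ccc, computed per outcome
reg_metrics = metric_set(rmse, mae, mape, rsq, ccc)

valid_long |>
    group_by(outcome) |>
    reg_metrics(truth = actual, estimate = .pred)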

Hurdle

I first predict whether games are expected to receive enough ratings to be assigned a Geek Rating (25 ratings). This is a classification model that assigns each game a probability; in order to classify games, I need to determine an appropriate probability threshold.

I select this threshold by examining performance across a variety of classification metrics, choosing the threshold that maximizes the F2 measure in order to minimize false negatives. Since I use the hurdle model to filter out games that are very unlikely to receive ratings, mistakenly including a game is less costly than missing one.
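
For reference, with beta = 2 the F-measure is F2 = 5 * precision * recall / (4 * precision + recall), which weights recall twice as heavily as precision. Below is a minimal sketch of how such a threshold search could be run with yardstick, assuming `valid_predictions` holds the observed class (`hurdle`, with 'yes' as the second factor level, as in the metrics code later in this section) and the predicted probability (`.pred_hurdle_yes`); the actual selection logic behind `hurdle_results` may differ.

# Sketch: score candidate thresholds by the F2 measure and keep the maximizer
library(dplyr)
library(purrr)
library(yardstick)

f2_by_threshold = 
    map_dfr(
        seq(0.05, 0.95, by = 0.05),
        \(t) {
            valid_predictions |>
                mutate(.pred_class = factor(
                    if_else(.pred_hurdle_yes >= t, 'yes', 'no'),
                    levels = levels(hurdle)
                )) |>
                f_meas(truth = hurdle,
                       estimate = .pred_class,
                       beta = 2,
                       event_level = 'second') |>
                mutate(threshold = t)
        }
    )

# threshold that maximizes F2 on the validation set
hurdle_threshold = 
    f2_by_threshold |>
    slice_max(.estimate, n = 1) |>
    pull(threshold)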

Show the code
# classification metrics across candidate probability thresholds
hurdle_results |> 
    plot_class_results() +
    theme(panel.grid.major = element_blank())

Show the code
# confusion matrix at the selected threshold
valid_predictions |>
    conf_mat(hurdle,
             .pred_hurdle_class) |>
    autoplot(type = 'heatmap')

Show the code
# probability metrics (ROC AUC, PR AUC) to complement the class metrics
prob_metrics = metric_set(yardstick::roc_auc,
                          yardstick::pr_auc)

prob_hurdle_metrics = valid_predictions |>
    group_by(outcome = 'hurdle') |>
    prob_metrics(truth = hurdle,
                 .pred_hurdle_yes,
                 event_level = 'second')

valid_hurdle_metrics |>
    bind_rows(prob_hurdle_metrics) |>
    gt::gt() |>
    gt::tab_options(quarto.disable_processing = TRUE) |>
    gt::fmt_number(columns = c(".estimate"),
                   decimals = 3) |>
    gtExtras::gt_theme_espn()
outcome  .metric       .estimator  .estimate
hurdle   bal_accuracy  binary          0.745
hurdle   kap           binary          0.382
hurdle   mcc           binary          0.437
hurdle   f1_meas       binary          0.604
hurdle   f2_meas       binary          0.742
hurdle   precision     binary          0.460
hurdle   recall        binary          0.877
hurdle   j_index       binary          0.491
hurdle   roc_auc       binary          0.853
hurdle   pr_auc        binary          0.713

Features

Show the code
# feature plots for each of the outcome models
average_plot = 
    average_fit |> 
    extract_vetiver_features() |>
    plot_model_features() +
    labs(title = 'Average Rating')

averageweight_plot = 
    averageweight_fit |> 
    extract_vetiver_features() |>
    plot_model_features() +
    labs(title = 'Average Weight')

usersrated_plot = 
    usersrated_fit |> 
    extract_vetiver_features() |>
    plot_model_features() +
    labs(title = 'Users Rated')
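
These plot objects are created but not printed above; a small sketch of one way to display them together, using patchwork (an assumption, not necessarily how the original renders them):

library(patchwork)

# stack the three feature plots vertically
averageweight_plot / average_plot / usersrated_plot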

Predictions

Show the code
# predict upcoming games:
# 1) impute average weight (complexity) with the averageweight model
# 2) classify games against the ratings hurdle at the selected threshold
# 3) predict the Geek Rating (bayesaverage) from the average and users-rated models
predictions = 
    upcoming_games |>
    impute_averageweight(
        model = averageweight_fit
    ) |>
    predict_hurdle(
        model = hurdle_fit,
        threshold = hurdle_threshold
    ) |>
    predict_bayesaverage(
        average_model = average_fit,
        usersrated_model = usersrated_fit
    )

Upcoming Games

This table displays predicted BGG outcomes for games that are expected to achieve at least 25 user ratings.
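
The table itself is not reproduced here; a minimal sketch of how it could be assembled from `predictions`, assuming the prediction columns follow the same naming convention as `valid_predictions` (e.g. `.pred_hurdle_class`), plus hypothetical columns `name` and `.pred_bayesaverage`:

# games predicted to clear the 25-ratings hurdle, by predicted Geek Rating
predictions |>
    filter(.pred_hurdle_class == 'yes') |>
    arrange(desc(.pred_bayesaverage)) |>
    select(game_id, name,
           any_of(c(".pred_averageweight", ".pred_average",
                    ".pred_usersrated", ".pred_bayesaverage"))) |>
    gt::gt() |>
    gt::fmt_number(columns = starts_with(".pred"), decimals = 2) |>
    gtExtras::gt_theme_espn()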

Hurdle

This table displays predicted probabilities for whether games will achieve enough ratings (25) to be assigned a Geek Rating.
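
A similar sketch for this table, listing each game's predicted probability of clearing the hurdle (same assumed column names as above):

# predicted probability of reaching 25 ratings, highest first
predictions |>
    select(game_id, name, .pred_hurdle_yes, .pred_hurdle_class) |>
    arrange(desc(.pred_hurdle_yes)) |>
    gt::gt() |>
    gt::fmt_percent(columns = .pred_hurdle_yes, decimals = 1) |>
    gtExtras::gt_theme_espn()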