What is this analysis?

This notebook lets you select any board game and identify its most comparable games using data from boardgamegeek.com (BGG). I use principal component analysis (PCA), a dimension reduction method, to learn the main sources of variation in BGG’s game data. To find similar games, I compute the distance between every pair of games using their scores on the first twenty principal components. The smaller the distance between two games, the more similar they are. Games that are close to each other are said to be neighbors.
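The pipeline described above can be sketched in a few lines of Python. This is a minimal sketch, not the notebook’s actual code. It assumes the game features are already encoded as a numeric matrix `X` with one row per game, and it standardizes each feature before PCA, which the text does not specify. The function names `pca_scores`, `manhattan_distances`, and `nearest_neighbors` are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=20):
    """Project standardized features onto the first n_components principal components."""
    X = np.asarray(X, dtype=float)
    # Assumption: center and scale each feature so no single variable dominates.
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - mu) / sd
    # SVD of the standardized matrix: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    return Z @ Vt[:k].T

def manhattan_distances(scores):
    """Pairwise Manhattan (L1) distances between rows of the score matrix."""
    diff = scores[:, None, :] - scores[None, :, :]
    return np.abs(diff).sum(axis=2)

def nearest_neighbors(dist, i, k=25):
    """Indices of the k games closest to game i, excluding game i itself."""
    order = np.argsort(dist[i], kind="stable")
    return [j for j in order if j != i][:k]
```

For example, `nearest_neighbors(manhattan_distances(pca_scores(X)), i=0, k=25)` would return the 25 games closest to the first game. The broadcast in `manhattan_distances` uses memory proportional to n² times the number of components, so for a full catalog of games `scipy.spatial.distance.cdist(scores, scores, "cityblock")` or a chunked loop would be more practical.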

This document details a game’s 25 nearest neighbors and illustrates some of the dimensions on which the selected game and its neighbors are similar.

Game Profile: Captain’s Log

First, we can look at summary information for the game as it currently stands on BGG. For the average rating (Average), geek rating (Geek), and average weight (Complexity), I estimate the game’s values using predictive models trained on historical BGG data.

Next is information on the game’s designers, artists, mechanics, and categories. Note: designer and artist features are not used in identifying nearest neighbors.

For the full profile of the selected game, go to https://boardgamegeek.com/boardgame/331363

Most Similar Games to Captain’s Log

The table below displays the games most similar to Captain’s Log using data from boardgamegeek (BGG). The reported similarity score is the inverse of the squared Manhattan distance between two games, 1/d².
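Because 1/d² shrinks as d grows, ranking games by this score gives the same order as ranking them by distance; the score simply rescales distances so that larger means more similar. Below is a minimal sketch, assuming `dist_row` is the selected game’s row of the distance matrix. The helpers `similarity_scores` and `top_similar` are hypothetical, and the small `eps` floor is an added assumption to avoid dividing by zero when two games have identical scores.

```python
import numpy as np

def similarity_scores(dist_row, eps=1e-12):
    """Similarity as the inverse of the squared Manhattan distance, 1 / d**2.

    eps is a floor on d so identical games do not cause division by zero.
    """
    d = np.asarray(dist_row, dtype=float)
    return 1.0 / np.maximum(d, eps) ** 2

def top_similar(dist_row, self_index, k=25):
    """Return (index, similarity) pairs for the k most similar games, best first."""
    sims = similarity_scores(dist_row)
    sims[self_index] = -np.inf  # drop the game itself from its own list
    order = np.argsort(-sims, kind="stable")[:k]
    return [(int(j), float(sims[j])) for j in order]
```

For instance, `top_similar([0.0, 2.0, 1.0, 4.0], self_index=0, k=2)` returns `[(2, 1.0), (1, 0.25)]`: the game at distance 1 scores 1.0 and the game at distance 2 scores 0.25.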

Note: The analysis does not use a game’s number of ratings, average rating, or geek rating; it looks only at a game’s categories, mechanics, playing time, player count, and complexity. The current approach weights each component equally when computing the distance. Because we would reasonably expect some components to matter more than others, I plan to explore other distance measures in the future. I initially relied on Euclidean distance, but after tinkering with the results I have opted for the Manhattan distance for now.
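One way to let some components count more than others is a weighted Manhattan distance, d_w(a, b) = Σ_k w_k |a_k - b_k|, which reduces to the current equal-weight version when every w_k = 1. The sketch below is an illustration, not the author’s planned method: weighting each component by its share of explained variance is just one hypothetical choice, and the function names are made up.

```python
import numpy as np

def weighted_manhattan(a, b, weights=None):
    """Manhattan distance between two score vectors, optionally weighting each component."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.ones_like(a) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * np.abs(a - b)))

def euclidean(a, b):
    """Euclidean (L2) distance between two score vectors, for comparison."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def variance_weights(singular_values):
    """Hypothetical weighting: each component's share of explained variance."""
    var = np.asarray(singular_values, dtype=float) ** 2
    return var / var.sum()
```

For example, `weighted_manhattan([0, 0], [3, 4])` is 7.0 while `euclidean([0, 0], [3, 4])` is 5.0. Because Euclidean distance squares each difference, a single component with a large gap dominates it more than it dominates the Manhattan distance, which grows linearly in each difference.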

Plotting Comparable Games to Captain’s Log

Here, the selected game and its nearest neighbors are placed on the first two principal components.


Next, games are placed on the first four principal components, which I have found to loosely map to complexity, theme, economy, and cooperation.


Comparing Games on Principal Components

The chart below shows how Captain’s Log compares to its nearest neighbors on each of the first ten principal components. Similar games should have similar profiles across these dimensions, so the chart makes it easy to see where games resemble each other. For reference, I’ve also plotted the games at the tails (extremes) of each component.
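The reference games at the tails can be picked out by sorting each component’s scores. This is a minimal sketch, assuming `scores` is the matrix of principal component scores (one row per game) and `names` holds the matching game titles; `tail_games` is a hypothetical helper.

```python
import numpy as np

def tail_games(scores, names, n=3):
    """For each component, list the n lowest- and n highest-scoring games."""
    scores = np.asarray(scores, dtype=float)
    tails = {}
    for c in range(scores.shape[1]):
        order = np.argsort(scores[:, c], kind="stable")  # ascending by score
        tails[f"PC{c + 1}"] = {
            "low": [names[i] for i in order[:n]],
            "high": [names[i] for i in order[::-1][:n]],
        }
    return tails
```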

We can also use a tile plot to view each neighbor across every principal component used, which makes it easy to check whether the games stand out on one dimension in particular.

Examining PCA Variable Loadings

What explains a game’s score on each principal component?

To better understand these principal components, we can look at the loadings of the variables in the dataset. The loadings are the contributions of each variable to the first ten of the components used in computing the distance between games. Large loadings (either positive or negative) indicate a strong relationship between that variable and the component.
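A minimal sketch of how such loadings can be computed, assuming the same standardized numeric feature matrix used for the PCA. Here the loadings are taken as the rows of the rotation (eigenvector) matrix, which is what R’s `prcomp` returns as `rotation`; some texts instead scale these by the square root of each component’s variance. The sign of each component is arbitrary, so PC1 could come out flipped with every loading negated. `pca_loadings` is a hypothetical name.

```python
import numpy as np

def pca_loadings(X, feature_names, n_components=10):
    """Return {component: {feature: loading}} from the PCA rotation (eigenvector) matrix."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / sd  # assumption: features standardized before PCA
    # Rows of Vt are unit-length principal axes; entry [c, j] is variable j's loading on PC c+1.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    return {f"PC{c + 1}": dict(zip(feature_names, Vt[c])) for c in range(k)}
```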

For instance, on the first principal component (PC1), time per player, average weight, and playing time have the highest positive loadings, while party game and max players have negative loadings. This suggests that the component maps to a game’s complexity: longer, more complex games will have high positive scores on this component, while simpler, shorter party games with lots of players will have low scores.
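Reading off the strongest positive and negative variables for a component, as in the PC1 example above, amounts to sorting its loadings. The loading values in the test and usage below are made up for illustration and are not the actual BGG loadings; the variable names and the `top_loadings` helper are hypothetical.

```python
def top_loadings(loadings, n=3):
    """Split one component's loadings into its n most positive and n most negative variables."""
    ranked = sorted(loadings.items(), key=lambda kv: kv[1], reverse=True)
    positive = [name for name, v in ranked if v > 0][:n]
    negative = [name for name, v in reversed(ranked) if v < 0][:n]  # most negative first
    return positive, negative
```

With made-up loadings such as `{"time_per_player": 0.35, "avg_weight": 0.33, "playing_time": 0.30, "party_game": -0.20, "max_players": -0.25}`, `top_loadings(..., n=3)` returns `(["time_per_player", "avg_weight", "playing_time"], ["max_players", "party_game"])`.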