This notebook provides some basic exploratory analysis of games on boardgamegeek (BGG) in support of my work [predicting ratings for upcoming games] and [predicting games for individual users]. In particular, I examine four different community outcomes for games: average weight rating (complexity), number of user ratings, average user rating, and geek rating (a combination of the number of user ratings and the average rating).[^ For this write-up, I examine games published through 2021 that have achieved at least 30 user ratings by the time of writing. I have additionally excluded some games that were cancelled or never released, or that have data quality issues with their profiles on BGG.]
The data I’m examining is at the game level, where I observe the BGG community’s aggregated ratings for individual games. That is, I do not have data on the underlying ratings for games, only the average, standard deviation, or sum of the distribution.
The average rating and the number of user ratings are easy enough to understand, but the distributions for average weight and the geek rating are a bit wonky.
Average weight indicates the perceived complexity of the game (1 = simple, 5 = complex). It has large spikes at 1, 2, and 3 due to users gravitating towards those ratings. Fewer users rate a game’s complexity than rate how good the game is, so this can often be a fairly noisy estimate of a game’s complexity.
The geek rating is designed to capture games that are both highly rated and popular, meaning it is a function of the number of user ratings and the average rating. BGG uses Bayesian averaging to prevent games with relatively few votes from moving up the Geek list; every game starts with ~2k votes at 5.5 (hence the large spike at 5.5), so a game’s geek rating will only move away from 5.5 once it receives enough votes.
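To make the Bayesian averaging idea concrete, here is a minimal sketch. BGG does not publish its exact formula, so this assumes the simplest version consistent with the description above: roughly 2,000 prior votes at 5.5 are blended with a game's actual ratings. The function name and parameter names are my own.

```python
def bayesian_average(avg_rating, num_ratings, prior_votes=2000, prior_mean=5.5):
    """Approximate a geek-style rating by shrinking the average toward a prior.

    Assumption: the prior is ~2,000 dummy votes at 5.5, as described in the text.
    The real BGG calculation is not public and may differ.
    """
    weighted_sum = num_ratings * avg_rating + prior_votes * prior_mean
    return weighted_sum / (num_ratings + prior_votes)


# Two hypothetical games with the same 8.0 average but very different vote counts
print(round(bayesian_average(8.0, 30), 2))     # barely moves from 5.5
print(round(bayesian_average(8.0, 50000), 2))  # close to the raw average
```

Under this assumption, a game with no votes sits exactly at 5.5, and a game needs about as many real votes as the prior has dummy votes before its rating moves halfway toward its raw average.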
In order to get a high geek rating, a game needs to both be well rated (a high average) and have a high enough number of user ratings. We can get a sense of this by plotting games via their user ratings and their average, using color to indicate their geek rating. Notice where games with high geek ratings reside - a high average rating with lots of user ratings. I have examined the geek rating in greater detail elsewhere, as I argue it doesn’t place a high enough weight on popularity.
These BGG outcomes (average weight, average, user ratings) are all related to one another in some way, which is important to keep in mind as we think about modeling these outcomes. The average weight tends to be highly correlated with the average rating, while not being correlated with the number of user ratings. The geek rating is a function of the average and user ratings, which means it is also then correlated with the average weight.
Why do we see these relationships?
It’s partially just a function of the different types of games on BGG, which we can see if we break this same visualization out by game category[^ Games can fall into multiple categories; for example, a game can be considered both a strategy game and a wargame. For this visualization, I assign each game to a single category using the priority order of wargame, strategy, family, abstract, party. Anything without a category is considered other.].
Wargames tend to have high complexity, high averages, and a low number of ratings (presumably due to a dedicated wargame following on BGG). Party games and family games tend to have low complexity and lower average ratings, but can achieve a higher geek rating due to attracting more ratings.
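The one-category-per-game assignment described in the footnote can be sketched as below. This is only an illustration of the priority rule, not the code behind the visualization; the category labels and the function name are assumed for the example.

```python
# Priority order from the footnote: the first match wins
PRIORITY = ["wargame", "strategy", "family", "abstract", "party"]


def assign_category(categories):
    """Return the single highest-priority category for a game.

    `categories` is an iterable of category labels (hypothetical names here).
    Games that match none of the priority categories are labeled "other".
    """
    tagged = {label.lower() for label in categories}
    for category in PRIORITY:
        if category in tagged:
            return category
    return "other"


print(assign_category(["strategy", "wargame"]))  # wargame
print(assign_category(["party", "family"]))      # family
print(assign_category([]))                       # other
```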
How have the BGG community’s ratings changed over time? BGG has data on the year in which games were published, with some games published as far back as 3500 BC[^ Senet takes the prize for oldest game, followed by Marbles and Go].
Filtering to games published since 1950, we can examine the relationship between year published and each outcome. Games published in the 80s had a somewhat higher average weight than games published recently. The average rating of games has increased in recent years.
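As a rough sketch of this kind of summary, the snippet below filters games to those published since 1950 and averages one outcome per year. The record layout and field names (`year_published`, `average_weight`) are assumptions for illustration, and the sample records are placeholders rather than BGG data.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean(games, outcome, min_year=1950):
    """Average `outcome` per publication year, keeping games from min_year onward."""
    by_year = defaultdict(list)
    for game in games:
        value = game.get(outcome)
        # Skip games published before the cutoff or missing this outcome
        if game["year_published"] >= min_year and value is not None:
            by_year[game["year_published"]].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Placeholder records for illustration only (not real BGG values)
games = [
    {"year_published": -3500, "average_weight": 1.5},
    {"year_published": 1985, "average_weight": 3.0},
    {"year_published": 1985, "average_weight": 2.0},
    {"year_published": 2020, "average_weight": 2.0},
]
print(yearly_mean(games, "average_weight"))  # {1985: 2.5, 2020: 2.0}
```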
More and more games are being published every year - the visualization below shows all games published, rather than only those that achieve a minimum number of ratings. There has been an explosion in the number of new releases in recent years, and many of these games do not receive many votes, which leaves them with the potential for a higher average score (though a lower geek rating). From this, we can expect that models predicting the average will exhibit a strong relationship with time.