A similar fun example is the distribution of Elo ratings on a chess site, e.g. here's the weekly distribution on Lichess for Bullet games (less than 3 minutes):
Or you can look at the cutoff date for competitive sports players and athletes. If January 1st is the age cutoff, there will be a disproportionate number of players born in January, because they are the oldest in their cohort. If it's June 1st, many of the players will have birthdates in June and July.
In fact none of them will be, because it's computed by analyzing a rolling window of the last 7 days, and each account's "ranking" is whatever their ranking was at the end of that period (i.e. "now," more or less).
Basically I'm not sure why you're saying that leads to a bias toward "slightly above 100."
Ohh... unless you meant slightly above a multiple of 100. I see now. Because people stop playing when they hit a milestone, right.
(That’s basically how the election falsification example in TFA works, too: a lot of individuals targeting a subjectively nice metric without cooperating with one another.)
https://lichess.org/stat/rating/distribution/bullet
It's easy to understand why this happens:
- Player ratings will fluctuate by small amounts as they win and lose individual games.
- People are happy to stop playing when their rating is at e.g. 1503, but if it's 1497, they'd rather play just one more game than leave it that way.
- At any given time, most accounts are not playing, so the distribution shows a bias towards values just over a 100 Elo threshold.
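If you want to convince yourself, here's a toy simulation of the three points above. It's only a sketch: the starting ratings, the per-game step sizes, the stop probabilities and the 10-point "just below" band are all numbers I made up, not anything measured from Lichess, and `simulate_final_ratings` and `band_shares` are just illustrative helper names.

```python
import random


def simulate_final_ratings(n_players=20000, max_games=500, seed=0):
    """Return the rating each simulated player is left with when they stop."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_players):
        rating = rng.randint(1000, 1999)  # arbitrary starting spread
        for _ in range(max_games):
            # One game: win or lose a few points (made-up step sizes).
            rating += rng.choice((-1, 1)) * rng.randint(4, 8)
            # Assumption: players sitting just under a multiple of 100
            # rarely quit; everyone else quits fairly readily.
            just_below = rating % 100 >= 90
            p_stop = 0.02 if just_below else 0.20
            if rng.random() < p_stop:
                break
        finals.append(rating)
    return finals


def band_shares(ratings, width=10):
    """Share of ratings within `width` points below / above a multiple of 100."""
    below = sum(1 for r in ratings if r % 100 >= 100 - width)
    above = sum(1 for r in ratings if r % 100 < width)
    return below / len(ratings), above / len(ratings)


below, above = band_shares(simulate_final_ratings())
print(f"just below: {below:.1%}, just above: {above:.1%}")
```

If nobody cared about milestones, both bands would hold roughly 10% of accounts each. With the biased stopping rule I'd expect the "just below" band to come out much emptier than the "just above" one, which is the same lopsided shape as the Lichess histogram.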
The other neat thing is that you can see this effect reduce as you look at longer time controls:
Blitz (less than 10 min): https://lichess.org/stat/rating/distribution/blitz
Rapid (less than 30 min): https://lichess.org/stat/rating/distribution/rapid
Which makes sense because the time and effort of gambling just one more game to get the rating back over the line is higher at the longer time controls.