Chess Exploratory Data Anlysis

whitewins

This analysis is based on chess data that was collected through the chess.com API. The data contains usernames, titles, number of games, and win rates for white and black. For more information, reference this earlier blog post.

The question we wanted to answer was whether color (or anything else) has an affect on win rate, but during the analysis some new interesting points popped up.

Data Exploration

First, we’ll look at how many games each title is playing. This will give us an idea of which of the four titles would provide the most accurate data. More games leads to the data falling around the true parameters.

titlesvsgames

We can see here that FIDE Masters play the most chess, so their parameters will be the closest to their true values.

Next, we’ll compare the number of games played vs their respective win rates for both games played as white and games played as black.

whitegamesvsrate

blackgamesvsrate

Comparing games playing as white vs playing games as black, it seems that as you play more games, it the win rate goes down faster for white than for black suggesting that black may have an advantage.

The most intuitive comparison to make is that between win rate and color played. Here we see these metrics split between each of the titles.

titlerates

Very clearly for all four titles, white’s win rates are on average higher than black’s win rates. I suspect that white will generally have the advantage based on this graph, but more analysis is needed to make a conclusion.

Finally, just to throw data onto the page, I just plotted the total number of games vs the total win rate, and something interesting appeared on the graph.

gamesvsrate

As you can see, most of the data is clustered around .5, or 50%, win rate. However, I find it interesting that there are is a curiously large amount of dots between .7 and .9 win rates. This means that these players consistenly win almost every time. What concerns me is that this is the case even for players who have played over 200 games. The chess.com algorithm always matches you with someone that has a similar ELO rating to make each match balanced. If someone plays someone else around their skill level, the chances of winning should be equal (I suspect the expected value of wins would be around 50%). If a player is consistently winning 70%, 80%, or even 90% of the time, it would not be out of the question that cheating is present. Cheating can take place if a player inputs each move into a separate game against a computer analysis system that calculates the best possible move for the player. To learn more about this and look deeper into possible cheating, I want to gather the accuracy score, or how perfectly the player moved his or her pieces compared to what the computer suggests in each position.

Conclusion

It is beginning to look like playing as white (making the first move of the game) has an advantage when it comes to chess. In addition, it seems like a large amount of players are cheating according to my data. I don’t, however, want to jump to accusations. I plan to collect more data like first moves and accuracy scores to more firmly find a conclusion on white’s advantage and the possible cheating pandemic found in the chess.com community!

Stay tuned for an analysis blog post to come!

Check out my repository to view my code by clicking here!