As part of my progression in learning and practicing football analytics, I’ve been meaning to complete a Python project that builds data visualization summaries of individual games, a widespread tool for performance and team analysis. This type of dashboard (see for example Ben Griffis’, Footballytics’, and John (@totalf0otball)’s own versions, which are a bit more advanced) combines general match statistics, such as possession rate (%), with event data to highlight the major trends, patterns and performance indicators of a given game.
I took the spectacular November 12, 2023 Premier League game between Chelsea and Manchester City as an example, using this WhoScored page, which includes the Opta event data. I first had to scrape all the relevant data from the HTML source code, and then wrangle and prepare it by selecting each team’s passes, shots, goals, and defensive actions.
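The wrangling step can be sketched roughly as follows. This is a minimal illustration, not the code behind the dashboard: it assumes the scraped events have already been parsed into a list of dictionaries, and the field names (`team`, `type`) and event-type labels are assumptions modelled on WhoScored-style naming that should be checked against the actual data.

```python
from collections import defaultdict

# Assumed event-type labels; verify them against the scraped data.
CATEGORIES = {
    "passes": {"Pass"},
    "shots": {"Goal", "SavedShot", "MissedShots", "ShotOnPost"},
    "goals": {"Goal"},
    "defensive_actions": {"Tackle", "Interception", "Challenge", "Foul"},
}


def split_events(events):
    """Group events by team, then by category.

    An event can land in more than one category (a goal is also a shot).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for event in events:
        for category, types in CATEGORIES.items():
            if event["type"] in types:
                grouped[event["team"]][category].append(event)
    return grouped


# Toy events, invented purely for illustration.
events = [
    {"team": "Home", "type": "Pass"},
    {"team": "Home", "type": "Goal"},
    {"team": "Home", "type": "Tackle"},
    {"team": "Away", "type": "Pass"},
    {"team": "Away", "type": "SavedShot"},
    {"team": "Away", "type": "Interception"},
]

by_team = split_events(events)
print(len(by_team["Home"]["shots"]), len(by_team["Home"]["goals"]))
```

Keeping the category definitions in one dictionary makes it easy to adjust which event types count as a shot or a defensive action without touching the grouping logic.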
Here is a quick explanation of the metrics and data visualizations included in the dashboard. The pass network shows the average position of each player and the frequency of passes with teammates; the pass maps show the passes into specific zones (final third; opponent’s box) that each team attempted (completed ones are in green). PPDA, i.e. “opponent’s passes allowed per defensive action”, measures how many passes each team “allows” in the opponent’s half before intervening with a tackle, interception, etc. This is largely a measure of pressing intensity: the lower the PPDA, the more intense the press.
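To make the PPDA definition concrete, here is a minimal sketch under stated assumptions. It assumes Opta-style coordinates from 0 to 100 in which each team's events are recorded as if that team attacks towards x = 100, and the field names (`team`, `type`, `x`) and the set of defensive action types are hypothetical. The `zone_end` parameter defaults to the halfway line, as described above; many published versions of the metric use the opponent's defensive 60% of the pitch instead (`zone_end=60`).

```python
# Assumed labels for the actions that count as "intervening".
DEFENSIVE_ACTIONS = {"Tackle", "Interception", "Challenge", "Foul"}


def ppda(events, team, opponent, zone_end=50.0):
    """Opponent passes allowed per defensive action by `team`.

    The pressing zone is the opponent's own end of the pitch: x <= zone_end
    in the opponent's frame, i.e. x >= 100 - zone_end in the pressing team's.
    """
    opponent_passes = sum(
        1
        for e in events
        if e["team"] == opponent and e["type"] == "Pass" and e["x"] <= zone_end
    )
    defensive_actions = sum(
        1
        for e in events
        if e["team"] == team
        and e["type"] in DEFENSIVE_ACTIONS
        and e["x"] >= 100 - zone_end
    )
    if defensive_actions == 0:
        return float("inf")  # the team never intervened in the zone
    return opponent_passes / defensive_actions


# Toy events, invented purely for illustration.
events = [
    {"team": "Away", "type": "Pass", "x": 20},
    {"team": "Away", "type": "Pass", "x": 30},
    {"team": "Away", "type": "Pass", "x": 40},
    {"team": "Away", "type": "Pass", "x": 70},      # outside the zone
    {"team": "Home", "type": "Tackle", "x": 60},
    {"team": "Home", "type": "Interception", "x": 80},
    {"team": "Home", "type": "Tackle", "x": 30},    # in Home's own half
]

home_ppda = ppda(events, team="Home", opponent="Away")
print(home_ppda)
```

In this toy example three opponent passes and two defensive actions fall inside the zone, so the home side allows 1.5 passes per defensive action.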
Expected threat (here, from passes) is currently the best way to get a bigger picture of offensive contribution, as explained here.
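As a rough sketch of how expected threat from passes can be computed: the pitch is divided into zones, each zone carries a threat value, and a completed pass is credited with the difference between the value of the zone where it ends and the zone where it starts. The grid below is a placeholder with invented values (a real xT grid is fitted from large amounts of historical data), the field names are hypothetical, and counting only positive contributions from completed passes is one common convention rather than necessarily the one used in the dashboard.

```python
# Placeholder threat grid: rows are bands across the pitch width,
# columns are bands along the pitch length (left = own goal).
# These numbers are invented for illustration only.
XT_GRID = [
    [0.01, 0.02, 0.05, 0.20],
    [0.01, 0.03, 0.08, 0.35],
    [0.01, 0.02, 0.05, 0.20],
]


def zone_value(x, y, grid=XT_GRID):
    """Threat value of the zone containing (x, y), coordinates in 0-100."""
    rows, cols = len(grid), len(grid[0])
    col = min(int(x / 100 * cols), cols - 1)
    row = min(int(y / 100 * rows), rows - 1)
    return grid[row][col]


def pass_xt(p, grid=XT_GRID):
    """Threat added by a pass: value at the end minus value at the start."""
    return zone_value(p["end_x"], p["end_y"], grid) - zone_value(p["x"], p["y"], grid)


def team_xt_from_passes(passes, team, grid=XT_GRID):
    """Sum of positive threat added by a team's completed passes."""
    total = 0.0
    for p in passes:
        if p["team"] == team and p["completed"]:
            total += max(pass_xt(p, grid), 0.0)
    return total


# Toy passes, invented purely for illustration.
passes = [
    {"team": "Home", "x": 10, "y": 50, "end_x": 60, "end_y": 50, "completed": True},
    {"team": "Home", "x": 60, "y": 50, "end_x": 90, "end_y": 50, "completed": True},
    {"team": "Home", "x": 90, "y": 50, "end_x": 30, "end_y": 50, "completed": True},
    {"team": "Home", "x": 60, "y": 50, "end_x": 95, "end_y": 50, "completed": False},
]

home_xt = team_xt_from_passes(passes, "Home")
print(round(home_xt, 2))
```

The backward pass and the incomplete pass add nothing here; only the two completed forward passes contribute to the total.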
Field tilt is useful for revealing which side is more dominant in a match, because it shows where possession takes place, i.e. territorial dominance. It measures each team’s share of the possession that happens in the attacking third (usually counted as final-third passes or touches by both teams), so it hints at whether a team is dominant in the areas that matter for scoring goals, rather than emphasizing possession for possession’s sake (as the standard possession % stat does).
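A minimal sketch of the calculation, assuming field tilt is counted with passes made in the attacking third. The field names (`team`, `type`, `x`) are hypothetical, coordinates are assumed to run from 0 to 100 with each team attacking towards x = 100, and the final third is assumed to start at roughly x = 66.7.

```python
def field_tilt(events, team, opponent, final_third_start=66.7):
    """Share (%) of both teams' final-third passes that belongs to `team`."""

    def final_third_passes(side):
        return sum(
            1
            for e in events
            if e["team"] == side
            and e["type"] == "Pass"
            and e["x"] >= final_third_start
        )

    own = final_third_passes(team)
    total = own + final_third_passes(opponent)
    if total == 0:
        return None  # neither side passed in its attacking third
    return 100 * own / total


# Toy events, invented purely for illustration.
events = [
    {"team": "Home", "type": "Pass", "x": 70},
    {"team": "Home", "type": "Pass", "x": 85},
    {"team": "Home", "type": "Pass", "x": 92},
    {"team": "Home", "type": "Pass", "x": 40},   # not in the final third
    {"team": "Away", "type": "Pass", "x": 75},
    {"team": "Away", "type": "Pass", "x": 20},   # not in the final third
]

home_tilt = field_tilt(events, team="Home", opponent="Away")
print(home_tilt)
```

Here the home side made three of the four final-third passes, giving it a field tilt of 75%, even though overall pass counts are closer.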
Final Report
Here is the report for another game, using the same code:
Recommended links/resources:
- Footballytics: Data Analytics Practice: Interpreting Event Data
- Mark Carey and Tom Worville: The Athletic’s football analytics glossary: explaining xG, PPDA, field tilt and how to use them
Update (January 2024)
WhoScored already provides really good data for a range of leagues and competitions, but further publicly available Opta data can be found on the Stats Perform website Scoresway.com (the range is incredible: you can find full match data for dozens of countries’ first tier, and in some countries lower tiers as well as women’s football). I’ve put my web scraping and data wrangling code on GitHub: it collects the match info, stats and event data for a given game. The next step was adapting my template for the match dashboards, since the original one was built around WhoScored’s source code. This took a while but I’m happy I persevered, because I can now easily create an improved version of my first dashboard (above). I also customized it to make it look better and to draw attention to the relevant event data!