Over the summer, I carried out a fairly in-depth data scouting project focused on my favorite club, Manchester United. Below is an overview of my approach and some selected results; the full articles are on my personal football/Man Utd blog:
- Man Utd 2023 Squad Rebuild & Recruitment: Part 1
- 2023 Squad Rebuild & Recruitment: A New GK
- Man Utd July Transfers/Scouting Update (1)
- Man Utd July Transfers/Scouting Update (2): RCB
Part 1 is a general assessment of where the club stood after Erik Ten Hag’s first season as manager/head coach. I include a general review of the 2022/23 season, as well as a critique of the club’s longer-term issues, many of which Ten Hag still has to cope with (arguably, they are holding back both him and the team’s progress). I identify the gaps and limitations in our squad as the summer transfer window was about to open, and outline the “ins” (recruiting new players) and “outs” (letting go, selling or loaning out current players) strategy that I thought should be adopted. Lastly, I include a tactical and data analysis of two players I wanted replaced: the club’s then number one goalkeeper, David De Gea, and midfielder and academy graduate Scott McTominay.
I reproduced the analysis and methodological reflection concerning goalkeepers and De Gea on this website (see here), because it’s a very relevant topic for football scouting methodology. The flaws in the main statistical methods and data used to analyze goalkeeping performance are a big issue compared with the data available for midfielders and forwards (there is a similar problem with defenders, especially centre-backs, as Ben Griffis explains here). Without going into more detail, I used data from FBref, StatsBomb, Wyscout, Smarterscout and John Harrison’s “Goalkeeper xG model”, as well as my own observations of De Gea’s performances throughout last season (and those of some other analysts and fans), to explain why replacing him was my number one priority.

More than anything else, the issue with De Gea is his almost complete inability to defend proactively and prevent goals, along with his poor distribution/passing. Both have been frankly mediocre for many seasons, as illustrated in this chart (the colored curves are trend lines for each metric).
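The type of those trend lines isn't stated (the word “curves” suggests they may not all be straight lines). As a rough illustration of what a linear trend line measures, here is a minimal Python sketch of an ordinary least-squares fit across consecutive seasons. The function name `linear_trend` is hypothetical, and the numbers in the usage line are made up, not De Gea's data.

```python
def linear_trend(values):
    """Fit y = slope * x + intercept by ordinary least squares, where x is the
    season index (0, 1, 2, ...) and y is one metric value per season, in
    chronological order.

    Hypothetical helper for illustration; a negative slope means the metric
    has been declining over the seasons provided.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two seasons to fit a trend")
    x_mean = (n - 1) / 2  # mean of 0, 1, ..., n-1
    y_mean = sum(values) / n
    # Sum of squared x deviations, and sum of x/y cross-deviations
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Usage with made-up numbers (NOT real goalkeeper data):
slope, intercept = linear_trend([10, 8, 6, 4])  # slope == -2.0, intercept == 10.0
```

A straight line is the simplest summary of "getting better or worse over time"; with only a handful of seasons, a single bad or good year can still swing the slope noticeably.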

As you can see in the line chart above (look for the orange line), and as this crosses map from StatsBomb starkly illustrates, De Gea’s cross-stopping is extremely poor, and has been consistently so over the years. This chart from Smarterscout shows a decline in De Gea’s overall shot-stopping and shot-prevention ability since his peak in 2017/18:

The second entry in this series is a scouting report outlining which goalkeepers I thought would be suitable targets to replace De Gea. I should first introduce a dataset I’ve used throughout this series, which I compiled from FBref thanks to Ben Griffis’ Python web-scraping code (here and here). The scraper downloaded FBref’s raw data for 11 competitions (the top-flight leagues in England, France, Spain, Italy, Germany, Mexico, Brazil, the USA, Portugal and the Netherlands, plus the Championship, i.e. England’s second tier) for the 2022/23 season, and saved it in CSV files. I then had to do quite a lot of data cleaning and wrangling, including filling in the players’ specific positions by using Jason Zivkovic’s worldfootballR to scrape the positions listed on Transfermarkt’s player pages. This was necessary because it allowed me to create another spreadsheet with percentile ranks among positional peers: for example, you can find a centre-forward’s percentile rank for any given metric, such as expected goals per 90, compared with all other centre-forwards in these 11 leagues in 2022/23 (it wouldn’t make sense to compare a centre-forward’s performance metrics with those of a centre-back).
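The position-aware percentile ranks described above can be sketched in plain Python. This is a minimal illustration under assumptions, not the actual spreadsheet pipeline: each player is a dict, positions come from a hypothetical name-to-position lookup (joining on name alone is naive and would need manual fixes for duplicate names and spelling variants), and a player's percentile rank is defined as the share of positional peers with a value less than or equal to their own. Function and field names are hypothetical.

```python
from collections import defaultdict


def attach_positions(fbref_rows, tm_positions):
    """Add a specific position to each FBref row from a {name: position}
    lookup (hypothetical join on player name). Unmatched rows are skipped."""
    merged = []
    for row in fbref_rows:
        position = tm_positions.get(row["name"])
        if position is None:
            continue  # in practice these would be fixed by hand, not dropped
        merged.append({**row, "position": position})
    return merged


def percentile_ranks_by_position(players, metric):
    """Return {name: percentile rank in 0-100} for `metric`, where each
    player is compared only with players in the same position."""
    groups = defaultdict(list)
    for player in players:
        groups[player["position"]].append(player)

    ranks = {}
    for peers in groups.values():
        values = [p[metric] for p in peers]
        for p in peers:
            # Share of peers (including the player) at or below this value
            at_or_below = sum(v <= p[metric] for v in values)
            ranks[p["name"]] = 100 * at_or_below / len(values)
    return ranks


# Made-up example: two centre-forwards and two centre-backs
rows = [
    {"name": "Player A", "xg_per90": 0.50},
    {"name": "Player B", "xg_per90": 0.30},
    {"name": "Player C", "xg_per90": 0.10},
    {"name": "Player D", "xg_per90": 0.05},
]
positions = {"Player A": "Centre-Forward", "Player B": "Centre-Forward",
             "Player C": "Centre-Back", "Player D": "Centre-Back"}
ranks = percentile_ranks_by_position(attach_positions(rows, positions), "xg_per90")
```

Here Player C ends up at the 100th percentile among centre-backs despite a lower xG per 90 than Player B, who sits at the 50th percentile among centre-forwards; that is exactly the point of ranking within positions. In pandas, `df.groupby("position")[metric].rank(pct=True, method="max")` follows the same definition, returned as a fraction rather than 0-100.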
Back to the goalkeeper scouting report. Here’s how I proceeded, with the important caveat that, as mentioned above, standard goalkeeping metrics are pretty limited and should be taken with a big pinch of salt. I used previous reports about Man Utd’s GK targets, some friends’ and analysts’ own shortlists, and a couple of simple “filter and sort” manipulations in Excel to narrow down a list of interesting options. At this stage I focused on stats about on-the-ball ability, since shot-stopping and other defensive metrics for GKs are flawed: I reviewed percentile ranks for live-ball (open play) passes per 90, completed passes per 90 and pass launch percentage (Launch%).

From the report: “Pass Launch% is relevant because a lower percentile rank for it indicates that the GK is willing to play on the ground, although obviously this is highly dependent on team tactics” (some names here are included merely for reference, e.g. Courtois or Alisson)
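The “filter and sort” step was done in Excel; as a hypothetical Python equivalent, the sketch below averages the three percentile ranks mentioned above, inverting Launch% so that a GK who prefers short passing scores higher. The equal weighting, the 60-point threshold, the top-10 cut and the field names are all assumptions for illustration, and the simple inversion ignores the caveat that Launch% depends heavily on team tactics.

```python
def on_ball_score(gk):
    """Average of three percentile ranks (0-100) among goalkeepers.
    Launch% is inverted: a low launch rank suggests a GK willing to play short."""
    return (
        gk["live_passes_rank"]
        + gk["completed_passes_rank"]
        + (100 - gk["launch_pct_rank"])
    ) / 3


def shortlist(gks, min_score=60, top_n=10):
    """Keep GKs at or above `min_score`, best first (ties broken by name)."""
    scored = [(on_ball_score(gk), gk["name"]) for gk in gks]
    kept = [item for item in scored if item[0] >= min_score]
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in kept[:top_n]]


# Made-up percentile ranks for three hypothetical goalkeepers
gks = [
    {"name": "GK 1", "live_passes_rank": 90, "completed_passes_rank": 80, "launch_pct_rank": 10},
    {"name": "GK 2", "live_passes_rank": 40, "completed_passes_rank": 50, "launch_pct_rank": 90},
    {"name": "GK 3", "live_passes_rank": 70, "completed_passes_rank": 70, "launch_pct_rank": 40},
]
print(shortlist(gks))  # ['GK 1', 'GK 3']
```

A composite score like this only narrows a long list; the final calls still rest on video and context, especially for a position where the underlying metrics are this limited.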
I invite you to read the report for my detailed reasoning, but I narrowed my GK shortlist down to 10 options, my favorites being David Raya (who was at Brentford, and has since joined Arsenal), Diogo Costa (Porto), Andrew (Gil Vicente) and Bart Verbruggen (who was at Anderlecht, and has since joined Brighton). At the end of the report, I included data visualizations and notes for each of the 10 options selected; for example, here is Andrew (da Silva Ventura)’s “pizza chart” based on his FBref percentile ranks:

Funnily enough, Man Utd ended up recruiting André Onana, one of the few options I had written off as too unlikely/unrealistic; but he’s the perfect type of goalkeeper for this team, as Tifo IRL and Football Meta have explained brilliantly on YouTube.
The third part of the series contains a brief note on Man Utd’s academy players, followed by a scouting report for the second key position (according to the blueprint introduced in the first post), namely a new centre-forward. I used two main data sources: my own database of FBref-based percentile ranks, and Ben Griffis’ “Football Prospect Research” tool, which uses Wyscout data to explore players’ stats from lesser-known leagues across the world. I generally filtered out players from rival teams in England or big clubs in Europe (e.g. Real Madrid, Bayern Munich); apart from Mehdi Taremi (aged 30), I kept only players who were 25 or younger in 2022/23; and I removed players with fewer than five 90s (full-match equivalents), keeping a skeptical eye on anyone with fewer than ten 90s, because such small samples can produce misleading data/rankings. I reviewed many different metrics covering all phases/dimensions of the game (creation, shooting, passing, possession, defending…).
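The filters described above translate into a few simple rules. Here is a minimal Python sketch, assuming each player is a dict with `name`, `club`, `age` and `nineties` (90s played) fields; the excluded-club set contains only the two examples named above, not the full list of rivals and big clubs, and the function name is hypothetical.

```python
# Illustrative subset only: the two big clubs named as examples in the text
EXCLUDED_CLUBS = {"Real Madrid", "Bayern Munich"}
# The one over-age player kept on the list
AGE_EXCEPTIONS = {"Mehdi Taremi"}


def filter_strikers(players, max_age=25, min_90s=5.0, caution_90s=10.0):
    """Return (kept, flagged): names passing the filters, and the subset
    whose small sample (fewer than `caution_90s` 90s) calls for caution."""
    kept, flagged = [], []
    for p in players:
        if p["club"] in EXCLUDED_CLUBS:
            continue
        if p["age"] > max_age and p["name"] not in AGE_EXCEPTIONS:
            continue
        if p["nineties"] < min_90s:
            continue  # too few minutes for meaningful per-90 numbers
        kept.append(p["name"])
        if p["nineties"] < caution_90s:
            flagged.append(p["name"])
    return kept, flagged


# Made-up players for illustration
players = [
    {"name": "Player V", "club": "Real Madrid", "age": 22, "nineties": 20.0},
    {"name": "Player W", "club": "Club W", "age": 27, "nineties": 25.0},
    {"name": "Player X", "club": "Club X", "age": 21, "nineties": 7.5},
    {"name": "Player Y", "club": "Club Y", "age": 19, "nineties": 3.0},
    {"name": "Player Z", "club": "Club Z", "age": 23, "nineties": 18.0},
]
kept, flagged = filter_strikers(players)
# kept == ['Player X', 'Player Z'], flagged == ['Player X']
```

Flagging rather than dropping the 5-to-10 90s group keeps promising low-minute players visible while marking their per-90 numbers as less reliable.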

I narrowed this down to the following shortlist; I should note that I deliberately tried to “think outside the box” and look for cheaper targets. In the end, Man Utd paid a large fee for Rasmus Højlund, a hugely promising 20-year-old Danish striker.

Of these targets, Scamacca went to Atalanta (Bergamo, Italy), Douvikas to Celta de Vigo, Archer to Sheffield Utd, Retegui to Genoa, Beltrán to Fiorentina and Danilo to Rangers. At the end of the report, as in the previous one, I included some charts, this time using FBCharts (a browser extension that automatically renders radar charts on FBref player pages) and DataMB radar charts (Wyscout data). Here’s Vangelis Pavlidis, for instance:


Using a similar methodology, in the fourth report I explored RCB (right-sided centre-back) options, again using my FBref-based database. I was looking for a ball-playing centre-back who would partner Lisandro Martínez (LCB) well, and who therefore needed to be very good in possession, defensively, and in terms of physicality and aerial duels. I drew up a preliminary list, which I later reduced to 10 options (see the full report for the specific reasons):

Final list (the order isn’t significant, apart from the first two, who were my top targets):
- Ousmane Diomandé
- Jean-Clair Todibo
- Mees Hilgers
- Edmond Tapsoba
- Kevin Danso
- Nathan Wood
- Konstantinos Mavropanos (has since joined West Ham)
- Alexsandro Ribeiro
- Ben Wilmot
- Ben Cabango
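One way to encode an RCB profile like the one described above (good in possession, defensively, and in the air) is to require a minimum average percentile in every dimension rather than a high overall average, so that a strength in one area can't hide a weakness in another. The sketch below is a hypothetical illustration: the metric names, their grouping and the 60th-percentile floor are assumptions, not the metrics or thresholds used in the report, and physicality is folded into the aerial group as a rough proxy.

```python
from statistics import mean

# Hypothetical grouping of percentile-rank columns (0-100 among centre-backs)
PROFILE = {
    "possession": ["prog_passes_rank", "pass_completion_rank"],
    "defending": ["tackles_interceptions_rank", "blocks_rank"],
    "aerial": ["aerials_won_pct_rank"],
}


def profile_scores(player):
    """Average percentile rank for each dimension of the profile."""
    return {group: mean([player[m] for m in metrics]) for group, metrics in PROFILE.items()}


def fits_profile(player, floor=60):
    """True only if every dimension clears the floor."""
    return all(score >= floor for score in profile_scores(player).values())


# Made-up percentile ranks for two hypothetical centre-backs
cb_1 = {"prog_passes_rank": 80, "pass_completion_rank": 70,
        "tackles_interceptions_rank": 65, "blocks_rank": 75, "aerials_won_pct_rank": 85}
cb_2 = {**cb_1, "aerials_won_pct_rank": 40}  # same profile, but weak in the air
print(fits_profile(cb_1), fits_profile(cb_2))  # True False
```

The "every dimension above a floor" rule is deliberately strict; for a partner to a left-footed ball-player, a clear weakness in any one area would be a problem on the pitch.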
Finally, I created the customary radar charts; here I’ll only show examples based on my own FBref database, rather than on the online player reports on FBref’s website:



TBC [Conclusion/critical review]