From most people’s experience playing and watching soccer, certain things seem to be truisms. You have to shoot in order to score. Shorter passes are easier than longer ones. One of these fundamental soccer truths is finishing ability. It’s a near-universally held belief that some players are simply better at scoring than others. These players take the same chance that a lesser finisher would sail over the bar and tuck the ball into the far corner instead. However, it’s been proposed by some (see this article, for example, as well as Mike L. Goodman’s extremely thoughtful piece for a discussion of this) that finishing skill may not be as persistent a trait for a given player as intuition leads us to believe. I gathered some data in the hope of exploring this question.
It’s really tough to get a precise piece of data which quantifies finishing skill. Factors such as location, defensive pressure, goalkeeper positioning and the assisting pass all affect the difficulty of scoring a goal. Privately licensed data contains shot-by-shot detail covering all of these confounding factors, which makes it possible to control for as much as possible. However, publicly available data is sparse and much less granular than would be required for that complete an analysis. Instead, I looked at data from Squawka.com, and collected information for 97 players across two seasons. The focus of analysis here will be on conversion rates, defined as the goals scored by a player divided by their number of shots. Though this is an imperfect metric, it’s the best one widely available (to my knowledge) without doing some tedious scraping of individual match shot charts. My hope is that it serves as a good enough proxy for true finishing ability, in the presence of a few controls. My data include players who played in the so-called “Top 5” European leagues over the past two seasons and scored at least 10 goals in one of those years. I further filtered the data to include only players who had played at least 1000 minutes in each season, in order to try to keep conversion rates from being skewed by a small number of shots. Let’s take an initial look at how conversion rates changed over the course of the two seasons.
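To make the metric and the filters concrete, here is a minimal sketch in Python. The record layout, field names and numbers are all hypothetical (they are not Squawka’s format, and not the data used here); only the definition of conversion rate and the two thresholds come from the description above.

```python
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative only.
@dataclass
class PlayerSeason:
    player: str
    season: int
    goals: int
    shots: int
    minutes: int


def conversion_rate(row):
    """Goals divided by shots, or None when the player took no shots."""
    return row.goals / row.shots if row.shots else None


def eligible_players(rows, min_goals=10, min_minutes=1000):
    """Players with two seasons on record, at least min_minutes in each
    season, and at least min_goals in at least one of them."""
    by_player = {}
    for row in rows:
        by_player.setdefault(row.player, []).append(row)

    keep = set()
    for name, seasons in by_player.items():
        enough_minutes = all(s.minutes >= min_minutes for s in seasons)
        enough_goals = any(s.goals >= min_goals for s in seasons)
        if len(seasons) == 2 and enough_minutes and enough_goals:
            keep.add(name)
    return keep


# Made-up numbers, purely for illustration.
rows = [
    PlayerSeason("Player A", 2012, 15, 100, 2500),
    PlayerSeason("Player A", 2013, 9, 90, 2200),
    PlayerSeason("Player B", 2012, 12, 60, 900),  # fails the minutes filter
    PlayerSeason("Player B", 2013, 11, 70, 2000),
]

kept = eligible_players(rows)
rates = {(r.player, r.season): conversion_rate(r) for r in rows if r.player in kept}
```

In this toy example only Player A survives the filters, since Player B logged fewer than 1000 minutes in one of the two seasons.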
At first glance, there’s not much of an obvious correlation. If conversion stayed perfectly consistent from year to year, we’d see all the points falling on the dotted red line, but as it is, 2012’s conversion doesn’t seem to persist into the next year. The results of a simple linear regression bear this out. The estimated effect of the previous year’s conversion rate on the next year’s is positive, but very weak, and not significantly different from zero. There are definitely some confounding factors though, so let’s consider what else might be driving conversion.
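For readers who want to see what a simple linear regression of one season’s conversion rate on the previous season’s involves, here is a minimal least-squares sketch in plain Python. The five data points are invented for illustration, and the function only estimates the line; it does not compute the significance test mentioned above.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up conversion rates for five hypothetical players.
conv_2012 = [0.10, 0.15, 0.20, 0.25, 0.30]
conv_2013 = [0.16, 0.14, 0.19, 0.15, 0.18]

intercept, slope = simple_ols(conv_2012, conv_2013)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Perfect persistence would show up as a slope of 1 and an intercept of 0, which is the dotted line in the chart. A slope close to zero, as in this toy data, means last season’s rate says little about this season’s.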
Though there are lots of things that could affect how a player is converting opportunities, I considered only the quality of the player’s shots, and whether or not the player had changed teams between the 2012 and 2013 seasons. The potential effect of shot quality on conversion rate is pretty plainly apparent. If a player took shots only from outside the penalty area one season, while in the next season they shot only from inside the area, we’d expect a change in conversion rate completely independent of that player’s ability. Changing teams, on the other hand, could influence a number of environmental factors such as style of play and quality of teammates, all of which could have an effect on conversion.
To examine the effect of shot quality, I used the proportion of a player’s shots which came from inside the penalty area. Though this is a bit of a rough stand-in for shot quality, it’s the best statistic offered by publicly available data. A chart of conversion rates against shot quality shows a visible positive relationship between the proportion of a player’s shots which came from within the box and that player’s conversion rate.
This chart includes data from each season for all players in the dataset. It’s reasonable to suspect that shot quality as measured by this metric drives conversion rate, and would be a variable to control for in trying to estimate the persistence of conversion rates.
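As a rough illustration of the shot quality stand-in and its relationship to conversion, the sketch below computes the in-area share of shots and a Pearson correlation in plain Python. The function names and all of the numbers are hypothetical; the correlation coefficient is simply one way to summarize the relationship the chart shows, not a statistic reported in this analysis.

```python
from math import sqrt


def in_box_share(shots_inside_area, shots_total):
    """Proportion of a player's shots taken from inside the penalty area."""
    return shots_inside_area / shots_total


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up player-seasons: in-area share of shots and conversion rate.
shot_quality = [0.40, 0.50, 0.60, 0.70, 0.80]
conversion = [0.08, 0.12, 0.11, 0.16, 0.18]

r = pearson_r(shot_quality, conversion)
print(f"r = {r:.2f}")
```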
The effect of changing teams can be visualized by breaking down the first graph into players who did switch teams, and those who didn’t.
The trend lines here show that the players who did not change teams had their conversion rates persist better than those players who did change teams. This makes sense, given that changing teams can shift a ton of factors at once, any of which might have a material effect on conversion. In the case of players who did not change teams, the positive effect of the previous year’s conversion rate on this year’s is significant at the 0.09 percent level, while the effect is insignificant for players who did change teams.
It’s clear, then, that shot quality and a change in teams can both affect how a player converts their opportunities. In order to estimate the persistence of conversion rate with these factors taken into account, I used a basic technique called multiple regression, which is essentially an extension of a linear fit or regression, but with more than one independent variable. It attempts to calculate the effect of 2012’s conversion rate on that of 2013, but controls for shot quality in 2013 by estimating its effect as well. Because team changes introduce so much noise, I limited my regression here to only players who had stayed on the same team across the two seasons (65 players in total). The results are below:
| Variable | Effect on ’13 Conv. | Significance |
|---|---|---|
| ’13 Shot Quality | 0.007 | 0.003 |
The “effect” column for the non-constant parameters indicates the change in ’13 conversion rate associated with a five percentage point change in the variable. So, for example, increasing shot quality (as measured by proportion of in-area shots) from 0.60 to 0.65 would be associated with a 0.7 percentage point increase in ’13 conversion, with ’12 conversion held constant. The end result is that even with the relatively noise-free dataset of only players who have stayed with their team over the two seasons, the effect of shot quality on conversion renders any persistence in conversion rate meaningless.
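The multiple regression described above can be sketched in plain Python for the two-predictor case by solving the normal equations directly. This is a sketch under stated assumptions, not the code behind the table: the data are synthetic, generated from known coefficients so the fit can be checked. The shot quality slope of 0.14 per unit is chosen so that a five percentage point change corresponds to the 0.007 quoted above; the 0.10 slope on ’12 conversion and the 0.02 intercept are arbitrary.

```python
def two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1 * x1 + b2 * x2.

    Solves the 2x2 normal equations on mean-centered data.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]

    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))

    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")

    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Synthetic inputs for five hypothetical players.
conv_2012 = [0.10, 0.15, 0.20, 0.25, 0.30]
shot_quality_2013 = [0.60, 0.50, 0.70, 0.55, 0.65]

# Synthetic outcome built from known (assumed) coefficients, with no noise.
conv_2013 = [0.02 + 0.10 * c + 0.14 * q
             for c, q in zip(conv_2012, shot_quality_2013)]

b0, b1, b2 = two_predictor_ols(conv_2012, shot_quality_2013, conv_2013)

# Express the shot quality slope per five percentage points, as in the table.
effect_per_five_points = b2 * 0.05
```

Because the synthetic outcome contains no noise, the fit recovers the coefficients it was built from. With real data the estimates come with standard errors, which is where the significance column in the table comes from.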
So is finishing a myth? I’m not sure I’d go that far. First of all, this analysis is somewhat incomplete. Conversion rate isn’t the best way to measure finishing ability, and there are a lot of factors impacting results which aren’t controlled for even after taking into account shot quality (also as measured by an imperfect metric) and team changes. Also, having several seasons of data would provide more evidence to establish a trend (or lack thereof). There are a few conclusions, however, which this analysis does support. One is that finishing is subject to a number of factors having nothing to do with the innate ability of a striker. A player could have an outstanding season in front of goal, but maintaining this performance over subsequent seasons is far from certain. Furthermore, conversion rates might be used to identify players who are candidates for regression towards the mean. It should not be surprising when players whose goal tallies for a single season are supported by extremely high conversion rates struggle to repeat their goalscoring performances in the following seasons. On the other hand, a player who generates a decent number of shots and chances but has a low conversion rate might be suffering from some bad luck and come good in a later season. It will be interesting to see if this kind of regression to the mean in conversion percentages can explain sudden drops or jumps in goalscoring in the future.
Easily the most frustrating barrier to entry on any idea I have for analysis is the lack of available, reliable data. I know this is an issue for many, and it’s something which is even more evident having worked on a project with access to Opta data and now going back to manual spreadsheet entry. Objective Football is a new site to keep an eye on in terms of tracking some newer metrics. The ImportHTML function of Google Sheets may help you scrape the site.
 EPL, Bundesliga, La Liga, Ligue 1, Serie A
In exploring different Python libraries, I also created versions of each plot using Plotly’s Python API, which you can find here, here and here, along with an IPython notebook which walks through creating the rest of the plots in Python and ggplot.
This example carries the caveat of shot quality. That is, players with high shot totals who rely on long-distance shots, e.g. Adel Taarabt or Andros Townsend, probably won’t see much of an uptick in conversion.