Using stats to identify the best center backs in the English Premier League

written by – Rangers Report    photo courtesy of – Getty Images

This is part two of my statistical study of center back play from the 2016-17 English Premier League season.  All data was collected from FourFourTwo’s StatsZone & whoscored.com.

My ultimate goal in this project was to develop a defensive rating system for center backs.  A rating system that could be used as an entry point to evaluate defensive play, recruit players & even identify trends in opponents.

The rating would combine shot suppression with how often defenders were actually doing stuff (passes in the defensive third, tackles, interceptions, & clearances…you know – defensive stuff).

I leaked out results via Twitter as I finished tracking different teams.

I was excited by the results.

Ahhh…the stat formerly known as the DPSS (Defensive Possession & Shot Suppression) Rating.

I was all ready to go….but first I wanted to set the stage by using the data collected to determine if having a left-footed center back paired with a right-footed center back was an overrated phenomenon or an actual priority managers should seek out.

That was Part One.  The results proved that there was no real reason to go out of your way to pair a lefty with a righty & that when two right-footed center backs were paired together – the results were basically identical to a ‘balanced’ pair of center-halves.

But….my plan was flipped upside down by the following facts:  in the 2016-17 season, 56% of shots, 55% of scoring chances (more on that later), & 62% of the goals came from shots taken on the right side of the pitch against the left side of the defense.

I was expecting closer to a 50/50 split, but 56% of the shots coming from the right meant that only 44% were coming from the left (pretty fancy math, huh).

This kind of blew my mind…(I know, I know… ‘get a life’….whatever)…..but ultimately, I was looking at three months of work going down the drain.  How can I justify using a rating to assess center backs when one side of the pitch sees 12 percentage points more of the shots?

Then the sleepless nights started….

Playing as a left-sided center back was a more difficult job than playing on the right.  You’re more likely to have shots taken from your zone of influence & way more likely to concede goals from your side of the pitch.

Thankfully, with the support of Matt Cane & Ryan Stimson, via some frantic emails (mostly frantic on my end), I was able to revise my plan of how I was going to use this data & apply it to a defensive rating system.  The work of Cane & Stimson has been a huge influence on this process &, in particular, this post from Cane’s blog Puck++ is what kickstarted this project.

Before we get to the findings, let’s quickly review the process (first introduced in part one).

SPOILER:  Before I lose you….the five highest rated defensive center backs in the EPL last season were:

Arsenal’s Shkodran Mustafi

Burnley’s Michael Keane (who just signed with Everton for £30 million)

Everton’s Ramiro Funes Mori (when healthy)

Phil Jones (Manchester United)

Steve Cook of Bournemouth.

Some surprises there?  Phil Jones, really?

Let’s see how the stats played out…

To get a sense of shot suppression, we need to know where the shots are coming from.  So I tracked the shot data from each EPL match from the 2016-17 season.

[Image: example shot map, courtesy of FourFourTwo’s StatsZone]

I split the pitch in half & recorded which half of the pitch the shot came from.  I only included shots that started within the frame.  Notice the blue shot (a shot that was on target) & the yellow shot (a goal) originated from outside this frame & were not counted.  Shots from that far out were less likely to be impacted by a center back.  Make sense?

So in this example, there were two shots counted & both came from the attacking right side.  Given that this is the point of view of the attacking player, realize that those two shots came under the zone of influence of the left-sided center back (see, it is harder to play on the left side).
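The side-assignment step above can be sketched in Python. This is a minimal illustration, not the author's actual workflow: it assumes each shot comes with x/y coordinates on a 0 to 100 scale from the attacking team's point of view (x toward the goal being attacked, y from the attacking team's right touchline at 0 to its left touchline at 100), and `FRAME_MIN_X` is a hypothetical stand-in for the frame drawn on the StatsZone graphic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# ASSUMPTION: hypothetical frame cutoff. Only shots taken at x >= FRAME_MIN_X are
# counted; the real frame was drawn on a StatsZone image and is not reproduced here.
FRAME_MIN_X = 80.0


@dataclass
class Shot:
    x: float  # 0 = attacking team's own goal line, 100 = the goal line it attacks
    y: float  # 0 = attacking team's right touchline, 100 = its left touchline


def defending_side(shot: Shot) -> Optional[str]:
    """Side of the defense (from the defenders' view) whose zone the shot came from."""
    if shot.x < FRAME_MIN_X:
        return None  # too far out to pin on either center back, so not counted
    # The attacking right half of the pitch is the defending team's left side.
    return "left" if shot.y < 50 else "right"


def side_shares(shots: List[Shot]) -> Dict[str, float]:
    """Share of counted shots on each defensive side; empty dict if none counted."""
    counts = Counter(side for side in map(defending_side, shots) if side is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {side: counts[side] / total for side in ("left", "right")}
```

For a match like the example above, two counted shots from the attacking right both map to `"left"`, a 100/0 split for that game; summing the counts over a whole season is what produces league figures like the 56/44 split.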

As the data adds up you can see what percentage of shots came from either side of the pitch.  Like I mentioned in the previous post, this isn’t perfect.  Football is a fluid sport.  Sometimes a left sided center back is caught up on the other side supporting his partner.  Often a breakdown occurs elsewhere that the center back has no control over.

But over hundreds & thousands of minutes, some definitive trends emerge…trends that should be learned from & that can bring real insight into the effectiveness of the different center backs in the league.

Now, for the twist….remember one center back has a more difficult task than his partner.

[Photo courtesy of Getty Images]

Take Liverpool’s Ragnar Klavan as an example.  Klavan is a left-footed center back who played nearly 1,500 minutes as a left-sided center back.  When he was on the pitch, 56% of the shots against came from his side, meaning 44% of the shots came on his partner’s side.

In my original rating system, this hurt Klavan.  He was not suppressing shots as well as his partner & as a result the majority of scoring chances & goals were also coming from his defensive zone of influence.

But remember, I tracked over 9,000 shots & the data showed that 56% of shots came from the right side (against left sided defenders).  Couldn’t you argue that Klavan was performing as expected relative to other center backs playing on the left side?

This is where I finally found my zen moment.  Why not just show a defender’s shot suppression stats relative to the rest of the league?  This has become very common in the hockey analytics world, where players’ stats are evaluated in relation to other players (usually their teammates).

Back to Klavan.  His Relative Shot Suppression Rate (RelSS%) is 0.00.  His 56% is no different than the rest of the left-sided center backs in the EPL.

Another example is Sunderland’s left-sided center back Papy Djilobodji.  Before my moment of clarity, I was pretty blown away by how bad Djilobodji was at suppressing shots.  When he was on the pitch, 63% of shots came on his side & only 37% on his partner’s side.

I bring up the partner’s rate because, remember, my ultimate goal was to create a defensive rating.  Because I ultimately want to combine two positive numbers, I used the percentage from the partner’s side.  If a player never allowed a shot on his side of the pitch, what the hell would you gain by starting with 0% as a number?  100% would be a better number to work with.

Back to Djilobodji (thank you copy & paste) — 37% of the shots came on his partner’s side & the league rate for a left sided center back would be 44% of the shots coming on the opposite side.

So…Djilobodji’s RelSS% is -0.07 (0.37 minus 0.44).  That’s still pretty bad…actually it’s among the worst rates in the league last season.
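The relative rate can be sketched in a few lines of Python. The baselines are the league figures quoted above (for a left-sided center back, 44% of shots normally come on the partner's side; for a right-sided one, 56%); the function name and the choice to work from percentage shares rather than raw shot counts are illustrative assumptions.

```python
# League-wide share of shots that land on the PARTNER's side, keyed by the player's
# own position (2016-17 figures from this study: 56% of shots came against the left).
LEAGUE_PARTNER_SHARE = {"left": 0.44, "right": 0.56}


def rel_ss(partner_share: float, position: str) -> float:
    """Relative Shot Suppression Rate: partner-side share minus the league norm.

    Positive means fewer shots came on the player's side than is typical for his
    position; negative means more.
    """
    if not 0.0 <= partner_share <= 1.0:
        raise ValueError("partner_share must be a fraction between 0 and 1")
    return partner_share - LEAGUE_PARTNER_SHARE[position]


# Klavan (left): 44% of shots on his partner's side, so RelSS% of 0.00
# Djilobodji (left): 37% on his partner's side, so RelSS% of -0.07
for name, share in [("Klavan", 0.44), ("Djilobodji", 0.37)]:
    print(name, round(rel_ss(share, "left"), 2))
```

The `round` call only tidies floating-point noise (0.37 - 0.44 is not exactly -0.07 in binary floating point); the underlying rate is kept unrounded.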

The best, you ask???

I used 1,000 minutes played as the minimum here to avoid the shock of seeing Daniel Ayala or Nathan Ake in the top 10…that’ll come in a few minutes.

When it comes to shot suppression, you see that there’s Nicolas Otamendi & Virgil van Dijk & then everyone else.

Otamendi basically split his time playing on the left & right last season.  When on the left, his RelSS% was 0.04…which is very good.  When he was on the right, his RelSS% was 0.18 – which is off the charts good.  As a right center back, 74% of the shots came on his partner’s side – when you subtract the league average of 56% from that you get Otamendi’s Relative Shot Suppression Rate of 0.18.

Van Dijk spent 95% of his minutes as a left center back for Southampton, & the majority (54%) of the shots came on his partner’s side…for a left center back that split is normally flipped (the league norm is 44% on the partner’s side)…which helps explain why he is so in demand during this transfer season.
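Written as a formula (the symbols are shorthand introduced here, not the author's notation), with $p_{\text{partner}}$ the share of shots that came on the partner's side while the player was on the pitch and $b_{\text{pos}}$ the league-wide partner-side share for his position:

$$
\mathrm{RelSS} = p_{\text{partner}} - b_{\text{pos}}, \qquad b_{\text{left}} = 0.44, \quad b_{\text{right}} = 0.56
$$

$$
\text{Otamendi (right): } 0.74 - 0.56 = 0.18 \qquad \text{van Dijk (left): } 0.54 - 0.44 = 0.10
$$

The van Dijk figure treats all of his minutes as left-sided, so it is only an approximation; his exact RelSS% isn't given in the text.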

Now, remember my original goal was to create a defensive rating for center backs.  This is merely an attempt at measuring defensive impact by combining the player’s RelSS% with the aforementioned defensive stuff they do:  completed passes in the defensive third, clearances, interceptions, & tackles won.  I combined those defensive possessions (a much better term than stuff), averaged them out on a per-90-minute basis & multiplied that rate by the RelSS%.

Below are the top 25 center backs based on that defensive rating, this time with a minimum of 700 minutes played.

*Note that the RelSS% was rounded for your sanity & that’s why someone like Gareth McAuley’s rating doesn’t look like it matches up.  His actual RelSS% is 0.0148709.
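The rating itself can be sketched as below. This is a minimal sketch under assumptions: the four counting stats are season totals, "per 90" is taken as total × 90 ÷ minutes, and the function names are hypothetical. The season totals in the example are made up for illustration; only McAuley's RelSS% is the unrounded value quoted above.

```python
def defensive_possessions_per_90(passes_def_third: int, clearances: int,
                                 interceptions: int, tackles_won: int,
                                 minutes: float) -> float:
    """Completed defensive-third passes + clearances + interceptions + tackles won, per 90."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    total = passes_def_third + clearances + interceptions + tackles_won
    return total * 90 / minutes


def defensive_rating(dp_per_90: float, relative_rate: float) -> float:
    """Rating = defensive possessions per 90 multiplied by a relative suppression rate."""
    return dp_per_90 * relative_rate


# HYPOTHETICAL season totals over 900 minutes: 250 defensive possessions, 25.0 per 90.
dp90 = defensive_possessions_per_90(150, 60, 20, 20, minutes=900)
unrounded = defensive_rating(dp90, 0.0148709)                  # McAuley's actual RelSS%
rounded_display = defensive_rating(dp90, round(0.0148709, 2))  # what a 0.01 column implies
print(dp90, round(unrounded, 3), round(rounded_display, 3))
```

Because the rating is a product, a negative relative rate makes the rating negative, and more defensive activity then pushes it further down rather than up.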

Remember my spoiler from earlier about Mustafi & Keane being the top rated defensive center backs last season?  Where are they on this list?  Mustafi is 22nd, while Keane is 58th!

Awkward??

Rating center backs on shot suppression is one thing, but we all know that not all shots are created equal.  To me, one of the qualities of a good center back is, if they are going to allow a shot, to force that shot to the outside into a less dangerous area.

Basically, can they limit the number of scoring chances that come from the shots they do allow on their side of the pitch?

Scoring chances are any kicked shots from inside that red, shaded area, plus headers from within the vicinity of the six-yard box.  All other shots have been ‘forced outside’ of the danger area & are not considered scoring chances, given the low scoring rate of those shots.
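A classifier for that definition might look like the sketch below. The red shaded area only appears in an image that isn't reproduced here, so every zone boundary in the code is a hypothetical placeholder (loosely based on the 16.5 m penalty area and the 5.5 m by 18.32 m goal area); only the structure of the rule, kicked shots judged against one zone and headers against the area around the six-yard box, comes from the text.

```python
from dataclasses import dataclass

# HYPOTHETICAL zone boundaries in metres; placeholders, not the author's actual zone.
DANGER_MAX_DEPTH = 16.5             # how far out from the goal line a kicked shot may be
DANGER_MAX_HALF_WIDTH = 9.0         # how far either side of the goal's centre
SIX_YARD_DEPTH = 5.5 + 2.0          # goal-area depth plus an assumed 2 m "vicinity"
SIX_YARD_HALF_WIDTH = 9.16 + 2.0    # half of the 18.32 m goal-area width, plus 2 m


@dataclass
class ShotEvent:
    depth: float   # distance from the goal line being attacked, metres
    offset: float  # signed lateral distance from the centre of the goal, metres
    header: bool   # True for headers, False for kicked shots


def is_scoring_chance(shot: ShotEvent) -> bool:
    """Apply the header rule or the kicked-shot rule, depending on the shot type."""
    if shot.header:
        return shot.depth <= SIX_YARD_DEPTH and abs(shot.offset) <= SIX_YARD_HALF_WIDTH
    return shot.depth <= DANGER_MAX_DEPTH and abs(shot.offset) <= DANGER_MAX_HALF_WIDTH
```

Keeping headers on a separate, tighter rule mirrors the definition above: a header from the edge of the box is treated as a low-quality shot even though a kicked shot from the same spot would count.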

Below you’ll find the top Relative Scoring Chance Suppression Rates (RelSCS%) from last season’s center backs.  It’s the same concept as the shot suppression rate…the only differences are that only scoring chances are considered & the league rates are 55% of scoring chances come from the right & 45% from the left (from the point of view of the attacking team).

Again, being a left-sided defender is more difficult than being deployed on the right.

(1,000-minute minimum)

There didn’t seem to be much love for Mustafi’s first season with Arsenal but the fact is that opposing teams really struggled to create quality scoring chances on his half of the pitch.

In his nearly 2,200 minutes, Mustafi allowed only eleven scoring chances from his side of the pitch.  There were 28 scoring chances coming from his partner’s side.  That equates to 72% of the scoring chances coming from his partner’s side – the league average for players on the left (where Mustafi’s partners played) was 55%…that translates to a RelSCS% of 0.17.

Keane played over 3,000 minutes for Burnley & had similar rates.  Keane saw 22 scoring chances from his side of the pitch, while his partner allowed 53.  Even though Keane allowed twice as many scoring chances as Mustafi, only 29% of them came from his side of the pitch.  Keane was under much more pressure than Mustafi, but both did a much better job than their center back partners.
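Worked from raw counts, the Mustafi and Keane numbers can be reproduced with a short sketch. The baselines are the league scoring-chance figures above (45% on the partner's side for a left-sided center back, 55% for a right-sided one); Mustafi is treated as right-sided because his partners played on the left. Keane's side isn't stated, so only his share is computed. Function names are illustrative.

```python
# League-wide share of scoring chances on the PARTNER's side, keyed by the player's
# own position (55% of chances came against left-sided defenders in 2016-17).
LEAGUE_PARTNER_CHANCE_SHARE = {"left": 0.45, "right": 0.55}


def partner_share(own_side: int, partner_side: int) -> float:
    """Fraction of scoring chances that came on the partner's side."""
    total = own_side + partner_side
    if total == 0:
        raise ValueError("no scoring chances recorded")
    return partner_side / total


def rel_scs(own_side: int, partner_side: int, position: str) -> float:
    """Relative Scoring Chance Suppression Rate: partner-side share minus league norm."""
    return partner_share(own_side, partner_side) - LEAGUE_PARTNER_CHANCE_SHARE[position]


# Mustafi (right-sided): 11 chances on his side, 28 on his partner's, about 0.17
print(round(rel_scs(11, 28, "right"), 2))
# Keane: 22 on his side, 53 on his partner's, about 29% on his own side
print(round(1 - partner_share(22, 53), 2))
```

Working from counts rather than pre-rounded percentages avoids compounding rounding: 28 of 39 is 71.8%, which the text rounds to 72%.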

Here are your top 25 rated center backs, based on their combined Defensive Possessions per 90 with their RelSCS%.  The minimum is 700 minutes played.

Remember, numbers are rounded.  For example, Luiz’s RelSCS% is 0.0251449, while Delaney’s is 0.0272891.  That’s how they end up with the same rating.

  • Denayer played limited minutes, but when he was on the pitch only 40% of the scoring chances were on his side.  Given that, while on loan from Manchester City, he played 81% of his minutes as a left-sided center back, his job was harder.  That is why his RelSCS% is so high.
  • You’ll notice that some players are ‘rated’ higher than others even if their RelSCS% isn’t higher.  For example, Phil Jones is rated slightly higher than Steve Cook despite a lower RelSCS%.  The difference is that Jones averaged 26.57 defensive possessions per 90, while Cook averaged 23.58.  Jones had a little more defensive stuff to do in order to limit the chances coming from his side.

[Photo: Phil Jones]

Is this a definitive list of the best center backs?  No…but it’s a conversation starter based on facts.

The stats aren’t perfect, but they are entry points to further analysis.  When I did a poll of who were the best center backs in the EPL, only 28% of the 195 ballots said that Mustafi was among the top 10.  Only 24% had Otamendi in the top 10 & Phil Jones appeared on only 7% of the ballots.

There was a significant push for Laurent Koscielny to be included.  He’s a player I’ve always regarded highly as a defender.  Most observers have.

His Relative Shot Suppression Rate?  -0.01.

His RelSCS%?  -0.10.

Only five center backs who played more than 1,000 minutes had a lower RelSCS%.  When Koscielny was on the pitch for Arsenal, 63% of the scoring chances came on his side.  Yes, he played the majority of his minutes on the left side, which is a more difficult task….but remember that RelSCS% already takes that into account.  You can either dig for excuses….or ask yourself why.  Why are teams having success exploiting Koscielny’s side of the pitch?

Let’s stop for now….there’s a part three & maybe even a part four coming up.  Next up:  we’ll share the results from each team.  That way we can see who each team’s best (& worst) defenders are & who deserves more (or less) playing time.

Here are a couple of samples:

All of the data from this study will be released once all the posts have been written.


6 thoughts on “Using stats to identify the best center backs in the English Premier League”

  1. Surprisingly good read.

    One thing the ice hockey stat-heads have going for them (at least now) is institutional buy-in from the clubs and fans alike. One of the best sites was so influential, the founders/writers all got hired by clubs and were basically forced to shut down their site. Fortunately, a lot of the work lives on. See link below.

    Two of their big contributions were to set a framework to evaluate something known as ‘wins above replacement’ which is a fancy way of measuring how good a player is relative to what would be considered a replacement player (for the NHL that would be a minor leaguer; for EPL, probably the 24th or 25th best player on each team). Crucially, they measured the impact of each player on expected goals for, goals against, and resulting likelihood of won/loss ratio.

    The other big contribution, and I don’t think the idea originated with them, is ‘with you or without you’ metrics to see if the team was better with that individual player in the lineup or not. And again, this was measured using expected goals for/against.

    http://blog.war-on-ice.com

    Regarding Koscielny, it’s pretty clear that all the other CBs at Arsenal are inferior to Mustafi. Since you’re using relative metrics (relating one player to other players on the same team), and since Mustafi is the only one in positive territory, the only thing we can really conclude is that Mustafi is better than Koscielny. But what if they’re both above average, relative to CBs from other teams, and on the field at the same time? How would we see that, and what would we expect to see in the stats?

    Lastly, one of the best attributes of CB’s is their ability to prevent shots. I think you’ve taken a reasonable attempt at measuring this, but what if instead of shots going to the other side of the field they simply disappear as the opposition passes the ball back time after time – like the LVG offensive experiment at ManU? Measuring that, and I don’t even know how one would go about it given that opponents have vastly differing possession and shooting philosophies, would be hugely insightful.


    1. Ben

      Thanks for the thoughtful response. War-on-Ice was one of the blogs that really got me into analytics. It seems like a few summers ago was the huge breakthrough for it, as team after team hired people from the online community. Some of it was simply PR for teams, while others have embraced it to a greater extent.

      I’d love to figure out a WAR stat for soccer & have always wanted to do WOWY stats for attacking players…it all comes down to creating a systematic approach to tracking those details.

      Measuring defensive impact is a challenge: how can you track what didn’t happen because of solid defensive play? It’s a tricky code to crack.

      I agree about the downside of having a relative stat when you may still be very good, but not as good as your teammate.

      I thought about including shots allowed per 90, but that may have less to do with the defense & more to do with the amount of possession the team in front of the CBs has.

      Thanks again for the thoughtful response


  2. Found this a really interesting piece however I don’t think it takes into account the set up of the team.

    Particularly in the Koscielny example. Arsenal had Alexis on the left for a majority of the season, meaning a lack of defensive cover, so a large number of attacks come down Arsenal’s left-hand side. This also means that the danger play is more likely to be on Kos’ side of the pitch. I’m not sure how you could account for that other than to look at the % of chances stopped on each side rather than just taking the number allowed.

    I would also argue that Kos had to cover more often for his CB partner as they were generally inferior and therefore we were left vulnerable. I understand that is almost certainly a biased opinion but would be interested to see some actual facts to prove/disprove it. Mustafi often lunges in, Kos goes to cover, leaving his side more vulnerable…

    Football being a team sport makes it very hard to compare player to player even in the same team; however, if you look at the team as a whole I think we are more vulnerable down our left as the defensive cover starts at full back and center back. And it has been that way since Overmars and Winterburn, through Cole and Pires right up until now with Alexis and Monreal.

    So team dynamics will always play a part as to who is exposed the most surely? And this is clearly not the fault of the CB in question.


    1. Stewart

      You’ve highlighted the issue with a stat like this…part of the reason defensive stats are so hard to truly trust is that there are so many variables in play. Offensive stats, easy. Shots & passes are events we can see with our eyes & support with data.

      To me, the defensive stats I’ve tried to come up with here should be entry points for further analysis – which is what you’ve done. A red flag has been raised, further analysis can determine how worried to be about that red flag. It’s certainly something I’m going to look more closely at when I watch Arsenal this season


      1. I should just reiterate I did find it really interesting. I think stats are really important in football because it is a game driven largely on opinion, it’s a team game so not easy to pick one player out over another.

        In a lot of circumstances I think the wrong player gets the credit. From an attacking point of view Bergkamp’s assists were relatively low for the role he had in the team; however, look at the number of times he played someone through only for them to slide it to someone else for a tap-in. Bergkamp was a key cog in the machine, yet assist and goal stats would not necessarily back that up, with players like Ljungberg and Pires showing greater assists and goals and getting a greater share of the limelight.

        I would be really interested to see (if anyone keeps the stats) how many times Arsenal have conceded when Koscielny is covering because Per/Mustafi/Gabriel have lunged in, as they tend to do, only to get beaten, leaving Koscielny exposed. Another thing to consider is how often the shots conceded by these defenders come down to one-on-ones. Is it easy for Mustafi to win the ball because he has the world’s fastest right back and Koscielny covering for him, whereas Koscielny has been left one-on-one or 3 on 2?

        It’s a bit like goalkeepers for me. I have said since very early on in Cech’s Chelsea career that I do not rate him, partly because I felt you could put a relatively average goalkeeper in goal behind the best defence and they would likely concede very few goals. For the opposite reason I don’t think Arsenal keepers are given enough credit; we often leave the keepers exposed to high-quality chances through bad team defending, resulting in goals conceded. Normally the defence or keeper gets the blame when actually it starts from the forwards not closing down quickly enough and the midfield being too high up the pitch.

        Any stats looked at in the round for what they are showing are always interesting, as long as there is perspective.
