The infancy stage of analytics in Scottish football

written by – Rangers Report   photo courtesy of – Getty Images

What I would say is that analytics in Scottish football is at least ten years behind everyone else.

This was told to me back in 2014 by someone who would know.

Three years later, a community of ‘fanalysts’ [albeit still relatively small compared to other football nations] has pushed statistical analysis into the everyday conversation about football in Scotland.

More & more people are putting their ideas out there, either on blogs or via Twitter.  There have been growing pains along the way, but given the public analytics world is so small in Scotland there has also been a great deal of support for each other’s work.

As the core members of this community have persevered, more & more fans are beginning to find their entry points for using stats to justify (or contradict) their opinions of what is happening on the pitch.

Ok…maybe not everybody.

Even though the interest is growing & there are more & more contributors dabbling in analytics – in Scottish football it is still very much in the infancy stage.  Meaning, there are & will be growing pains.

Sometimes it is when the statistical results don’t match up with the public’s perception of a player or team & most recently it has been the fact that the data is not synchronized amongst the people using it.

As I have made my own data available publicly, one thing that keeps happening is that it doesn’t match the stats provided by Matt Rhein, who is a kindred spirit – albeit with poor taste in football teams.

One example occurred yesterday, when Matt tweeted out his Expected Assists leaderboard that had Hibs’ John McGinn as the leader with 1.69 Expected Assists off of 14 Key Passes (the pass that sets up a shot).

That was odd because when I published my list I certainly don’t remember John McGinn being so high up there.

McGinn was 18th on my list with nearly half the Expected Assists that Rhein has him at.

That’s a pretty major discrepancy.  You can see now what I mean by growing pains in the analytics community.  We are finally getting access to data but the information we are playing with is all over the map.
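To make the mismatch concrete, here is a minimal sketch of how two public leaderboards could be lined up to flag the players whose numbers disagree. Everything in it is illustrative: the function name `compare_sources`, the 25% tolerance & "Player B" are assumptions, & 0.85 is only a placeholder for "nearly half" of the 1.69 quoted above.

```python
def compare_sources(source_a, source_b, tolerance=0.25):
    """Flag players whose Expected Assists differ by more than `tolerance`.

    The gap is measured relative to the larger of the two values.
    Both arguments are plain dicts mapping player name -> Expected Assists.
    """
    flagged = []
    # Only players present in both sources can be compared.
    for player in sorted(set(source_a) & set(source_b)):
        a, b = source_a[player], source_b[player]
        larger = max(a, b)
        if larger == 0:
            continue  # nothing to compare against
        relative_gap = abs(a - b) / larger
        if relative_gap > tolerance:
            flagged.append((player, a, b, round(relative_gap, 2)))
    return flagged


# Placeholder numbers: only 1.69 comes from the post itself.
my_list = {"John McGinn": 0.85, "Player B": 1.10}
other_list = {"John McGinn": 1.69, "Player B": 1.05}

flagged = compare_sources(my_list, other_list)
for player, mine, theirs, gap in flagged:
    print(f"{player}: {mine} vs {theirs} (relative gap {gap})")
```

A check like this only says where two sources disagree, not which one is right; settling that still means going back to the video.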

Some background…I build my dataset around the official statistics provided by Opta’s tracking of matches.  Media outlets like the BBC & MSN pay Opta for the running sequence of events that occur in a game.  The league’s official website also uses the same information provided by Opta.  That’s how you get the total shots, fouls, corners, etc. from each match.  Then, people like myself &, in the past, Matt turn the data into something more advanced.

I made the decision to rely on the ‘official’ data to try to maintain consistency with the shot totals, etc. that were being published league wide.

My main task would be to fix the regular mistakes that the Opta scorers make nearly every week at the different stadiums across Scotland.  This is not an indictment of their work.  I seriously doubt Opta puts too many resources into training & evaluating the data being provided from the leagues in Scotland.  Scoring mistakes aren’t high profile for the SPFL in relation to the EPL or the Bundesliga.

But the mistakes are happening.  The most common ones come from shot location or misidentifying players.

For example, here are the Key Passes that the aforementioned John McGinn has made this year.

Note that the shot taken by Simon Murray against Partick Thistle was tagged as occurring in the “center of the box”.

Clearly, Simon Murray is outside of the box & not in the center of the penalty area as he was described being by the Opta scorer.  The Expected Goal discrepancy between the reported shot location & the actual shot location is 0.206 (which represents a 79% difference).  You can see how this minor mistake can skew data.

This happens regularly & I have consistently adjusted accordingly.
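As a rough sketch of why one mis-tagged location matters, the snippet below treats Expected Assists as the sum of the Expected Goals of the shots a player's key passes set up, which is one common public definition & an assumption here rather than a confirmed description of either dataset. The two zone values are placeholders picked so the gap works out to the 0.206 quoted above; they are not taken from any real model. The percentage is computed against the reported shot's value, which is only one possible reading of the 79% figure.

```python
# Hypothetical xG per shot zone. Placeholder values, not a real model.
ZONE_XG = {
    "centre of the box": 0.261,
    "outside the box": 0.055,
}


def expected_assists(key_pass_zones):
    """Sum the xG of the shots that each key pass set up."""
    return sum(ZONE_XG[zone] for zone in key_pass_zones)


# Three key passes: the first shot is mis-tagged in the reported feed.
reported = ["centre of the box", "outside the box", "outside the box"]
corrected = ["outside the box", "outside the box", "outside the box"]

xa_reported = expected_assists(reported)
xa_corrected = expected_assists(corrected)

# Gap caused by the single mis-tagged shot.
gap = round(xa_reported - xa_corrected, 3)
# The same gap as a share of the reported shot's xG.
pct_of_reported_shot = round(100 * gap / ZONE_XG["centre of the box"])

print(f"xA as reported: {xa_reported:.3f}, after correction: {xa_corrected:.3f}")
print(f"gap: {gap} ({pct_of_reported_shot}% of the reported shot's xG)")
```

With only a handful of key passes per player at this point in the season, one correction like this is enough to move someone several places on a leaderboard.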

I do know that Matt gets his data from Stratabet & I have no knowledge of their vetting process.  Are they catching these issues?  Do they have a separate scorer at each game (or virtually tracking it)?  Does John McGinn have more Key Passes in Matt’s data because the Stratabet folks are counting shots that the official Opta scorer isn’t?

Update:  Dave Willoughby from Stratabet reached out to clarify the questions raised here.  They do run a quality control vetting of the data they provide & likely are catching the mistakes that Opta is not catching.  Additionally, the matches are tracked by video & they do track shots, chances, passing, etc. independently from the Opta classifications.

This is all part of being in that infancy stage for analytics in Scotland.  The demand for this accuracy is coming from a small handful of people – but imagine if this was happening in major competitions like the Champions League, the EPL, or even the World Cup.

As analytics continues to evolve, don’t you think agents will be cherry picking specific data points to negotiate on behalf of their players?  It would be meaningless to do so unless the stats are uniform, regardless of the source.

These inaccuracies are not unique to Scottish football.  It actually reminds me of some of the struggles that occurred in the earlier years of the hockey analytics movement, which also relied on official feeds of data.  Like in the example linked above, nobody seemed to vet the accuracy of the data provided until people actually began playing around with it.  Given how vital NHL analytics has become to the sport you can assume that whoever is tracking the games for the official data feed is under much more scrutiny to provide accurate information.

As more & more companies bypass the official scoring of games, it’s going to be critical for those companies to be transparent about how they get their data.  There has been a burgeoning private data tracking community in hockey & given how tightlipped they are about their process – some skepticism has emerged.  (Update:  please note this is not directed towards Stratabet but rather each & every private stats company [including Opta].)

Back in 2016, Pension Plan Puppets raised this issue after John Chayka, the founder of a private stats company – Stathletes, was named General Manager of the Arizona Coyotes.

Stathletes’ methods and the information they track is mostly unknown, and understandably so. As a private company, it’s in their interest to keep any competitive advantage they have under wraps.

We know they collect data, and lots of it. How much of it is useful? How much of it relates to possession, or winning? Are they reliable or random? The answer to each of those is “we don’t know.”

It’s easy to roll your eyes at the idea that ideas need to be ‘vetted’ by hobbyists on Twitter, but peer-review is a thing for a reason and some of the best hockey minds in the world are currently working in the public sphere. They’re more than capable of assessing whether someone is legit, or full of it. We don’t know what bucket Stathletes falls into.

Additional comment:  Opta does an excellent job of putting their analysis of data out there in the public forum.  The point I’m trying to make is that there appears to be ZERO quality control & vetting of the stats they provide the media (specifically the Scottish media).  It’s not the fault of the people tracking the play in real time, it’s the fact that mistakes are not being caught & amended after the live publication of the match events.

The good news is that the analytics community in Scotland is truly growing day-by-day & the public vetting of each other’s work is happening out in the open.  This process has helped each of us grow & improve our work.

In a few more years, the hope is that the community will continue to expand & the work will continue to be made public.  As that occurs, there will be a natural demand to make sure the data people are basing their work off of is uniform so the results can be accepted as facts.

But to put all of this in context (& to do a little sales pitch), you have to realize that the data I’ve collected is meant to be played with.

It’s meant to be an entry point to new ways of looking at the game.  This sharing of extensive football statistics is not the norm for any other league.

I dare you to find a database this extensive for the EPL or for La Liga.  This isn’t meant to be bragging, it’s more of a question of why isn’t it available for other leagues?  If the stats continue to be controlled & seen by only a select few…the new ideas may begin to dry up.

Oh yeah…before I go make sure to check out the work of:

  • Fitba Fancy Stats, who literally was the only person publishing this kind of work covering Scottish football back in 2014
  • Matt Rhein, whose analytics blog, The Backpass Rule, has consistently set the bar of quality in the Scottish analytics world
  • Dougie Wright – who has a real natural ability to make his analysis accessible & easy to understand for the reader which explains why his work has become so well known.  I think I was his 30th Twitter follower (he’s now up to 6,500+).
  • The SPFL Radar, Christian Wulff & Alan Morrison are also must reads & follows on Twitter.
  • The Two Point One team have also pushed stats to the fore of their coverage of the Scottish game in a way that is going to likely pressure the mainstream media to get with the times before people realize how vapid much of the mainstream coverage is.
  • Most recently Kyle Ensign has applied a DIY approach to analyzing Rangers matches.  Make sure to check his work out while you can because I have a feeling he won’t be on the market too much longer.  I’m sure teams are noticing how quickly his work has evolved in a very short amount of time.

My apologies in advance for overlooking anyone else.
