In a recent Premier League football match against Tottenham Hotspur, Sir Alex Ferguson, manager of Manchester United, complained that his team had been short-changed by the amount of injury time awarded. It seems that Ferguson was particularly irked because of his team's apparent propensity to score late or last-minute goals. To add to the conspiracy, a Guardian report from September 2009 suggested that "Manchester United get more injury time when they need it". In particular, a review of the data from the start of the 2006/7 season to September 2009 "discovered that, on average, there has been over a minute extra added by referees when United do not have the lead after 90 minutes, compared to when they are in front. In 48 games when United were ahead, the average amount of stoppage time was 191.35 seconds. In 12 matches when United were drawing or losing there was an average of 257.17sec."
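As a quick sanity check on that "over a minute extra" claim, the two quoted averages can be subtracted directly. This is only a sketch of the arithmetic: the constant and function names are my own, and it says nothing about whether a gap between a 48-game sample and a 12-game sample is statistically meaningful.

```python
# Averages quoted in the Guardian report, in seconds of stoppage time.
AHEAD_AVG_SECS = 191.35      # 48 games with United ahead after 90 minutes
NOT_AHEAD_AVG_SECS = 257.17  # 12 games with United drawing or losing


def stoppage_gap(not_ahead: float, ahead: float) -> float:
    """Return the difference between the two averages, rounded to 2 d.p."""
    return round(not_ahead - ahead, 2)


gap = stoppage_gap(NOT_AHEAD_AVG_SECS, AHEAD_AVG_SECS)
print(f"Extra stoppage time when not ahead: {gap} seconds")
```

The difference comes out at 65.82 seconds, which is consistent with the report's "over a minute".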
Last-minute goals are good for business of course - a strong likelihood of a late goal will keep fans in the stadium, eyes glued to the live television feed, and the betting markets ticking over. Sport, as with any other form of entertainment, needs to find ways of building tension and retaining interest to the end of the show, and an uncertain outcome is one of the best ways of all. But assuming the game was a fair one, would it make sense for us to expect that Man U would have scored another goal had there been a minute or two more of extra time in their match against Spurs?
The notions of "data analytics" and "data science" are all the rage in the business community at the moment, based on the desire to start finding value or advantage in the huge amounts of transactional data collected and generated by computers every day. Data analytics in sport also has a growing number of fans, in part popularised through films such as Moneyball, based on the book of the same name by Michael Lewis that told the story of how player statistics could be used as a part of a winning scouting strategy in US baseball. So is there a "football analytics" community that may be able to help us answer our question?
Now I may not be much of a football fan, but tracking down datasets is something of a sport to me, so I thought I'd have a quick "over-lunch" look around to see what sort of soccer stats-related data releases might be available. There was one caveat though - the data needed to be: 1) free (as in cost); and ideally 2) openly licensed (that is, not restricted in its use by intellectual property rights such as copyright or database right).
The first thing I came across was a great initiative launched by Manchester City over the summer to release a wealth of match day data that could be used to visualise the action during a particular game - Manchester City data analytics. A commentary piece by data partners OptaPro provides a great background to the rationale behind the data release and some of the intellectual property law considerations associated with it. (Nothing is simple in IPR land, not even the publication of football fixture listings...!) The data is very detailed, the datasets very large, and it's probably all very interesting. But it's just a little too detailed, and currently only covers Man City games...
A post on the Arsenal website included a summary table of late goals in the Premier League from 2009, but no clues as to where I could get more of the same... However, a 'recent post' that was linked to at the bottom of the page (Fri, Nov 23, 2012 Gunners Gaming - Betting Preview) made me wonder about the betting sites. Do they have data maybe? (Bear in mind the caveat - it’s free and ideally openly licensed data I'm after.)
Searching for betting sites is not something I'd normally do, but in the name of research I ventured forth, through blinking banner ads and "free bet" offers that are probably way too good to be true... At one point, I thought I'd struck gold: the football-data.co.uk website has links to Premier League football data downloads, but... no data on late or last-minute goals, or the amount of extra time awarded. (If we just had goal times, we could work out whether they were late, last minute, and/or in extra-time...) The site did have data on half-time and full-time scores, as well as the number of corners, shots on or off target, fouls and yellow or red cards. But not the times when goals were scored.
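To show what "working it out" from goal times might look like, here is a minimal sketch. Everything in it is an assumption on my part rather than anything a particular site provides: goal times are taken to be strings such as "67" or "90+3", a "late" goal is arbitrarily defined as one scored from the 80th minute onwards, and only second-half stoppage time is counted as injury time.

```python
def parse_goal_time(label: str) -> tuple[int, int]:
    """Split a label like '90+3' into (90, 3); a plain '67' becomes (67, 0)."""
    base, _, added = label.partition("+")
    return int(base), int(added or 0)


def classify_goal(label: str, late_from: int = 80) -> set[str]:
    """Tag a goal as late, last minute and/or injury time.

    The 80-minute threshold for 'late' is an illustrative assumption.
    First-half stoppage time (e.g. '45+2') is deliberately not tagged.
    """
    minute, added = parse_goal_time(label)
    tags: set[str] = set()
    if minute >= late_from:
        tags.add("late")
    if minute >= 90:
        tags.add("last minute")
        if added > 0:
            tags.add("injury time")
    return tags


for goal in ["67", "85", "90", "90+3"]:
    print(goal, sorted(classify_goal(goal)))
```

With a column of goal times in this shape, counting late or injury-time goals per team would be a simple matter of running each value through `classify_goal` and tallying the tags.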
With a minute to go before the end of my lunch break, another lucky strike: 11v11 "The Home of the Association of Football Statisticians". Could this be it? There's a link for /Premier League/, a link for /Manchester United/, a link for /Matches/, a link for each season, and each match by season... a match report, /with/ goal scorer and time... and a link for data files anywhere? Anywhere...? The clock ticks on, time's up. Close...but not quite there.