
PREDICTING GLORIOUS UNCERTAINTIES

By Muhammad Jehangir Amjad 2018-11-18
Remember the now-famous no-ball bowled by Jasprit Bumrah in the ICC Champions Trophy 2017 final? Pakistan's Fakhar Zaman, then only on 3, was caught behind off Bumrah's bowling, only for him to be judged not out because of the no-ball. Zaman went on to score 114.

In the wake of the recently concluded Asia Cup, and in particular the two damp-squib contests where India swatted Pakistan aside without breaking a sweat, you may be wondering whether the Champions Trophy final would have told an altogether different tale had Bumrah not overstepped in the fourth over of the Pakistan innings. You certainly wouldn't be alone.

The quest for answers to questions like this gave birth to CricketML (http://cricket.mit.edu), a collaboration between researchers at the Massachusetts Institute of Technology (MIT) and Columbia University which aims to address a wide variety of questions in the game of cricket through modern statistical and machine learning lenses. The aim is to statistically capture cricketing 'wisdom' and intuition. We begin by attempting to use the 'context' of a game to forecast the entire future of an innings and by providing an alternative to the Duckworth-Lewis-Stern (DLS) method. Led by researchers from Pakistan and India, whose teams remain bitter rivals on the field, we are proud that our collective love for the game has allowed us to merge our passion (cricket) with work (research).

But for now, back to Bumrah's no-ball. In imagining that counterfactual and wondering what would have happened, we are acknowledging that the fall of a wicket, and in particular an early wicket, would have significantly altered the context for the remainder of the innings. The fall of an early wicket could have led to a few more wickets or slowed the pace of the innings altogether, a fact followers of the Pakistan cricket team are all too familiar with. As observers of the game will attest, context determines much of how a typical innings in cricket evolves. This context includes everything from the fall of wickets to relative strengths of the teams to the significance of the match-up itself, to the pitch and overhead conditions. Indeed, when we declare that a team is ascendant in a game, we are articulating our human interpretation of the context of the game. However, context is not static; it evolves as the innings progresses and influences our understanding of the game.

At CricketML, we attempt to understand the complex components of the context of an innings in cricket in an automated manner. This allows us to better understand the game, quantify the value of certain passages of play and also make predictions about the future. It is our hope to understand, capture and articulate the typical behaviors in the game. Importantly, this excludes the unpredictable, often freakish, events that make it exhilarating to watch sports. Specifically, we do not attempt to predict or make sense of, and could not have predicted, any of Wasim Akram's famous hat-tricks or Rohit Sharma's double-centuries.

FORECASTING THE FUTURE

Score forecasting is an exercise all fans of the sport engage in. Take the Champions Trophy 2017 final, for instance. At the 30-over mark, Pakistan had made 179 runs for the loss of one wicket. Everyone watching the game tried to estimate the total Pakistan would put on the board. Our work extends this exercise by proposing a score forecasting algorithm which produces an estimate of the entire remainder of an innings. Mathematically, we show that the evolution of an innings, i.e. the pattern of scoring and loss of wickets, captures the entire context of a cricket innings and no other information is needed. Naturally, the more of an innings we get to observe, the better we capture the context. Next, we look across all past cricket innings, in the relevant format of the game, to find those that can combine to serve as a hypothetical version that closely resembles the innings under consideration. This is the motivation behind an idea known as synthetic control. Once we produce this hypothetical version of the current innings, we can then use it to produce a forecast for the remainder of the innings. For reference, our algorithm would have forecasted a final score of 336. Pakistan ended their innings on 338.
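For readers who like to see the idea in code, here is a minimal Python sketch of the synthetic-control step described above. It is an illustration under loud assumptions, not the CricketML algorithm: it tracks cumulative runs only (no wickets), it fits the combination of past innings by ridge-regularized least squares (our choice for the sketch), and the function names and all numbers are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def donor_weights(donor_prefixes, target_prefix, ridge=1e-6):
    """Weights on past ("donor") innings whose combination best matches
    the observed part of the current innings (ridge least squares)."""
    n = len(donor_prefixes)
    gram = [
        [
            sum(p * q for p, q in zip(donor_prefixes[i], donor_prefixes[j]))
            + (ridge if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    rhs = [
        sum(p * t for p, t in zip(donor_prefixes[i], target_prefix))
        for i in range(n)
    ]
    return solve_linear_system(gram, rhs)


def forecast_final_score(donor_prefixes, donor_finals, target_prefix, ridge=1e-6):
    """Apply the fitted weights to the donors' final scores."""
    weights = donor_weights(donor_prefixes, target_prefix, ridge)
    return sum(w * f for w, f in zip(weights, donor_finals))


# Made-up data: cumulative runs after 10, 20 and 30 overs, plus final totals.
past_prefixes = [[40, 90, 150], [60, 110, 170]]
past_finals = [300, 280]
current_prefix = [50, 100, 160]  # the innings we want to forecast

print(round(forecast_final_score(past_prefixes, past_finals, current_prefix)))
```

In this toy example the current innings sits exactly midway between the two past innings, so the fitted weights are close to one half each and the forecast lands midway between the two final totals. A real system would draw on far more innings and would also have to account for wickets.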

Does this mean that we now have a holy grail that can forecast everything that will happen in the future of a cricket innings? If so, is there a point in watching a game when we could simply stop halfway and forecast the remainder? No. We certainly do not claim to have discovered the holy grail. In fact, our estimated forecasts are predicated on the assumption that nothing out of the ordinary will happen during the remainder of the innings. A Shahid Afridi blitz or Yuvraj Singh's six sixes or an Imran Tahir hat-trick are inherently unpredictable events and comprise the beauty of the sport.

Those are the sort of extraordinary feats that make cricket, like all sports, such compelling viewing.

Algorithms cannot forecast such events.

In the Champions Trophy 2017 final, Pakistan were 294-4 after 45 overs and, by modern batting standards, most of us were expecting them to finish near 350 runs in 50 overs. Sure enough, our algorithm also updated its forecast to 347 runs.

However, some excellent death-overs bowling by India restricted Pakistan to 338. Eventually, that slowdown did not cost Pakistan, but could Pakistan have come to rue that missed opportunity had Mohammad Amir not induced rare back-to-back mistakes from the bat of Virat Kohli? Had Pakistan failed to defend the total, analysts would have spent a good amount of time focusing on the relative slowdown in the final few overs of the Pakistan innings, a fact our algorithm alluded to well before the start of the Indian innings.

DECLARING WINNERS

While score forecasting can project how an innings is likely to progress from any point onward, deciding which team is in the ascendency at any given point during the second innings of a limited overs game is, arguably, of much greater interest. Put differently, if no more play is possible, can one declare a winner? The most common examples of such scenarios are games where weather-related interruptions lead to shortened innings. The ICC's solution is the DLS method, which is a statistical answer to 'who is in the ascendency' at every point during the second innings. The DLS method was introduced as an attempt to decide outcomes of games in a fair manner. However, the use of the DLS method can often lead to murmurs of dissatisfaction among players and we often hear captains declaring their intent to bat second if rain is forecasted. Should a fair method cause such consternation among cricketers? Through our work, we show that there is indeed a bias introduced by the DLS method in favor of the chasing team. Further, we establish that the bias is not due to randomness; it is statistically significant. During the fifteen-year period from 2003 to 2017, teams batting second won 51 percent of the games where DLS was not needed. However, in the same period, 59 percent of the games that were decided by the DLS method were won by the team batting second, a statistically significant bias of over 8 percentage points over nearly 2000 games.
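One standard way to check whether a gap like 51 percent versus 59 percent could be down to chance is a two-proportion z-test. The Python sketch below shows the arithmetic. We do not claim it is the test used in our analysis, and the split of games between the two groups in the example is a hypothetical placeholder chosen only to make the code run, since the exact counts are not quoted here.

```python
import math


def two_proportion_z_test(wins_a, games_a, wins_b, games_b):
    """Return (z, two-sided p-value) for the difference in win rates
    between group B and group A, using the pooled standard error."""
    rate_a = wins_a / games_a
    rate_b = wins_b / games_b
    pooled = (wins_a + wins_b) / (games_a + games_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL split of roughly 2000 games, for illustration only:
# 1800 games without DLS (51 percent won by the chasing team) and
# 200 games decided by DLS (59 percent won by the chasing team).
z, p_value = two_proportion_z_test(918, 1800, 118, 200)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The smaller the p-value, the harder it is to explain the gap as luck alone. The answer depends heavily on how many games fall in each group, which is why the placeholder counts above should not be read as results.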

We approach the target resetting problem by taking the context into account. In this case, the context is the typical path (runs and wickets) that the team needs to take to achieve their target exactly. We look across historical data to recalibrate hundreds of innings such that they hit the desired target precisely. We then use a combination of those innings as the context for a typical successful chase of the target score. This provides a reference against which the current innings can be compared. At every point in the innings, if the team batting second has made equal or more runs for the loss of the same number of wickets as the reference trajectory, then that team is declared the winner. Otherwise, the team who batted first is declared the winner. We let the context of actual past games determine how targets are typically chased down. This is done in an algorithmic manner where the significant increase in run-rates over the last few overs is automatically captured through the context that the algorithm is learning. Therefore, unlike the DLS method, which requires frequent updates to its parameters, our algorithm automatically adapts to the changing nature of the modern game without any human intervention. As a result, we notice that our algorithm produces revised target scores which are a little higher than those produced by the DLS method, leading to a reduction in bias in favor of the chasing team because they would have to make a few more runs than what the DLS method would recommend. We claim that our algorithm produces fairer outcomes compared to the DLS method.
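The reference-trajectory rule can be sketched in a few lines of Python. Again, this is a simplified illustration and not our actual method: it ignores wickets, it recalibrates each past innings by simple proportional scaling, and it combines them with a plain average. All three are assumptions made for the sketch, and the function names and numbers are hypothetical.

```python
def rescale_to_target(cumulative_runs, target):
    """Scale a past innings so that its final score equals the target."""
    final = cumulative_runs[-1]
    return [runs * target / final for runs in cumulative_runs]


def reference_trajectory(historical_innings, target):
    """Average the rescaled innings, checkpoint by checkpoint, to get a
    typical path for a chase that reaches the target exactly."""
    rescaled = [rescale_to_target(inn, target) for inn in historical_innings]
    checkpoints = len(rescaled[0])
    return [
        sum(inn[k] for inn in rescaled) / len(rescaled)
        for k in range(checkpoints)
    ]


def declare_winner(reference, checkpoints_completed, runs_scored):
    """Compare the chasing team's score with the reference at the
    point where play stopped (wickets are ignored in this sketch)."""
    par = reference[checkpoints_completed - 1]
    return "chasing team" if runs_scored >= par else "team batting first"


# Made-up cumulative runs at four equally spaced checkpoints.
history = [[50, 100, 150, 200], [40, 100, 160, 200]]
reference = reference_trajectory(history, target=100)

print(reference)
print(declare_winner(reference, checkpoints_completed=2, runs_scored=50))
```

Because the reference is built from how real chases unfolded, any late-innings acceleration in the historical data shows up in the trajectory without anyone having to tune a parameter by hand.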

For instance, our method would have produced a higher target for South Africa than that suggested by the DLS method in the famous tie against Sri Lanka which ended up knocking South Africa out of their own World Cup in 2003. At the minimum, it would surely have prevented Mark Boucher from erroneously playing a dot ball on the last ball of the over thinking they had won the game!

The writer is a lecturer of Machine Learning at the Massachusetts Institute of Technology. He tweets @jehangiramjad