An analysis of the efficiency of player performance at the 2011 Cricket World Cup

In limited overs cricket, efficiency plays a significant role in team success. Batsmen especially are under pressure to score quickly rather than in large quantities because only 50 overs are available per innings. This paper uses data envelopment analysis (DEA) and stochastic multicriteria acceptability analysis (SMAA) to assess the efficiency with which players at the 2011 Cricket World Cup converted inputs (balls faced or bowled) into performance outputs. The effect that non-discretionary variables like the cricketing resources available to a player have on his efficiency is controlled for, allowing for a fairer assessment across players from different countries.


Introduction
At the end of a major cricketing tournament such as the Cricket World Cup (CWC), cricket analysts and publications often rank the individual performance of players at the tournament in terms of absolute performance measures, typically the number of runs scored by batsmen and the number of wickets taken by bowlers. For example, a list of the top performers at the most recent CWC held in India in 2011 is shown in Table 1.
These measures are undoubtedly valuable both as genuine indices of performance and as easy-to-understand bases for publicly ranking the players (for example, determining "players of the tournament"), but in this paper a different approach is pursued. Limited overs cricket is first and foremost a game of extremely limited resources, i.e. the number of balls each team faces is limited to just 300. The efficient use of these resources is critical: success consists not just in amassing runs or taking wickets, but in doing so in as few balls as possible. The analysis of efficiency is a familiar goal for operations research, often conducted using data envelopment analysis (DEA). The aim of this paper is to use DEA to measure how efficient individual cricket players at CWC 2011 were in converting multiple cricketing inputs into multiple outputs, measured relative to other players at the tournament. We also aim to assess what, if any, insights are obtained by this somewhat different view of player performance.
In doing so, two methodological issues need to be addressed. The first of these is what to do about the vast differences between cricketing nations, in terms of the resources they devote to cricket (infrastructure and culture). Major cricketing nations spend a substantial amount of money and time to adequately prepare their teams for tournaments such as the CWC. Players in these teams are all highly-paid professionals who would usually have had their skills nurtured from an early age. An extensive support structure assists the players with a range of needs: technical coaching, fitness, diet, financial security and administration. In all major nations, cricket attracts a large following and international players are held in high public regard. Players from minor cricketing nations would usually have a very different experience. These players are often not fully professional and must earn incomes outside of cricket. They play together less frequently, and have less frequent access to coaching and other support. In most of the minor nations, cricket is seen as a peripheral sport and does not attract a lot of public interest, further adding to resource constraints.
Most of these factors are beyond the control of individual players. Thus players from major and minor cricketing nations should not be directly compared without some accommodation being made for these differences. Fortunately, DEA has a number of ways to differentiate between discretionary inputs (those under the control of the decision making unit (DMU), here the player) and non-discretionary inputs (those that the DMU has no control over). An approach that is appropriate for this context was adopted.
The second issue is that DEA provides quite limited information about the efficient DMUs. In the standard approach that we use, a DMU is considered efficient if some weights exist that make the weighted sum of its inputs less than that of any other DMU which produces the same (or less) output. No information is given about the possible weights though. A variant of stochastic multicriteria acceptability analysis (SMAA) is used to provide additional information which is useful in differentiating between efficient players.
The remainder of the paper is structured as follows. §2 reviews performance evaluation in sports. §3 describes the data obtained for CWC 2011 and our choice of input and output variables for the DEA. §4 describes the DEA and SMAA approaches used. §5 presents and interprets the results of the different models. §6 assesses the contribution of the "efficiency-oriented" point-of-view and concludes the paper.

Performance evaluation in sport
In sports science, the term notational analysis is used to describe the comprehensive analysis of behavioural aspects of sports performance by objectively recording critical game events in a consistent and reliable manner (Hughes & Bartlett 2002). This serves two main purposes: to provide a direct and accurate feedback system for players (who can view summaries of their match statistics and performance, and watch video replays of specific passages of play) and to collect detailed match information for coaches, who can then use this to review and assess player performance, and to inform decision-making, strategy, tactics, and corrective coaching.
A performance indicator, as defined by Hughes & Bartlett (2002), is a selection or combination of action variables that aim to define some or all aspects of a performance, and should relate to successful performance or outcomes in order to be useful. Performance indicators are used to assess aspects of individual or team performance and can be used for comparison with opposition players and teams, or in isolation as a measure of the performance of a team or individual alone.
Input and output variables are defined in Table 2. Note that because conceding runs is an undesirable output (as bowlers aim to restrict the number of runs they concede when they bowl), it may be treated by DEA as an input (Cook & Zhu 2002). Hence, efficiency can be improved either through a decrease in input or undesirable output levels or through an increase in (desirable) output levels. Note also that we include both batting average (runs per innings) and total runs scored: these outputs capture different aspects of the game (consistency and strike rate respectively).
A non-discretionary input variable to capture the resources available to each team is added. This is achieved in a crude way by using each team's test status. While other measures are certainly possible, it is argued that the largest gulf in infrastructure, skills development, funding, and sponsorship (i.e. all the variables we are attempting to control for) is between countries which play test cricket and those that do not. This is reflected in Table 3, which contains the one-day ranking of teams according to the ICC rankings at the beginning of the CWC. The largest difference between adjacent teams is the 29 points which separate the lowest-ranked test-playing nation (the West Indies) and the highest-ranked team that did not play test cricket (Zimbabwe, which has since been re-awarded test status). The analysis was repeated using this augmented dataset to assess the effect of the non-discretionary input variable.

Methodology
Two methodologies, namely DEA and SMAA-DEA, are presented.

Data envelopment analysis
Models using only discretionary inputs and those including non-discretionary inputs are considered.

Models using discretionary inputs only
The basic input-oriented DEA model is given by the following linear programming formulation (Banker et al. 1984), namely to

minimise $\theta_{BCC}$

subject to

$$
\begin{aligned}
\sum_{j=1}^{J} \lambda_j x_{ij} &\le \theta_{BCC}\, x_{i0}, & i &= 1, \dots, I,\\
\sum_{j=1}^{J} \lambda_j y_{kj} &\ge y_{k0}, & k &= 1, \dots, K,\\
\sum_{j=1}^{J} \lambda_j &= 1,\\
\lambda_j &\ge 0, & j &= 1, \dots, J,
\end{aligned}
$$

where $x_{ij}$ and $y_{kj}$ denote input $i$ and output $k$ used by DMU $j$. Given a total of $J$ DMUs, DMU$_0$ denotes the DMU under investigation. DMU$_0$ converts $I$ inputs into $K$ outputs. The Banker, Charnes and Cooper (BCC) model assesses whether there is any linear combination of DMUs which produces more outputs than DMU$_0$ while using fewer inputs. The minimum value of the objective function, $\theta^*$, gives a single measure of the efficiency of DMU$_0$. If it is equal to one, then DMU$_0$ is said to be efficient in converting its inputs into outputs relative to the remaining DMUs under analysis; if $\theta^* < 1$, then DMU$_0$ is inefficient. The convexity constraint $\sum_{j=1}^{J} \lambda_j = 1$ ensures variable returns to scale (VRS), so that for an inefficient DMU$_0$, a convex combination of inputs and outputs of other DMUs can lead to an efficient DMU$_0$ that lies on the efficient frontier. The above LP problem is solved for each DMU under investigation.
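The linear programme above can be solved with any LP solver. The following is a minimal sketch rather than the implementation used in the paper: it assumes SciPy's `linprog`, and the function name `bcc_input_efficiency` and the four-player data set (one input, balls faced; one output, runs scored) are hypothetical and purely illustrative.

```python
import numpy as np
from scipy.optimize import linprog


def bcc_input_efficiency(X, Y, j0):
    """Input-oriented BCC (VRS) efficiency score of DMU j0.

    X is a (J, I) array of inputs and Y a (J, K) array of outputs.
    The decision variables are [theta, lambda_1, ..., lambda_J].
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    J, I = X.shape
    K = Y.shape[1]

    c = np.zeros(J + 1)
    c[0] = 1.0  # minimise theta

    # Input constraints: sum_j lambda_j x_ij - theta * x_i0 <= 0
    A_in = np.hstack([-X[j0].reshape(I, 1), X.T])
    # Output constraints: -sum_j lambda_j y_kj <= -y_k0
    A_out = np.hstack([np.zeros((K, 1)), -Y.T])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(I), -Y[j0]])

    # Convexity constraint: sum_j lambda_j = 1
    A_eq = np.concatenate([[0.0], np.ones(J)]).reshape(1, -1)
    b_eq = [1.0]

    # theta is free; every lambda_j is non-negative
    bounds = [(None, None)] + [(0, None)] * J
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.fun


# Hypothetical data: balls faced (input) and runs scored (output).
X = np.array([[50.0], [100.0], [100.0], [200.0]])
Y = np.array([[40.0], [120.0], [80.0], [200.0]])
scores = [bcc_input_efficiency(X, Y, j) for j in range(len(X))]
```

In this toy data set the third player scores 80 runs from 100 balls, whereas an equal mix of the first two players would score 80 runs from 75 balls, so the third player's score is 0.75 and the other three lie on the frontier.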

Models including non-discretionary inputs
In the non-discretionary case, each DMU makes use of inputs $x_{ij}$ to produce outputs $y_{kj}$ given non-discretionary inputs $z_{mj}$. Without loss of generality we assume that an increase in the levels of the non-discretionary variables equates to a more favourable environment. An early model by Banker & Morey (1986) simply added an additional input constraint $\sum_{j=1}^{J} \lambda_j z_{mj} \le z_{m0}$ to ensure that the level of each non-discretionary variable for the benchmark composite unit (i.e. $\sum_{j=1}^{J} \lambda_j z_{mj}$) is less than or equal to the level of each non-discretionary variable for the DMU$_0$ under evaluation (i.e. $z_{m0}$). Those DMUs that have relatively worse environments, in terms of non-discretionary variables, are made better off as the constraint on non-discretionary variables raises the efficiency of DMUs with relatively low levels in non-discretionary inputs.
The convexity constraint on the non-discretionary variables, however, allows individual DMUs on the efficient frontier that have a better environment than DMU$_0$ to be part of the benchmark composite unit. In other words, the composite unit (as a whole) allocated to DMU$_0$ will have a composite rank less than or equal to that of DMU$_0$, but the benchmark may comprise individual DMUs with a higher rank than DMU$_0$. The problem is that it may not be feasible for DMU$_0$ to reach the benchmark target levels of input and output given by its benchmark unit. This problem is addressed in the model of Ruggiero (1996) by excluding all DMUs with more favourable environments from benchmark composite units of DMUs with less favourable conditions, giving the following one-stage input-orientated model assuming VRS. The objective then becomes

minimise $\theta_{R}$

subject to

$$
\begin{aligned}
\sum_{j=1}^{J} \lambda_j x_{ij} &\le \theta_{R}\, x_{i0}, & i &= 1, \dots, I,\\
\sum_{j=1}^{J} \lambda_j y_{kj} &\ge y_{k0}, & k &= 1, \dots, K,\\
\sum_{j=1}^{J} \lambda_j &= 1,\\
\lambda_j &\ge 0, & j &= 1, \dots, J,\\
\lambda_j &= 0 & &\text{if } z_{mj} > z_{m0} \text{ for any } m.
\end{aligned}
$$

Any DMU with a higher level of any non-discretionary variable than that of DMU$_0$ is excluded from the model, ensuring that benchmark units for DMU$_0$ do not contain DMUs that have a more favourable environment than DMU$_0$. The results to follow are based on this model (hereafter denoted by R).
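The exclusion rule amounts to solving the VRS programme over a reduced reference set. The sketch below illustrates this under stated assumptions: it uses SciPy's `linprog`, the function name `ruggiero_input_efficiency` is hypothetical, and the data (five players, with a tier indicator equal to 1 for test-playing nations and 0 otherwise) are invented for illustration only.

```python
import numpy as np
from scipy.optimize import linprog


def ruggiero_input_efficiency(X, Y, Z, j0):
    """Input-oriented VRS efficiency of DMU j0 with environment filtering.

    X: (J, I) inputs, Y: (J, K) outputs, Z: (J, M) non-discretionary
    variables, where a higher z is assumed to mean a more favourable
    environment.
    """
    X, Y, Z = (np.asarray(a, dtype=float) for a in (X, Y, Z))

    # Keep only DMUs whose environment is no more favourable than j0's;
    # this is equivalent to fixing lambda_j = 0 for all excluded DMUs.
    keep = np.all(Z <= Z[j0], axis=1)
    Xr, Yr = X[keep], Y[keep]
    J, I = Xr.shape
    K = Yr.shape[1]

    c = np.r_[1.0, np.zeros(J)]  # minimise theta
    A_ub = np.vstack([
        np.hstack([-X[j0].reshape(I, 1), Xr.T]),   # input constraints
        np.hstack([np.zeros((K, 1)), -Yr.T]),      # output constraints
    ])
    b_ub = np.r_[np.zeros(I), -Y[j0]]
    A_eq = np.r_[0.0, np.ones(J)].reshape(1, -1)   # convexity constraint
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * J, method="highs")
    return res.fun


# Hypothetical data: balls faced, runs scored, and tier (1 = top tier).
X = np.array([[50.0], [100.0], [100.0], [200.0], [120.0]])
Y = np.array([[40.0], [120.0], [80.0], [200.0], [70.0]])
Z = np.array([[1.0], [1.0], [0.0], [1.0], [0.0]])

with_tier = ruggiero_input_efficiency(X, Y, Z, 2)
without_tier = ruggiero_input_efficiency(X, Y, np.zeros_like(Z), 2)
```

Here the third player is a second-tier player who scores 0.75 when compared with everyone (`without_tier`) but becomes efficient once top-tier players are removed from the reference set (`with_tier`), which mirrors the relaxation discussed in the text.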

SMAA-DEA
Stochastic multicriteria acceptability analysis (SMAA) is a family of inverse-preference models useful in applications where preference information is unknown. They typically operate by providing information to decision makers about the types of preference information that would lead to the selection of a particular alternative as "best". That is, instead of asking "which player is best given a particular set of preferences?", one asks "how many different preferences make a particular player the best one?" and "what preferences might make this player the preferred one?". SMAA is usually applied in situations where the assessment of information from decision makers is limited. This can occur where it is practically difficult or impossible to explicitly state preference information, where the decision maker is unwilling to expend the time and effort required for assessment, or in the early stages of a decision process where the aim is to narrow down the set of potential alternatives to a smaller shortlist for closer consideration. SMAA variants differ in terms of the preference model used and thus the type of preference information that is imprecisely known. Variants are available for value function [...] Monte Carlo simulation and observing the proportion and distinguishing features of those vectors which result in each player obtaining a particular rank $r$ (often the "best" rank, $r = 1$).
The SMAA-DEA method evaluates each DMU $j$ by the real-valued value function defined by the ratio $\sum_{k} w_k y_{kj} / \sum_{i} w_i x_{ij}$ of weighted outputs to weighted inputs, where the weight space is defined as [...]. Once a random vector $w = \{w_1, w_2, \dots, w_{I+K}\}$ has been generated, a complete rank ordering of the DMUs is trivially obtained. Let the set of weight vectors $w$ that result in DMU $j$ obtaining rank $r$ be denoted by $W_j^r$. SMAA-DEA is based on an analysis of these sets of weights using the following descriptive measures:

• Acceptability indices: The rank-$r$ acceptability index $b_j^r$ measures the proportion of all weights that make DMU $j$ obtain rank $r$. The most "versatile" alternatives are those with high acceptability indices for the best ranks. Trninić et al. (2008) suggest that due to the developing nature of modern competitive sports, players are increasingly being required to become more versatile within their preferred position, and to be able to perform in more than one position if required. This motivates the use of the acceptability index. The acceptability index is formally defined by $b_j^r = \int_{w \in W_j^r} f(w)\, dw$, where $f(w)$ is the density of the weight distribution, but in practice, because SMAA is implemented by generating weights randomly using Monte Carlo simulation, the acceptability index $b_j^r$ is simply the relative proportion of all simulation runs in which DMU $j$ obtains rank $r$. In the models presented here, the first-rank acceptability index $b_j^1$ and an inverse-proportionally weighted sum of the acceptability indices for the first $\beta$ ranks, i.e. $B_j^\beta = \sum_{r=1}^{\beta} b_j^r / \beta$, are used. Here $\beta$ is calculated to cover the top 20% of possible ranks (e.g. top 6 out of 30 players).
• Central weight vectors: The central weight vector $w_j^c$ is defined as the expected centre of gravity of the favourable weight space $W_j^1$. The central weight vector gives a concise description of the "typical" preferences supporting the selection of a particular player DMU $j$, with the aim of helping decision makers understand how different weights correspond to different choices. The central weight vector is defined by $w_j^c = \frac{1}{b_j^1} \int_{w \in W_j^1} f(w)\, w \, dw$, but again in practice this integral would not be evaluated directly. Instead the central weight vector would be computed from the empirical averages of all weight vectors supporting the selection of DMU $j$ as the best player, i.e. the $i$th element of $w_j^c$ is the average of the weights $w_i$ over all sampled vectors in $W_j^1$.

Similarly to the application of DEA, two SMAA-DEA models are run. In the first model, the players' nationalities are ignored to directly compare all players. Then a second analysis is run which incorporates the non-discretionary tier variable. In this analysis, the acceptability indices and central weight vectors of players from top-tier nations are calculated with all other players included in the reference set, i.e. those from top-tier and second-tier nations. The SMAA measures for players from second-tier nations, on the other hand, are calculated with only other players from second-tier nations included in the reference set. Each SMAA model generates 100 000 random weight vectors, substantially more than the 10 000 recommended in Tervonen and Lahdelma (2007) to achieve stable estimates of acceptability.
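The Monte Carlo estimation of rank acceptability indices and central weight vectors described above can be sketched as follows. This is an illustration under explicit assumptions, not the authors' implementation: the weights are drawn uniformly from the unit simplex over all $I + K$ criteria (the paper's own weight space may differ), the function name `smaa_dea` is hypothetical, and the three-player data set (balls faced as input; runs and boundaries as outputs) is invented.

```python
import numpy as np


def smaa_dea(X, Y, n_sims=10_000, seed=0):
    """Estimate rank acceptability indices and central weight vectors.

    X: (J, I) inputs, Y: (J, K) outputs.
    Returns b, where b[j, r] is the share of simulations in which DMU j
    obtains rank r (r = 0 is the best rank), and central, where
    central[j] is the mean weight vector over simulations in which DMU j
    is ranked first (NaN if that never happens).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    J, I = X.shape
    K = Y.shape[1]

    # Assumption: weights are uniform on the unit simplex (Dirichlet(1,...,1)).
    W = rng.dirichlet(np.ones(I + K), size=n_sims)   # shape (n_sims, I + K)

    # Ratio of weighted outputs to weighted inputs for every draw and DMU.
    scores = (W[:, I:] @ Y.T) / (W[:, :I] @ X.T)     # shape (n_sims, J)

    order = np.argsort(-scores, axis=1)   # order[s, r] = DMU holding rank r
    ranks = np.argsort(order, axis=1)     # ranks[s, j] = rank of DMU j

    b = np.zeros((J, J))
    central = np.full((J, I + K), np.nan)
    for j in range(J):
        b[j] = np.bincount(ranks[:, j], minlength=J) / n_sims
        best = ranks[:, j] == 0
        if best.any():
            central[j] = W[best].mean(axis=0)
    return b, central


# Hypothetical data: balls faced; runs scored and boundaries hit.
X = np.array([[100.0], [100.0], [100.0]])
Y = np.array([[150.0, 5.0], [100.0, 20.0], [90.0, 4.0]])
b, central = smaa_dea(X, Y)
```

In this toy example the third player is dominated on both outputs by each of the other two, so it is never ranked first and its central weight vector is undefined, while the first two players share the first-rank acceptability between them.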

Results
Tables 4 and 5 contain results for the ten most efficient batsmen and bowlers at CWC 2011 respectively, where this ranking is based on the weighted acceptability index $B_j^\beta$ and the effect of the non-discretionary tier variable is not included, i.e. all players are directly compared. While all efficient players have positive first-rank acceptabilities by definition, for some players none of the 100 000 simulation runs ranked them first. In Tables 4-7 we denote such acceptabilities by ε. The results show a number of interesting features. Firstly, the lists of most efficient players are quite different from the lists of most prolific players: only three of the top-10 scoring batsmen and four of the top-10 wicket-taking bowlers appear in the lists of most efficient players. The efficiency analysis provides an alternate point-of-view on player performance and, although it is difficult to draw any firm conclusions on the usefulness of this new information, it is worth noting that the efficiency lists for the most part contain established leading players. They also tend to favour batsmen who are thought of as particularly destructive or dangerous (e.g. Pollard, O'Brien, Gayle, Sehwag) and bowlers who are relatively economical, usually conceding between 3 and 4.5 runs per over (e.g. Tahir, Price, Afridi). The analysis is particularly emphatic about the efficiency of Pollard: 98% of all possible weight vectors support his selection as the "best" batsman, i.e. the one with the largest ratio of outputs to inputs.
The second feature is that the DEA efficiencies and the SMAA-DEA acceptability indices are quite different. Tables 4 and 5 include all players that were found to be efficient by the standard BCC model. The acceptability indices for these efficient players can vary greatly, indicating that some efficient players are supported by a range of different preferences while others are supported by only a very small number of weight vectors, possibly just one. This is particularly true of some of the leading run scorers in the tournament: Sangakkara, Dilshan, Tendulkar, and Trott. In contrast, some of the players with the highest acceptability scores are in fact inefficient: Stirling, Pietersen, Gayle, and Jayawardene among the batsmen; Peterson and Benn among the bowlers. This pattern (high acceptability with inefficiency) indicates excellent but dominated performance, with the result that the player in question is often ranked among the top few ranks but never first. This too seems useful additional information.
The third feature is that there is only one batsman and one bowler from the second-tier nations appearing in the lists of most efficient players. As discussed above, it is tempting to conclude that players from the second-tier nations have underperformed, but this begs the question of who they have "underperformed" relative to. Tables 6 and 7 show efficiency results after taking into account the non-discretionary tier variable. There are now five batsmen and four bowlers from the minor cricketing nations appearing in the lists. Of course, the number of second-tier players represented in the new lists can only increase, and one would expect that more players would appear because the criterion for them to be efficient has been relaxed. The acceptability indices, however, offer a defensible basis for comparing the performance of top- and second-tier players in a fairer manner than if absolute performances are used. Each index can be interpreted as indicating the extent to which a player has outperformed the set consisting of their peers. For example, Ervine, with a weighted acceptability score of 31, has outperformed his peers (players from second-tier nations) to roughly the same degree as De Villiers, with a score of 26, has outperformed his peers (all players). It is just that the sets of peers differ.
In terms of evaluating the effect of including the non-discretionary "tier" variable on the DEA efficiency of players in the second tier, Tables 8 and 9 contain the frequencies with which players in different tiers find themselves in different efficiency groups, when tier is excluded (BCC) or included (R). For both batsmen and bowlers, there is insufficient evidence (at the 10% level) to reject the hypothesis that the efficiency distributions are the same, even before accounting for tier. This suggests that the differences between the tiers are perhaps not as large as originally suspected. The distributions, though, become much more similar if the non-discretionary variable is included in the analysis.
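A comparison of this kind can be carried out on a tier-by-efficiency-group frequency table. The text does not state which test was used, so the sketch below assumes a chi-squared test of homogeneity via SciPy's `chi2_contingency`; the counts are invented for illustration and are not the values in Tables 8 and 9.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical frequency table (NOT the paper's data).
# Rows: top tier, second tier.
# Columns: three illustrative efficiency groups, from low to high.
counts = np.array([
    [10, 15, 20],
    [4, 8, 12],
])

# Null hypothesis: the efficiency distribution is the same in both tiers.
chi2, p_value, dof, expected = chi2_contingency(counts)

# At the 10% level the null is rejected only if p_value < 0.10.
reject_at_10_percent = p_value < 0.10
```

With small cell counts the chi-squared approximation can be poor, so an exact or simulation-based test may be a more appropriate choice in practice.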
The average efficiency score in each country changes as the non-discretionary variable is added, as shown in Table 10. Of course, players from top-tier nations are not affected by the non-discretionary variable, as they are evaluated relative to all players regardless. Among the second-tier nations, those that experience the greatest improvements in average efficiency are Kenya (batting), Canada (bowling), and Zimbabwe (batting). Averaged across all second-tier nations, including the non-discretionary variable increases average efficiency by 11% for batsmen and just over 6% for bowlers. We conjecture that the larger improvement accruing to batsmen may be because of restrictions on the amount of bowling each bowler can do (10 overs per game, while batting is unrestricted). This might decrease the variability in performance across bowlers. It is interesting to note that the two teams with the highest unadjusted average batting efficiencies, Sri Lanka and India, were the tournament's runners-up and winners respectively. Conventional wisdom is that one-day cricket on the Indian subcontinent is often dominated by the bat, because of the slow, flat pitches that are typically played on. The efficiency results seem to reflect that. South Africa, which has the highest average bowling efficiency, performed well in the early stages of the tournament before being unexpectedly knocked out by New Zealand in the quarter finals.
Finally, the information contained in the central weight vectors, shown in Tables 11 and 12 for those batsmen and bowlers respectively who obtained non-zero first-rank acceptability indices, is considered. These can be used to describe the typical preferences that make each player the "best", and thus to profile each player. Among the batsmen, Clarke is preferred if a large importance weight is placed on batting average (runs per innings), with much less weight on the total number of runs scored. Ervine is preferred if more importance is placed on achieving batting milestones. Stirling and O'Brien have slight dispositions towards boundaries and batting averages respectively. The central weight vector for Pollard is equally distributed across the four output variables; this reflects the earlier observation that almost all weight vectors (precisely, 98%) support the selection of Pollard as "best". Similar profiling can be carried out among the bowlers. Distinctive profiles can be identified for Tahir and Mpofu (wicket-takers); Price, Osinde, and Johnson (economical but with not many wickets); Razzaq and Lee (bowl maidens but can also be expensive); and Afridi (destructive wicket-taker).

• [...]
- Substantial differences were found between the rank ordering of players by efficiency and by traditional absolute measures of performance.
- The players at the CWC 2011 who were identified as most efficient were middle-order batsmen who score particularly quickly and spin bowlers who were relatively economical.
- The most prolific batsmen were identified as efficient but typically had only small acceptabilities, i.e. did not score their runs quickly enough to be supported by a substantial proportion of weight vectors.
- It is hard to make the argument that the most prolific players were the "best" players of the tournament. This implies the selection of a very specific set of preference weights (usually placing all of the weight on one output variable).
- Other efficient players, particularly Kieron Pollard of the West Indies, scored runs quickly and consistently enough to be selected as best by almost any choice of weights, suggesting exceptional performance.
- The countries with the two highest average batting efficiencies (Sri Lanka and India) contested the final of the CWC, tentatively suggesting a link between player efficiency and team success.
• The more finely graded SMAA acceptability index can be used to complement the coarser efficiency classification returned by DEA. This complementarity assumes two forms:
- The SMAA acceptability indices allow one to gain a richer picture of the performance of efficient players, who cannot be distinguished by standard DEA.
- The SMAA central weight vectors can be used to describe the typical preferences that make each efficient player the "best", and thus to provide multivariate profiles for each player (for example, describing a bowler as a "wicket-taker but expensive", "economical by bowling maidens", etc.).
In conclusion, the combination of DEA and SMAA appears to be a useful methodology capable of measuring the efficiency of cricket players in the limited-overs format of the game. Simple extensions to the basic models allow one to control for the effect that non-discretionary variables like the status of cricket in a player's home country have on efficiency, allowing for a fair assessment across players from different tiers. The models that we have employed remain fairly simple though, and offer a number of areas for potential improvement: including a wider range of performance measures from the biomechanical, technical, and tactical classes; including more nuanced measures of cricketing status across countries; taking into account the variability in player performance over the course of a tournament or season; and assessing other formats of the game, particularly T20 cricket where resources are even more scarce, to name a few.

Table 1: Best performing batsmen and bowlers at CWC 2011, according to traditional criteria: runs scored and wickets taken.

Table 2: Input and output variables used to measure player performance at the 2011 Cricket World Cup.

Table 3: ICC ranking of teams playing at the 2011 Cricket World Cup (as at beginning of tournament).

Table 4: Batting efficiency results obtained using DEA and SMAA-DEA without considering a player's country.

Table 5: Bowling efficiency results obtained using DEA and SMAA-DEA without considering a player's country.

Table 6: Batting efficiency results obtained using DEA and SMAA-DEA with a single non-discretionary "tier" input variable.

Table 7: Bowling efficiency results obtained using DEA and SMAA-DEA with a single non-discretionary "tier" input variable.

Table 8: Number of batsmen in each tier obtaining different efficiency scores using the two DEA models. The final row tests the hypothesis that the distributions in the two tiers are the same.

Table 9: Number of bowlers in each tier obtaining different efficiency scores using the two DEA models. The final row tests the hypothesis that the distributions in the two tiers are the same.

Table 10: Average efficiency scores in each country under each of the DEA models. Standard errors are indicated in parentheses.

Table 11: Central weight vectors for all batsmen with first-rank acceptability indices greater than ε (according to the non-discretionary model).