The evolutionary spatial prisoner's dilemma on a cycle

In this paper we consider the Evolutionary Spatial Prisoner's Dilemma (ESPD) in which players are modelled by the vertices of a cycle representing a spatial or organisational structure amongst the players. During each round of the ESPD every pair of adjacent players in the cycle plays a classical prisoner's dilemma against each other, and the players update their strategies from one round to the next based on the perceived success achieved by the strategies of neighbouring players during the previous round. In this way players are able to adapt and learn from each other's strategies as the game progresses without being able to rationalise good strategies. We characterise all steady states of the game as well as the structures of those initial states that lead to the emergence of persistent substates of cooperation over time. We finally determine analytically (i.e. without using simulation) the probability that the game's states will evolve from a randomly generated initial state towards a steady state which accommodates some form of persistent cooperation. More specifically, we show that there exists a range of game parameter values for which the likelihood of the emergence of persistent cooperation increases to almost certainty as the length of the cycle increases.


Introduction
The Prisoner's Dilemma (PD) may be attributed to a 1950 lecture by Albert W. Tucker [10] and is the archetypal example of a two-person non-zero-sum game in classical game theory. The dilemma is often presented in the form of a parable in which two suspects are arrested by the police. Having insufficient evidence to convict the prisoners on a major charge, the police offer both the same deal. Each prisoner is told that if he defects from the other prisoner by testifying against him, while the latter cooperates with the former by remaining silent, then the betrayer goes free and the silent accomplice receives the full sentence for the major charge. If the prisoners both cooperate by remaining silent, they are each sentenced to a short prison term on a minor charge (for which the police do have sufficient evidence to convict). However, if the prisoners both defect from one another (i.e. betray each other), then they share the full (long) prison sentence, each receiving a medium term. The dilemma is then how each prisoner should decide on a strategy which maximises his own reward (referred to as a pay-off) by minimising his prison sentence. If the strategy of cooperation is denoted by C and that of defection by D, then the reward may be modelled by the pay-off matrix

$$\Psi = \begin{pmatrix} 1 & 0 \\ a & b \end{pmatrix} \qquad (1)$$

for one of the prisoners, known as the row player, where the rows correspond to the strategies C and D of the row player, the columns correspond to the strategies C and D of his opponent, and where a > 1 and 0 < b < 1. This matrix has been normalised so that the reward of both players for cooperating with each other (i.e. both remaining silent) is 1, while the reward of a cooperator is 0 if the other prisoner defects. The parameter a is known as the temptation to defect, while the parameter b is often referred to as the "punishment" for mutual defection. If a player expects his opponent to cooperate, then he can gain the largest pay-off a by defecting. On the other hand, if a player expects his opponent to defect, it is best to settle for the "punishment" b, again by defecting. If both players follow this rational line of reasoning, they should both defect, while they could have done better by both cooperating.
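The dominance argument above can be checked mechanically. The following Python sketch (the helper name `payoff` is ours, not the paper's) encodes the row player's pay-off from Ψ in (1) and confirms, for the illustrative values a = 4/3 and b = 1/3 later used in Figure 1, that defection pays more than cooperation against either opponent strategy, although mutual cooperation pays more than mutual defection.

```python
from fractions import Fraction

def payoff(me: str, other: str, a: Fraction, b: Fraction) -> Fraction:
    """Row player's pay-off from the matrix Psi in (1); strategies are 'C' or 'D'."""
    table = {("C", "C"): Fraction(1), ("C", "D"): Fraction(0),
             ("D", "C"): a, ("D", "D"): b}
    return table[(me, other)]

a, b = Fraction(4, 3), Fraction(1, 3)  # illustrative values with a > 1 and 0 < b < 1

# Whatever the opponent does, defecting pays strictly more than cooperating ...
dominant = all(payoff("D", o, a, b) > payoff("C", o, a, b) for o in "CD")
# ... yet both players are better off under mutual cooperation than mutual defection.
dilemma = payoff("C", "C", a, b) > payoff("D", "D", a, b)
print(dominant, dilemma)  # True True
```

Any a > 1 and 0 < b < 1 gives the same two answers, since the comparisons made are exactly a > 1, b > 0 and 1 > b.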
In evolutionary game theory, the static games of classical game theory, such as the PD described above, are repeated and players are afforded the possibility of adapting and learning good strategies as a result of achieving high pay-off values as the game progresses, rather than having to decide on a rigid strategy beforehand. The introduction of this dynamic element to the theory of games has its origins in biology and is inspired by evolution, as observed in nature [5,7]. One of the chief differences between classical game theory and evolutionary game theory is that in the former the focus is on single players who each aim to determine strategies that maximise their pay-off values, while in the latter the focus is on collective strategies and whether these strategies are able to persist over consecutive rounds of the game [14]. Evolutionary game theory has been utilised in various biological contexts to investigate the evolution of altruism and cooperation between species [4,5,7,8,13].
Perhaps one of the simplest games in evolutionary game theory is the so-called Evolutionary Spatial Prisoner's Dilemma (ESPD) [9]. In the ESPD the players are modelled by the vertices of a so-called underlying graph G which represents some spatial or organisational structure amongst the players, determining which players play against each other. During any round of the ESPD each pair of adjacent players in G plays a classical PD, according to strategies that are based on the perceived success achieved by neighbouring players during the previous round of the game, thus allowing the players to learn by adapting to successful strategies. The pay-off values received by a player from each of his neighbours according to the pay-off matrix Ψ in (1) are summed together, and this total pay-off value is normalised by dividing it by the number of neighbours. This playing phase of the round is followed by an updating phase during which each player is afforded the opportunity to select his strategy for the next round, which may or may not be the same as that during the current round (a player plays the same strategy against all his neighbours during any particular round of the game). The process of updating strategies occurs according to a dynamic updating rule. Nowak and May [9], for example, considered the ESPD where the updating rule is that a player selects the strategy of his neighbouring player who achieved the largest pay-off value during the playing phase of the current round. The authors used simulation to study the formation and persistence over time of complex patterns of cooperative behaviour of the ESPD on a grid graph.
The objective in this paper is to establish analytically the likelihood that a randomly generated initial state will result in the ESPD terminating in a steady state where the strategy of cooperation is able to persist in some structural form from one round to the next if the players are arranged cyclically (i.e. the underlying graph is a cycle). We are also interested in characterising those structural forms in which such persistent cooperation is able to emerge for the case where the underlying graph is a cycle. In a previous paper [1] we studied the ESPD in the case where the underlying graph is a path. This paper may therefore be viewed as an extension of that work.

The game dynamics
Suppose the underlying graph of the ESPD has order n. Then a state of the game is denoted by means of a binary word $S = S_0S_1S_2\cdots S_{n-1}$, where $S_i \in \{C, D\}$ denotes the PD strategy adopted by the player at vertex i during a specific round of the game, for all $i \in \mathbb{Z}_n$, where $\mathbb{Z}_n$ is the set of integer residues modulo n. A cooperation run (defection run, respectively) is a maximal contiguous substate of a game state containing only cooperators (defectors, respectively). We denote a cooperation run of length $i \ge 3$ by $C^i$ and a defection run of length $i \ge 3$ by $D^i$. Consider, as an example, the labelled graph of order 5 in Figure 1(a) as underlying graph for the ESPD. The game state $CCCDC = C^3DC$ is represented graphically in Figure 1(b), where a solid vertex represents a player choosing to cooperate with all his neighbours (i.e. playing the strategy C against all his neighbours), while an open vertex represents a player choosing to defect from all his neighbours (i.e. playing the strategy D against all his neighbours). We shall use this colour coding throughout the remainder of the paper.
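The run notation is easy to generate mechanically. The Python helper below, `run_notation` (our name, not the paper's), reads a state linearly from $S_0$ to $S_{n-1}$, mirroring the example $CCCDC = C^3DC$, and writes runs of length at least 3 with an exponent. It deliberately does not merge a run that wraps around from $S_{n-1}$ to $S_0$ on a cycle.

```python
from itertools import groupby

def run_notation(state: str) -> str:
    """Write a state such as 'CCCDC' in run notation, e.g. 'C^3DC'.

    Runs of length at least 3 get an exponent; shorter runs are written out.
    The state is read linearly (no wrap-around between S_{n-1} and S_0).
    """
    parts = []
    for symbol, group in groupby(state):
        length = len(list(group))
        parts.append(f"{symbol}^{length}" if length >= 3 else symbol * length)
    return "".join(parts)

print(run_notation("CCCDC"))    # C^3DC
print(run_notation("DDCCCDD"))  # DDC^3DD
```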
We assume that the strategy updating rule of the ESPD is that each player adopts the strategy of the player in his closed neighbourhood (i.e. also taking himself into consideration) who achieved the largest pay-off value during the previous round, with the convention that a player retains his own strategy in the event of a tie. The progression of the states of the ESPD on the underlying graph in Figure 1(a) from the initial state $C^3DC$ towards the all-defector steady state $D^5$ is shown in Figure 1(b) for the parameter values a = 4/3 and b = 1/3. The first stage of this progression is motivated as follows. During the initial round player 0 cooperates with players 1 and 2 who, in turn, both also cooperate with player 0, resulting in a pay-off of (1 + 1)/2 = 1 for player 0. Similarly, player 1 cooperates with players 0, 3 and 4 who, in turn, respectively cooperate with, defect from and cooperate with player 1, resulting in a pay-off of (1 + 0 + 1)/3 = 2/3. These pay-off values, as well as those of the other three players, are shown inside the vertices of the graph corresponding to the initial state in Figure 1(b). Comparing his pay-off value of 1 with those of players 1 and 2 (2/3 and 1/2, respectively), player 0 retains the strategy of cooperation during round 1. Similarly, player 1 compares his pay-off value of 2/3 with those of players 0, 3 and 4 (1, 4/3 and 1/2, respectively), and adopts player 3's round 0 strategy during round 1, namely defection. Players 2, 3 and 4 also all defect during round 1, so that the state of the game during round 1 is $CD^4$.
The next stage of the progression towards the state $D^5$ may be motivated similarly.
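The rounds just described can be replayed in code. The Python sketch below implements the playing and updating phases on an arbitrary graph using exact fractions. The edge set {01, 02, 13, 14, 23, 34} is our assumption: Figure 1(a) is not reproduced in the text, and these edges are inferred from the pay-offs of players 0, 1, 3 and 4 quoted above and from the automorphism $f^*$ given with Figure 2. The function names, and the rule for choosing between several best-scoring neighbours, are likewise ours.

```python
from fractions import Fraction

def payoff(me, other, a, b):
    """Row player's pay-off from the matrix Psi in (1)."""
    return {("C", "C"): 1, ("C", "D"): 0, ("D", "C"): a, ("D", "D"): b}[(me, other)]

def play_round(state, neighbours, a, b):
    """Playing phase then updating phase of one ESPD round on an arbitrary graph.

    Returns (pay-offs, next state). Pay-offs are averaged over a player's neighbours;
    each player then copies the strategy of a best-scoring player in his closed
    neighbourhood, retaining his own strategy if he is among the best.
    """
    n = len(state)
    pay = [Fraction(sum(payoff(state[i], state[j], a, b) for j in neighbours[i]),
                    len(neighbours[i])) for i in range(n)]
    nxt = []
    for i in range(n):
        best = max(pay[j] for j in [i, *neighbours[i]])
        if pay[i] == best:
            nxt.append(state[i])
        else:
            # Assumption: if several neighbours share the maximum, copy the first one
            # listed (the text does not cover tied neighbours with different strategies).
            nxt.append(next(state[j] for j in neighbours[i] if pay[j] == best))
    return pay, "".join(nxt)

# Hypothetical edge set for Figure 1(a), inferred from the worked example.
EDGES = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
neighbours = {i: [] for i in range(5)}
for u, v in EDGES:
    neighbours[u].append(v)
    neighbours[v].append(u)

a, b = Fraction(4, 3), Fraction(1, 3)
trajectory = ["CCCDC"]
while True:
    _, nxt = play_round(trajectory[-1], neighbours, a, b)
    if nxt == trajectory[-1]:
        break                       # steady state reached
    trajectory.append(nxt)
print(" -> ".join(trajectory))      # CCCDC -> CDDDD -> DDDDD
print([str(p) for p in play_round("CCCDC", neighbours, a, b)[0]])
# ['1', '2/3', '1/2', '4/3', '1/2']
```

Under the same assumed edge set, the state DCDCC mentioned in connection with Figure 3 is mapped onto itself by `play_round`.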

Automorphism classes of game states
Whereas a labelled underlying graph is required in order to encode the state of the game as a binary word, use of an unlabelled underlying graph is preferable in an asymptotic analysis of the evolution of the states of the game, where the labels of players are unimportant and one is rather interested in the emergence of structures or forms of cooperation and defection as the game evolves. To this end, two game states $S^1$ and $S^2$ are automorphic if there exists a permutation $f : \mathbb{Z}_n \to \mathbb{Z}_n$, called an automorphism, with the properties that (i) the vertices i and j are adjacent in G if and only if the vertices f(i) and f(j) are adjacent in G, for all $i, j \in \mathbb{Z}_n$, and (ii) $S^1_i = S^2_{f(i)}$ for all $i \in \mathbb{Z}_n$ (i.e. f is a relabelling of the vertices of G which preserves both player adjacency in G and player strategies in the particular round of the ESPD). An automorphism class of game states is a maximal set of states with the property that any two states in the set are automorphic. The class leader of an automorphism class is the lexicographically smallest member of the class (taking C < D). For example, the two game states in Figure 2 of the ESPD with the underlying graph in Figure 1(a) form an automorphism class, with class leader $C^3DC$. The thirteen class leaders of the automorphism classes of game states (as well as the full automorphism classes themselves) are shown as an example in Table 1 for the case where the underlying graph is a 6-cycle.
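For a small graph, automorphisms and class leaders can be found by brute force. The sketch below, assuming the reconstructed edge set {01, 02, 13, 14, 23, 34} for Figure 1(a) (it is not listed in the text), enumerates all vertex permutations satisfying property (i), applies them to a state as in property (ii), and takes the lexicographic minimum with C < D. The helper names are hypothetical.

```python
from itertools import permutations

# Hypothetical edge set for Figure 1(a), reconstructed from the worked example.
EDGES = {frozenset(e) for e in [(0, 1), (0, 2), (1, 3), (1, 4), (2, 3), (3, 4)]}
N = 5

def graph_automorphisms():
    """Permutations f (tuples with f[i] = f(i)) mapping every edge onto an edge: property (i)."""
    return [f for f in permutations(range(N))
            if all(frozenset((f[u], f[v])) in EDGES for u, v in map(tuple, EDGES))]

def image(state, f):
    """The state S2 with S2[f(i)] = state[i], so that property (ii) holds."""
    out = [""] * N
    for i, s in enumerate(state):
        out[f[i]] = s
    return "".join(out)

def automorphism_class(state):
    return {image(state, f) for f in graph_automorphisms()}

def class_leader(state):
    return min(automorphism_class(state))   # Python orders 'C' < 'D', as required

print(graph_automorphisms())          # [(0, 1, 2, 3, 4), (2, 3, 0, 1, 4)]
print(sorted(automorphism_class("CCCDC")), class_leader("CDCCC"))
```

With this edge set the only non-trivial automorphism is the map $f^*$ of Figure 2, and the class of $C^3DC$ is {CCCDC, CDCCC}, with leader CCCDC.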
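In general there are

$$\Lambda_c(n) = \frac{1}{2n}\sum_{d \mid n}\varphi(d)\,2^{n/d} + \begin{cases} 2^{(n-1)/2} & \text{if } n \text{ is odd},\\ 3\cdot 2^{n/2-2} & \text{if } n \text{ is even}, \end{cases} \qquad (2)$$

automorphism classes of game states of the ESPD on a cycle of order n, where φ(·) is the Euler totient function. These numbers of automorphism classes are shown in Table 2 for small values of n.

Table 2: The number, $\Lambda_c(n)$, of automorphism classes of game states if the underlying graph is a cycle of order n, for small values of n (Sloane's sequence A000029 [11]). The corresponding number, $\Lambda_p(n)$, of automorphism classes of game states, taken from [1, Table 3.2], for the ESPD on a path of order n is also shown for purposes of comparison.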
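On a cycle of order n ≥ 3 the automorphisms are the n rotations and n reflections, so the automorphism classes of game states are the binary strings of length n up to rotation and reflection (binary bracelets). The sketch below compares the well-known closed form, $\frac{1}{2n}\sum_{d\mid n}\varphi(d)2^{n/d}$ plus $2^{(n-1)/2}$ for odd n or $3\cdot 2^{n/2-2}$ for even n, with a direct count of class leaders; the function names are ours.

```python
from fractions import Fraction
from itertools import product
from math import gcd

def phi(d: int) -> int:
    """Euler's totient, by direct counting (adequate for small d)."""
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)

def bracelets(n: int) -> int:
    """Number of binary strings of length n up to rotation and reflection (closed form)."""
    rotations = Fraction(sum(phi(d) * 2 ** (n // d) for d in range(1, n + 1) if n % d == 0), 2 * n)
    reflections = Fraction(2 ** ((n - 1) // 2)) if n % 2 else Fraction(3 * 2 ** (n // 2), 4)
    return int(rotations + reflections)

def bracelets_brute(n: int) -> int:
    """Count automorphism classes of states on an n-cycle by collecting class leaders."""
    def leader(s: str) -> str:
        variants = [s[k:] + s[:k] for k in range(n)]
        variants += [v[::-1] for v in variants]
        return min(variants)
    return len({leader("".join(p)) for p in product("CD", repeat=n)})

print([bracelets(n) for n in range(1, 13)])
# [2, 3, 4, 6, 8, 13, 18, 30, 46, 78, 126, 224]
```

The value 13 for n = 6 is the number of class leaders in Table 1.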

The state graph of the ESPD
A game state is called a steady state if it remains unchanged as the game progresses from one round to the next. The two trivial steady states $C^n$ and $D^n$ are clearly present in the ESPD on any connected underlying graph of order n. The game progression in Figure 1(b) terminates in the all-defector steady state $D^5$. The progression of the game may be described fully, regardless of the initial game state, by means of a so-called state graph. The state graph of the game is a vertex-labelled directed pseudograph in which the vertices are the game states and in which there is an arc (directed edge) from a state $S^1$ to a state $S^2$ if the game progresses from the state $S^1$ to the state $S^2$ within a single round. The state graph therefore captures the dynamics of the game as it progresses from round to round. A game state $S^1$ is said to attract another state $S^2$ if there is a (directed) path from $S^2$ to $S^1$ in the state graph. The state graph of the ESPD on the underlying graph in Figure 1(a) is shown in Figure 3. The path representing the progression of game states in Figure 1(b) is circled by means of a dashed curve in the state graph of Figure 3; in this progression the all-defector steady state $D^5$ attracts both the states $C^3DC$ and $CD^4$. In addition to the trivial steady states $C^5$ and $D^5$, the existence of another interesting steady state, DCDCC, may be noticed in Figure 3. In this steady state there is a pocket of persistent cooperation.
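The state graph of the ESPD on a cycle can be generated by brute force for small n. In the Python sketch below, `step` plays one round on the cycle 0-1-...-(n-1)-0 with the closed-neighbourhood rule and the tie convention above, and `state_graph` records the arc leaving each of the $2^n$ labelled states. The parameter values a = 6/5 and b = 3/10 are our own illustrative choice with a + b < 2; with them a cooperator's pay-off lies in {0, 1/2, 1} and a defector's in {3/10, 3/4, 6/5}, so a tie never has to be broken between a cooperator and a defector. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def step(state: str, a: Fraction, b: Fraction) -> str:
    """One ESPD round on the n-cycle 0-1-...-(n-1)-0 (n >= 3), closed-neighbourhood rule."""
    n = len(state)
    p = {("C", "C"): 1, ("C", "D"): 0, ("D", "C"): a, ("D", "D"): b}
    # Each vertex has two neighbours, so the normalised pay-off is the sum divided by 2.
    pay = [Fraction(p[state[i], state[i - 1]] + p[state[i], state[(i + 1) % n]], 2)
           for i in range(n)]
    out = []
    for i in range(n):
        l, r = (i - 1) % n, (i + 1) % n
        best = max(pay[l], pay[i], pay[r])
        # Retain own strategy on a tie; otherwise copy a best-scoring neighbour.
        out.append(state[i] if pay[i] == best else state[l] if pay[l] == best else state[r])
    return "".join(out)

def state_graph(n: int, a: Fraction, b: Fraction) -> dict:
    """Arcs S -> (successor of S) of the state graph on all 2^n labelled states."""
    return {s: step(s, a, b) for s in map("".join, product("CD", repeat=n))}

def attracting_path(s: str, graph: dict) -> list:
    """Follow arcs from s until a loop, i.e. a steady state, is reached."""
    path = [s]
    while graph[path[-1]] != path[-1]:
        if len(path) > len(graph):
            raise RuntimeError("no steady state reached")
        path.append(graph[path[-1]])
    return path

a, b = Fraction(6, 5), Fraction(3, 10)   # hypothetical values with a + b < 2
G = state_graph(5, a, b)
print(sorted(s for s, t in G.items() if s == t))
print(" -> ".join(attracting_path("CCCCD", G)))  # CCCCD -> DCCDD -> DDDDD
```

For n = 5 the steady states found are $C^5$, $D^5$ and the five rotations of $C^3D^2$, and the state $C^4D$ is attracted to $D^5$ within two rounds.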

Steady states of game states
In order to study the asymptotic behaviour of the states of the ESPD on a cycle, we require the following basic definitions from graph theory. A directed pseudo-graph is a directed graph in which loops are allowed. A directed pseudo-graph is a directed pseudo-tree if its underlying (undirected) pseudo-graph is connected and contains no cycles of length at least 2. Finally, a directed pseudo-tree T in which every vertex has out-degree 1 and in which there exists a vertex r (called a root of T) with the property that there is a (directed) path of length at least 1 from every vertex to r is called a rooted pseudo-tree. Note that, in order for a root r of a rooted pseudo-tree to have out-degree 1, there must be a loop from r to itself (but there cannot be a loop from any other vertex to itself).
Theorem 1 If the underlying graph of the ESPD is a cycle of order n, then each component of the state graph is a rooted pseudo-tree in which the root is a steady state of the game and in which the all-cooperator steady state $C^n$ forms a component on its own. Moreover, if a + b > 2, then the state graph has exactly two components.
The proof of this result is similar to the corresponding result for the ESPD on a path [1, Theorem 2 and Corollary 1] and is therefore omitted here.The result of Theorem 1 may be corroborated in Figure 4.
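Theorem 1 can be spot-checked for small cycles. Every vertex of the state graph has out-degree 1, so each component contains exactly one cycle, and the components are rooted pseudo-trees precisely when every such cycle is a loop. The sketch below (hypothetical helper names; parameter values chosen by us so that no cooperator ever ties with a defector) follows each state's arcs, raises an error if a longer cycle appears, checks that nothing other than $C^n$ leads into $C^n$, and, for a + b = 11/5 > 2, checks that only $C^n$ and $D^n$ are roots.

```python
from fractions import Fraction
from itertools import product

def step(state, a, b):
    """One ESPD round on the n-cycle (n >= 3), closed-neighbourhood rule, ties kept."""
    n = len(state)
    p = {("C", "C"): 1, ("C", "D"): 0, ("D", "C"): a, ("D", "D"): b}
    pay = [Fraction(p[state[i], state[i - 1]] + p[state[i], state[(i + 1) % n]], 2)
           for i in range(n)]
    out = []
    for i in range(n):
        l, r = (i - 1) % n, (i + 1) % n
        best = max(pay[l], pay[i], pay[r])
        out.append(state[i] if pay[i] == best else state[l] if pay[l] == best else state[r])
    return "".join(out)

def roots_of_state_graph(n, a, b):
    """Return (graph, root_of), where root_of[S] is the steady state attracting S.

    Raises ValueError if some walk enters a cycle of length >= 2, i.e. if some
    component of the state graph is not a rooted pseudo-tree.
    """
    graph = {s: step(s, a, b) for s in map("".join, product("CD", repeat=n))}
    root_of = {}
    for s in graph:
        seen, t = {s}, s
        while graph[t] != t:
            t = graph[t]
            if t in seen:
                raise ValueError(f"cycle of length at least 2 through {t}")
            seen.add(t)
        root_of[s] = t
    return graph, root_of

for a, b in [(Fraction(6, 5), Fraction(3, 10)), (Fraction(19, 10), Fraction(3, 10))]:
    for n in range(3, 13):
        graph, root_of = roots_of_state_graph(n, a, b)
        # C^n is a component on its own: no other state has an arc into it.
        assert [s for s in graph if graph[s] == "C" * n] == ["C" * n]
        if a + b > 2:
            # Only the components containing C^n and D^n remain.
            assert set(root_of.values()) == {"C" * n, "D" * n}
    print(f"a + b = {a + b}: checked n = 3, ..., 12")
```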
Note that the result of the theorem implies that if a + b > 2, then there is only one nontrivial component in the state graph (i.e. one component other than the component containing only the steady state $C^n$) and that this nontrivial component contains $D^n$ as steady state. Therefore all initial states of the game (except for $C^n$) are attracted to the all-defector steady state $D^n$, making persistent cooperation impossible in the case where a + b > 2, unless all players initially already cooperate. Hence, if $\Pi_c(n)$ denotes the probability of the emergence of persistent cooperation from a randomly generated initial state (each of the $2^n$ initial states being equally likely), then $\Pi_c(n) = 2^{-n}$, so that $\lim_{n\to\infty}\Pi_c(n) = 0$ if a + b > 2. We therefore restrict our attention in the remainder of this paper to the more interesting region in the parameter space where

$$a + b \le 2. \qquad (3)$$

In this case there are other steady states in addition to the all-cooperator steady state $C^n$ and the all-defector steady state $D^n$, as made more precise in the following result.
Theorem 3 If (3) holds for the ESPD on a cycle, then (a) no cooperation run of length 1 can persist to the next round of the game, (b) no cooperation run of length 2 can persist intact to the next round of the game, (c) a cooperation run of length at least 3 persists intact to the next round of the game if and only if it is flanked by two defection runs, each of length at least 2.
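Theorem 3 can likewise be checked exhaustively on small cycles. The sketch below splits each state into its maximal runs (read cyclically), plays one round with a `step` function implementing the rule of the game dynamics section, and compares whether each cooperation run survives intact (the same players cooperate and both flanking players still defect) with the prediction of Theorem 3. The parameter values a = 6/5 and b = 3/10 (so a + b < 2) and the helper names are our assumptions, and a finite check is not a proof.

```python
from fractions import Fraction
from itertools import product

def step(state, a, b):
    """One ESPD round on the n-cycle (n >= 3), closed-neighbourhood rule, ties kept."""
    n = len(state)
    p = {("C", "C"): 1, ("C", "D"): 0, ("D", "C"): a, ("D", "D"): b}
    pay = [Fraction(p[state[i], state[i - 1]] + p[state[i], state[(i + 1) % n]], 2)
           for i in range(n)]
    out = []
    for i in range(n):
        l, r = (i - 1) % n, (i + 1) % n
        best = max(pay[l], pay[i], pay[r])
        out.append(state[i] if pay[i] == best else state[l] if pay[l] == best else state[r])
    return "".join(out)

def cyclic_runs(s):
    """Maximal runs of a state on a cycle as (symbol, start, length); s must contain C and D."""
    n = len(s)
    k = next(i for i in range(n) if s[i] != s[i - 1])   # some run starts at position k
    runs, i = [], 0
    while i < n:
        j = i
        while j < n and s[(k + j) % n] == s[(k + i) % n]:
            j += 1
        runs.append((s[(k + i) % n], (k + i) % n, j - i))
        i = j
    return runs

def check_theorem_3(n, a, b):
    """Compare the fate of every cooperation run after one round with Theorem 3."""
    for s in map("".join, product("CD", repeat=n)):
        if len(set(s)) == 1:
            continue                                    # C^n and D^n have no flanked runs
        nxt, runs = step(s, a, b), cyclic_runs(s)
        for r, (symbol, start, length) in enumerate(runs):
            if symbol != "C":
                continue
            left, right = runs[r - 1][2], runs[(r + 1) % len(runs)][2]  # flanking D runs
            predicted = length >= 3 and left >= 2 and right >= 2        # parts (a)-(c)
            intact = (all(nxt[(start + t) % n] == "C" for t in range(length))
                      and nxt[(start - 1) % n] == "D" and nxt[(start + length) % n] == "D")
            if intact != predicted:
                return False
    return True

print(all(check_theorem_3(n, Fraction(6, 5), Fraction(3, 10)) for n in range(3, 13)))
```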
The proof of Theorem 3 is again similar to and, in fact, simpler than the proof of the corresponding result for the ESPD on a path in [1, Lemma 2] because of the absence of exceptions at the endpoints of the paths when dealing with cycles.Using the result of Theorem 3, it is possible to enumerate the components in the state graph of the ESPD on a cycle as follows.
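Theorem 4 If (3) holds, then there are

$$\Xi_c(n) = 2 + \sum_{i=1}^{\lfloor n/5 \rfloor}\frac{1}{2i}\left[\sum_{j \in S}\binom{(n-3i)\gcd(i,j)/i - 1}{2\gcd(i,j) - 1} + i\sum_{k=0}^{\lfloor (n-5i)/2 \rfloor}\binom{k+i-2}{k}(n-5i-2k+1)\right]$$

components in the state graph of the ESPD on a cycle of order n, where S is the set $\{x \in \{0, 1, \ldots, i-1\} \mid i \text{ divides } n\gcd(i, x)\}$ (the element x = 0, for which gcd(i, 0) = i, accounts for the identity permutation). Each of these components contains a single steady state, and these steady states are $C^n$, $D^n$ and all those states in which each cooperation run has length at least 3 and each defection run has length at least 2.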
Proof: Let $Q_i$ denote the number of states, up to automorphism, comprising i defection runs, starting in a run of cooperators and ending in a run of defectors, that is, steady states containing the partial state

$$\underbrace{C^3D^2\,C^3D^2\cdots C^3D^2}_{i \text{ copies of } C^3D^2}, \qquad (4)$$

where each run has been populated above with the smallest number of cooperators and defectors, respectively, in order to ensure the persistence of cooperators according to Theorem 3(c).
The partial state (4) contains 5i symbols, leaving a total of n − 5i indistinguishable symbols to be distributed amongst the 2i distinguishable runs. Since the underlying graph is a cycle, the endpoints in the representation in (4) have been chosen arbitrarily. Therefore, all steady states or their mirror images can be represented in the form (4), except for the all-defector and all-cooperator steady states, and so the total number of steady states is given by

$$\Xi_c(n) = 2 + \sum_{i=1}^{\lfloor n/5 \rfloor} Q_i.$$

Let $\mathcal{X}$ be the set of all states of the form (4), let ι be the identity permutation on the sequence of runs of a state $s \in \mathcal{X}$, let $\rho_j$ be the permutation which modular-shifts each run in (4) 2j positions to the right, and let δ be the operation which reverses the order of the runs in (4) such that the first run remains in its original position, followed by runs 2i, 2i − 1, and so on. Then the set $\{\iota, \rho_1, \rho_2, \ldots, \rho_{i-1}, \delta, \delta\rho_1, \delta\rho_2, \ldots, \delta\rho_{i-1}\}$ forms a group $\mathcal{G}$ of order 2i under the binary operation of permutation composition. It therefore follows by the well-known Cauchy-Frobenius Lemma that the number of equivalence classes into which $\mathcal{X}$ is partitioned by $\mathcal{G}$ is

$$Q_i = \frac{1}{2i}\sum_{g \in \mathcal{G}} |F_g|, \qquad (5)$$

where $|F_g|$ is the number of states in $\mathcal{X}$ that remain invariant under g.
The identity operator ι leaves all elements of $\mathcal{X}$ invariant. Therefore $|F_\iota| = |\mathcal{X}|$, which is the number of ways of distributing the remaining n − 5i indistinguishable symbols amongst the 2i distinguishable runs in (4), that is

$$|F_\iota| = \binom{n-5i+2i-1}{2i-1} = \binom{n-3i-1}{2i-1}. \qquad (6)$$

If the shift $\rho_j$ is applied to a state s, fixing at most the first j pairs of runs would determine all the remaining runs. The operation may be seen as modular-shifting the pairs of cooperator-defector runs in blocks of length j. If j divides i, then the first j pairs of runs determine the remaining 2(i − j) runs exactly. Otherwise the number of runs that need to be fixed is determined by d = gcd(i, j): fixing the first d pairs of runs determines the remaining runs. Therefore (n − 5i)2d/2i = (n − 5i)d/i symbols need to be distributed among the first 2d runs, which can be done in

$$\binom{(n-5i)d/i + 2d - 1}{2d - 1} = \binom{(n-3i)d/i - 1}{2d - 1}$$

different ways if nd/i is an integer. However, if nd/i is not an integer, then there are not enough symbols available to complete the pattern of runs into a state that is invariant under $\rho_j$. Therefore,

$$|F_{\rho_j}| = \begin{cases} \binom{(n-3i)d/i - 1}{2d-1} & \text{if } i \text{ divides } nd,\\ 0 & \text{otherwise}, \end{cases} \qquad d = \gcd(i, j). \qquad (7)$$

It holds for the permutation $\delta\rho_j$, which reverses the order of the runs and then modular-shifts the runs 2j positions to the right, that runs j + 1 and i + j + 1 map onto themselves, while runs j + 2, …, i + j are mapped, in reverse order, onto runs j, j − 1, …, i + j + 2 (indices taken modulo 2i). Therefore, distributing k symbols among the i − 1 runs j + 2, …, i + j in fact determines the placement of 2k symbols, leaving n − 5i − 2k symbols to be distributed among runs j + 1 and i + j + 1. Hence there are

$$|F_{\delta\rho_j}| = \sum_{k=0}^{\lfloor (n-5i)/2 \rfloor}\binom{k+i-2}{k}(n-5i-2k+1) \qquad (8)$$

states that remain invariant under $\delta\rho_j$, for each j ∈ {0, 1, …, i − 1} (where $\delta\rho_0 = \delta$, and where $\binom{-1}{0} = 1$ in the case i = 1). Substituting (6)-(8) into (5) yields the required number of steady states. There is exactly one steady state in each component of the state graph of the ESPD on a cycle as a result of Theorem 1, thereby completing the proof.
The result of Theorem 4 is tabulated for 1 ≤ n ≤ 15 in Table 3 and may be verified for n ∈ {5, 6, 7, 8} in Figure 5.
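The count in Theorem 4 can be cross-checked against a simulation. The sketch below evaluates the Cauchy-Frobenius average (5) with the fixed-point counts (6)-(8) derived in the proof, summed over i and with 2 added for $C^n$ and $D^n$, and compares it with the number of steady states, up to rotation and reflection, found by brute-force simulation with the illustrative parameter values a = 6/5 and b = 3/10 (our choice, with a + b < 2). The helper `binom` uses the convention $\binom{m}{0} = 1$ for every m, which the reflection term needs when i = 1; all names are ours.

```python
from fractions import Fraction
from itertools import product
from math import comb, gcd

def binom(m: int, k: int) -> int:
    """Binomial coefficient with C(m, 0) = 1 for every m and C(m, k) = 0 if k > m."""
    if k == 0:
        return 1
    return comb(m, k) if 0 <= k <= m else 0

def xi_formula(n: int) -> int:
    """Steady states up to automorphism, from (5)-(8) summed over i, plus C^n and D^n."""
    total = 2
    for i in range(1, n // 5 + 1):              # i cooperation runs and i defection runs
        fixed = 0
        for j in range(i):                      # rotations rho_j; j = 0 is the identity
            d = gcd(i, j)
            if (n * d) % i == 0:                # (7): otherwise no invariant states
                fixed += binom((n - 3 * i) * d // i - 1, 2 * d - 1)
        # (8): each of the i reflections fixes the same number of states
        fixed += i * sum(binom(k + i - 2, k) * (n - 5 * i - 2 * k + 1)
                         for k in range((n - 5 * i) // 2 + 1))
        total += fixed // (2 * i)               # (5): Cauchy-Frobenius average
    return total

def step(state, a, b):
    """One ESPD round on the n-cycle (n >= 3), closed-neighbourhood rule, ties kept."""
    n = len(state)
    p = {("C", "C"): 1, ("C", "D"): 0, ("D", "C"): a, ("D", "D"): b}
    pay = [Fraction(p[state[i], state[i - 1]] + p[state[i], state[(i + 1) % n]], 2)
           for i in range(n)]
    out = []
    for i in range(n):
        l, r = (i - 1) % n, (i + 1) % n
        best = max(pay[l], pay[i], pay[r])
        out.append(state[i] if pay[i] == best else state[l] if pay[l] == best else state[r])
    return "".join(out)

def xi_brute(n, a=Fraction(6, 5), b=Fraction(3, 10)):
    """Count simulated steady states up to rotation and reflection."""
    def leader(s):
        variants = [s[k:] + s[:k] for k in range(n)]
        return min(variants + [v[::-1] for v in variants])
    steady = (s for s in map("".join, product("CD", repeat=n)) if step(s, a, b) == s)
    return len({leader(s) for s in steady})

print([xi_formula(n) for n in range(1, 16)])
# [2, 2, 2, 2, 3, 4, 5, 6, 7, 9, 11, 15, 19, 26, 34]
print(all(xi_formula(n) == xi_brute(n) for n in range(3, 13)))  # True
```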

The probability of persistent cooperation
We open this section with a theorem characterising those initial game states which lead to some form of persistent cooperation. The proof of the theorem is similar to that of the corresponding result for the ESPD on a path [1, Theorem 4] (in fact, both the statement of the result and its proof are simpler in the case of a cycle as underlying graph, because of the absence of exceptions at the endpoints of a game state which are necessary in the case of a path as underlying graph). The result of Theorem 5 may be verified for 5 ≤ n ≤ 8 in Figure 5.
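Theorem 5, together with the converse implicit in Theorem 6 (that every state avoiding the four substates is attracted to $D^n$), can be tested by simulating every initial state on small cycles. The sketch below reads states cyclically, iterates the game until a steady state is reached, and compares the set of initial states that do not end in $D^n$ with the set of states containing $C^5$, $DDC^3DD$, $DDC^4D$ or $DC^4DD$. The parameter values a = 6/5 and b = 3/10 and all names are our assumptions, and a check for 7 ≤ n ≤ 11 is not a proof.

```python
from fractions import Fraction
from itertools import product

FORBIDDEN = ("CCCCC", "DDCCCDD", "DDCCCCD", "DCCCCDD")   # C^5, DDC^3DD, DDC^4D, DC^4DD

def step(state, a, b):
    """One ESPD round on the n-cycle (n >= 3), closed-neighbourhood rule, ties kept."""
    n = len(state)
    p = {("C", "C"): 1, ("C", "D"): 0, ("D", "C"): a, ("D", "D"): b}
    pay = [Fraction(p[state[i], state[i - 1]] + p[state[i], state[(i + 1) % n]], 2)
           for i in range(n)]
    out = []
    for i in range(n):
        l, r = (i - 1) % n, (i + 1) % n
        best = max(pay[l], pay[i], pay[r])
        out.append(state[i] if pay[i] == best else state[l] if pay[l] == best else state[r])
    return "".join(out)

def contains_forbidden(s: str) -> bool:
    """True if the cyclic state s contains one of the substates of Theorem 5."""
    n = len(s)
    t = s * (7 // n + 2)          # long enough for every window starting in s to wrap
    return any(t[p:p + len(w)] == w for w in FORBIDDEN for p in range(n))

def final_state(s, a, b):
    """Iterate the game until a steady state is reached (Theorem 1 guarantees one)."""
    while (t := step(s, a, b)) != s:
        s = t
    return s

a, b = Fraction(6, 5), Fraction(3, 10)     # hypothetical values with a + b < 2
for n in range(7, 12):
    states = ["".join(p) for p in product("CD", repeat=n)]
    persistent = {s for s in states if final_state(s, a, b) != "D" * n}
    assert persistent == {s for s in states if contains_forbidden(s)}
    print(n, len(persistent), "of", 2 ** n, "initial states lead to persistent cooperation")
```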
In the remainder of this section we determine the probability, $\Pi_c(n)$, that some substate of persistent cooperation will emerge from a randomly generated initial state of the ESPD on a cycle of order n. Let $b_n$ be the total number of binary words of length n, read cyclically, containing none of the forbidden substrings $C^5$, $DDC^3DD$, $DDC^4D$ or $DC^4DD$ mentioned in Theorem 5. The value of $b_n$ may be determined by means of the well-known transfer matrix method. (A similar approach as the one used in [1] to derive the probability, $\Pi_p(n)$, that some substate of persistent cooperation will emerge from a randomly generated initial state of the ESPD on a path of order n was attempted, but yielded an unwieldy set of interdependent variables; the transfer matrix method proved to be a much simpler and more direct approach when the underlying graph is a cycle.) A binary string containing the letters C and D is said to be permissible if it contains none of the four substrings above. Let $D_6$ be the digraph of order 64 in which each vertex represents one of the sixty-four binary strings of length six containing the letters C and D. A vertex representing the string $s_1s_2s_3s_4s_5s_6$ is adjacent to a vertex representing the string $s_2s_3s_4s_5s_6s_7$ in $D_6$ if and only if $s_1s_2s_3s_4s_5s_6s_7$ is a permissible string. The graph $D_6$ is shown in Figure 6. Determining the value of $b_n$ for n ≥ 7 is equivalent to counting the number of closed directed walks of length n in $D_6$, since every permissible string of length n ≥ 7 has a corresponding closed directed walk in $D_6$; recall that if A is the adjacency matrix of a digraph D, then the entry in row i and column j of $A^n$ is the number of (directed) walks of length n from vertex $v_i$ to vertex $v_j$ in D (see, for example, [12, Theorem 4.7.1]). Consider, as an example, the closed walk associated with the string $DCD^3CDD$ in Figure 6. Let A be the adjacency matrix of the digraph $D_6$. Then

$$b_n = \operatorname{tr}(A^n), \qquad n \ge 7,$$

and so it follows by [12, Theorem 4.7.3] that

$$\sum_{n \ge 1}\operatorname{tr}(A^n)\,x^n = \frac{-x\,Q'(x)}{Q(x)}, \qquad \text{where } Q(x) = \det(I - xA), \qquad (9)$$

is the generating function for the sequence $(b_n)_{n=1}^{\infty}$. It therefore follows by [12, Theorem 4.4.1] that $b_n$ may be computed recursively from the recurrence relation (10). The seed values for (10) are listed in Table 4. We therefore have the following result.
Theorem 6 If (3) holds, then the probability that a randomly generated initial state of the ESPD on a cycle of order n ≥ 8 will lead to persistent cooperation is given by $\Pi_c(n) = 1 - b_n/2^n$, where $b_n$ satisfies the recurrence relation (10) with seed values as in Table 4.
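The transfer matrix computation behind Theorem 6 can be sketched as follows. Rather than forming powers of the 64 × 64 adjacency matrix A, the Python code below computes $\operatorname{tr}(A^n)$ as the total number of closed walks of length n in $D_6$, by a dynamic programme over walks from each vertex, and compares it with a brute-force count of cyclic words avoiding the four forbidden substrings. The linear recurrence (10) itself is not reproduced in the text, so the sketch does not use it; all names are ours.

```python
from itertools import product

FORBIDDEN = ("CCCCC", "DDCCCDD", "DDCCCCD", "DCCCCDD")
VERTICES = ["".join(p) for p in product("CD", repeat=6)]     # the 64 vertices of D_6

def permissible(w: str) -> bool:
    return not any(f in w for f in FORBIDDEN)

# Arc u -> u[1:] + x whenever the 7-letter string u + x is permissible.
ARCS = {u: [u[1:] + x for x in "CD" if permissible(u + x)] for u in VERTICES}

def closed_walks(n: int) -> int:
    """tr(A^n) for the adjacency matrix A of D_6, summed over closed walks."""
    total = 0
    for start in VERTICES:
        counts = {start: 1}                  # walks from start, by current end vertex
        for _ in range(n):
            nxt = {}
            for u, c in counts.items():
                for v in ARCS[u]:
                    nxt[v] = nxt.get(v, 0) + c
            counts = nxt
        total += counts.get(start, 0)
    return total

def permissible_cyclic_words(n: int) -> int:
    """Brute-force count of words of length n avoiding FORBIDDEN when read cyclically."""
    count = 0
    for p in product("CD", repeat=n):
        t = "".join(p) * (7 // n + 2)
        if not any(t[i:i + len(f)] == f for f in FORBIDDEN for i in range(n)):
            count += 1
    return count

print([closed_walks(n) for n in range(1, 8)])   # [1, 3, 7, 15, 26, 45, 99]
for n in (8, 12, 20, 40):
    print(n, 1 - closed_walks(n) / 2 ** n)      # Pi_c(n) = 1 - b_n / 2^n
```

The closed-walk counts for n = 1, …, 7 agree with the Maclaurin coefficients quoted for the seed values of (10).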
A plot of the values of $\Pi_c(n)$ against n may be found in Figure 7 for 1 ≤ n ≤ 22. The figure also contains the corresponding probabilities for the ESPD on a path, as determined in [1, Theorem 5]. It is interesting that $\Pi_c(n) \le \Pi_p(n)$ for all $n \in \mathbb{N}$, and that both these values are increasing functions of n for n ≥ 7. The limiting behaviour of these functions is captured by the result that $\lim_{n\to\infty}\Pi_c(n) = 1$ if (3) holds; its proof is similar to the corresponding proof in [1, Theorem 6] for the case of a path as underlying graph, since the recurrence relations for a path or cycle as underlying graphs are the same (although the seed values differ).

Conclusion
In this paper we presented an asymptotic analysis of the ESPD on a cycle. We showed that interesting structures of persistent cooperation are possible if the sum of the temptation-to-defect parameter and the punishment parameter, a + b, in (1) is not too large. Moreover, we showed that this sum determines the asymptotic behaviour of the game in the sense that there is a bifurcation point at a + b = 2; below this bifurcation point interesting patterns of persistent cooperation are able to emerge, but not above it, as illustrated in Figure 8. We characterised the steady states of the game as only the trivial all-cooperation and all-defection states for the case where a + b > 2, and as essentially all states containing cooperation runs of length at least 3 and defection runs of length at least 2 for the case where a + b ≤ 2. We also characterised those initial states of the game that lead to steady states containing some form of persistent cooperation when a + b ≤ 2. Finally, we computed the probability that persistent cooperation will emerge from a randomly generated initial state, showing that the likelihood of such persistent cooperation increases towards certainty as the order of the cycle grows in the case where a + b ≤ 2.

Figure 1: (a) An example of an underlying graph for the ESPD. (b) An evolution of states of the ESPD on the underlying graph in (a) for the parameter values a = 4/3 and b = 1/3 in (1), where a solid vertex represents a cooperator, while an open vertex represents a defector.

Figure 2: Two automorphic states of the ESPD on the underlying graph in Figure 1(a), where a solid vertex represents a cooperator, while an open vertex represents a defector. The function $f^*$ with $f^*(0) = 2$, $f^*(1) = 3$, $f^*(2) = 0$, $f^*(3) = 1$ and $f^*(4) = 4$ is an automorphism from the state represented in Figure 2(a) to the state represented in Figure 2(b), so that these two states form an automorphism class of the game. The state in Figure 2(a) is the class leader of this automorphism class, since the two states are encoded as $C^3DC$ and $CDC^3$, respectively, of which the first is lexicographically smaller.

Table 1: The thirteen automorphism class leaders of game states (as well as their corresponding full automorphism classes) for the case where the underlying graph is a 6-cycle. The right-most cell of each linear array wraps around to form a cycle with the left-most cell. A solid cell represents a cooperator, while an open cell represents a defector.

Figure 3: The state graph for the ESPD on the underlying graph in Figure 1(a), with a = 4/3 and b = 1/3.

Figure 4: The state graph for the ESPD on a cycle of order n ∈ {5, 6, 7, 8} for the case where a + b > 2: (a) n = 5, (b) n = 6, (c) n = 7 and (d) n = 8. The rectangular arrays of cells in the figure should be interpreted as wrapping arrays in the sense that the right-most cells wrap around so as to be adjacent to the left-most cells. A solid cell denotes a player who cooperates, while an open cell denotes a player who defects.

Theorem 5 If (3) holds, then any state of the ESPD on a cycle containing at least one of the substates $C^5$, $DDC^3DD$, $DDC^4D$ or $DC^4DD$ is not in the component of the state graph which contains the all-defector steady state $D^n$.

Figure 6: The digraph $D_6$ used to compute the number of binary words of length n containing the letters C and D which do not contain any of the substrings mentioned in Theorem 5.

Figure 7: The probability that a randomly generated initial state of the ESPD on a path or cycle of order 1 ≤ n ≤ 22 leads to persistent cooperation.

Figure 8: The parameter space of the ESPD on a cycle.

Table 3: The number, $\Xi_c(n)$, of components in the state graph of the ESPD on a cycle of order n, for small values of n, if (3) holds. The corresponding number, $\Xi_p(n)$, of components, taken from [1, Table 4.2], for the ESPD on a path of order n is also shown for purposes of comparison.

The seed values for (10), listed in Table 4, are the coefficients of the first seven terms in the Maclaurin expansion $x + 3x^2 + 7x^3 + 15x^4 + 26x^5 + 45x^6 + 99x^7 + O(x^8)$ of the generating function in (9).