Archive for April, 2013

Prisoners’ Dilemma Part 3– The Predictions

I expected to find that some of the more complex, devious strategies would do better at the game than their “nicer” counterparts. How could they not? Obviously, strategies focused on seeking opportunities to further their own success, even at the expense of others, are by definition “good competitors”.

I expected to find the nicest players to be among the biggest losers. The more willing a strategy is to play nice, and to be forgiving in the wake of an opponent’s treachery, the more open it leaves itself to exploitation, affording its opponents a large payoff and subjecting itself to a heavy penalty. As I mentioned earlier, this also seems to fit common Western wisdom, harmonizing with the “Nice guys finish last” sentiment.

Surprisingly, happily, I was very, very wrong.



As I began to develop strategies to load into the simulation I was surprised by how well they could be characterized according to their “temperament”. To help give me the ability to generalize results as well as talk about them, I used the following structure and vocabulary to describe their behaviors:

“Nice” strategies prefer to cooperate and will never be the first to play “Defect”. After an opponent plays “Defect”, nice strategies may retaliate, hold grudges, or do anything else they see fit.

Nice strategies are further classified as:

Naive: Nice to the extreme… plays “Cooperate” on every turn, no matter what the other player does.
Vengeful: After the opponent’s first “Defect”, vengeful strategies protect themselves and try to recoup the losses of that betrayal by playing nothing but “Defect” for the remainder of the game.
Forgiving: A vengeful strategy that returns to playing nice if its opponent shows some degree of contrition. A very forgiving strategy requires only one “Cooperate” from its opponent to return to playing nice. Less forgiving strategies may require more.

“Naughty” strategies are those that are capable of being the first player to play a “Defect”. These strategies try to take advantage of the large payoff that occurs when they can play a “Defect” against their opponent’s “Cooperate”.

Naughty strategies can be further described as:

Selfish: Plays “Defect” on every turn, regardless of what the other player plays.
Devious: Plays “Cooperate” while looking to be the first to play a “Defect” against its opponent’s “Cooperate”. Will likely play “Defect” for the rest of the game to protect itself against retaliation from the other player.
Repentant: At some point after playing a “Defect”, a repentant strategy will try to coax its opponent into mutual cooperation by playing “Cooperate”. If successful, it may then start to play deviously again, or stay “nice” for the remainder of the game.
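The taxonomy above can be sketched as simple Python functions, each mapping the opponent’s move history to the strategy’s next play. These names and signatures are my own illustration, not the original simulation code (the “forgiving” example here is the maximally forgiving variant that needs only one “Cooperate” to relent):

```python
def naive(opponent_history):
    """Nice/naive: play "Cooperate" on every turn, no matter what."""
    return "Cooperate"

def vengeful(opponent_history):
    """Nice/vengeful: cooperate until the opponent's first "Defect",
    then play nothing but "Defect" for the rest of the game."""
    return "Defect" if "Defect" in opponent_history else "Cooperate"

def forgiving(opponent_history):
    """Nice/forgiving: mirror the opponent's last move, so a single
    "Cooperate" from the opponent restores cooperation."""
    return opponent_history[-1] if opponent_history else "Cooperate"

def selfish(opponent_history):
    """Naughty/selfish: play "Defect" on every turn."""
    return "Defect"
```

Devious and repentant strategies need a little internal state (whether they have already struck, or already apologized), but fit the same signature.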


Non-Zero Sum Game

One of the first things that struck me as interesting, and that I hadn’t considered at the start, is that Prisoner’s Dilemma is not a zero-sum game. A zero-sum game is any competition in which one player or team benefits at the expense of the other player or team. One player is the “winner” and the other player the “loser”. The magnitude of the benefit of winning exactly equals the expense of losing. Most of the games and competitions that we are consciously exposed to are zero-sum games: football, soccer, baseball, and often even business are all considered zero-sum in that either one team wins or the other does. It is equally correct to think of a particular team scoring a basket or making a goal as it is to think of the opposing team giving up a basket or goal. The zero-sum way of thinking about a career advancement opportunity is, “I didn’t get the promotion because my (former) peer did.”

Prisoner’s Dilemma is a non-zero-sum game in that the benefit due the winner of each game is not at the expense of the loser. Consider, for example, the game where both players play “Cooperate”. Each player is paid $300. One player did not benefit at the expense of the other. Instead, the players, together, gain $600 at the expense of the “banker”… an entity who is not even part of the competition. I found that many of the mistakes in thinking about Prisoner’s Dilemma stem from the fact that it is NOT a zero-sum game. It is no accident that most of the “competitions” we participate in in Las Vegas are also not zero-sum games, but rather pay winnings, not at the expense of other players, but at the expense of the “house”. So unaccustomed are we to thinking about the world in terms of non-zero-sum games that they can easily distort the innate feel for risk and reward our “gut” affords each of us.
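The non-zero-sum property is easy to see directly in the payoff table. In the sketch below, only the $300/$300 mutual-cooperation payoff comes from the text above; the other dollar amounts are illustrative stand-ins:

```python
# Payoffs as (player A, player B). Only the $300/$300 entry for mutual
# cooperation comes from the post; the other amounts are stand-ins.
PAYOFF = {
    ("Cooperate", "Cooperate"): (300, 300),
    ("Cooperate", "Defect"):    (0, 500),
    ("Defect",    "Cooperate"): (500, 0),
    ("Defect",    "Defect"):    (100, 100),
}

# In a zero-sum game every pair would sum to zero; here every pair
# sums to a positive amount, paid out by the banker rather than taken
# from the losing player.
for plays, (a, b) in PAYOFF.items():
    print(plays, "total paid by banker:", a + b)
```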


Strategy of Opponents

The success of any strategy is highly dependent on the other strategies against which it is competing. This is obvious, but I was surprised by the degree to which this was true. When I correlated the performance of a strategy with both the strategy itself and the other strategies present, I found that the correlation of all the OTHER strategies was much stronger than with the strategy itself. In other words, knowing about all the other strategies against which it will be competing made it easier to predict how well a strategy would perform than knowing anything about the strategy itself!

I saw this over and over again in the simulations. The end result would change dramatically depending on exactly which strategies were chosen to compete. I found plenty of examples of A vs. B vs. C where A was the winner. But when I added D, D would perform very poorly and not be the winner… it might even be the loser. But its effect on the system would still be felt, leaving B (for example) as the winner.

To some extent, this is simply to be expected. If one comes up with a new strategy that is simply better at the game than previous strategies, it will be the winner. Similarly, if a new strategy is dreamt up that is unusually lethal to one particular strategy, then in competitions that involve both the new strategy and its target, the target will perform unusually poorly while the new strategy does unusually well.


Prisoners’ Dilemma Part 2– The Simulation

To gain insight into the dynamics of Prisoners’ Dilemma, I wrote a computer program that played the banker.  This main “banker” program accepted input from two independent sub-programs which would act as the two players.  Each player program would signal to the banker a play of either “Cooperate” or “Defect”.  The banker would then return to each player program what the other player program had played as well as the outcome.  The banker would then either signal to the player programs that it was ready again to accept their “Cooperate” or “Defect” symbols, or that the game just played represented the final game in the match.  On that news, the player programs would self-terminate and the banker program would select two new opponents and launch the programs that represented each of them.

In this way I was able to simulate, with computer code, all three roles of the game (the two players and the banker), as well as provide the freedom for each player program to adopt any type of strategy I could imagine.  A configuration file read by the banker program allowed me to define the number of games that would be played per match.  This let me very quickly explore not only a very large number of strategies, but also how each of these strategies performed against every other strategy.
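The banker’s loop described above might be sketched as follows. This is a minimal in-process version, with ordinary function calls standing in for the original’s separate player sub-programs; all payoffs except the $300 mutual-cooperation value are illustrative assumptions:

```python
# Illustrative payoff table: only the $300 mutual-cooperation value
# comes from the posts; the other amounts are stand-ins.
PAYOFF = {("Cooperate", "Cooperate"): (300, 300),
          ("Cooperate", "Defect"):    (0, 500),
          ("Defect",    "Cooperate"): (500, 0),
          ("Defect",    "Defect"):    (100, 100)}

def play_match(strategy_a, strategy_b, num_games):
    """The banker: collect each player's play, reveal the opponent's
    play and the outcome, repeat for the configured number of games,
    and return each player's total winnings."""
    history_a, history_b = [], []       # each player's plays so far
    score_a = score_b = 0
    for _ in range(num_games):
        move_a = strategy_a(history_b)  # each player sees only the
        move_b = strategy_b(history_a)  # other player's history
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

# Example: an unconditional cooperator vs. an unconditional defector.
always_cooperate = lambda opponent: "Cooperate"
always_defect = lambda opponent: "Defect"
print(play_match(always_cooperate, always_defect, 10))   # (0, 5000)
```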


Choosing Opponents

The last piece I needed to provide a complete computer simulation was a method (and associated computer code) for selecting who competed against whom.  I considered several different approaches:

A Matrix Method in which every possible strategy in the system played against every other possible strategy exactly X times.

A List Method in which the program would take a list of all the possible strategies in the system, and starting at the top, use the list to decide one of the contestants.  The other would be selected at random.

A Pure Random Method in which both contestants are selected at random.

All of these potential methods suffered from the same flaw:  in nature, the relative frequencies of the strategies would certainly NOT be equal, yet every one of the methods above gives each strategy an equal chance of being matched… A against B against C.  This could be relatively easily corrected by simply using a weighted method of random selection.  If A is 2x more prevalent than B in a population, then whenever using a random method to select an opponent, I make sure that my method is twice as likely to choose A as B.
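Weighted selection of this kind is a one-liner with Python’s standard library. The population counts below are hypothetical, chosen to match the 2x example:

```python
import random

# Hypothetical population in which strategy A is 2x as prevalent as B or C.
population = {"A": 2, "B": 1, "C": 1}

def pick_opponent(population):
    """Choose an opponent with probability proportional to prevalence."""
    names = list(population)
    weights = [population[name] for name in names]
    return random.choices(names, weights=weights)[0]

# Over many draws, A should be selected roughly twice as often as B or C.
counts = {name: 0 for name in population}
for _ in range(10_000):
    counts[pick_opponent(population)] += 1
print(counts)
```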

But there was no way to know what the initial frequencies should be.  There was no population to go measure and then adjust the frequencies accordingly.  In fact, while some of the strategies I was using were quite complex, none came anywhere near the complexity needed to consider the thousands or even millions of factors a biological individual may weigh when deciding whether or not to jump up onto your lap.

Worse, these frequencies would certainly change over time.  In fact, in real life, any strategy that was successful would naturally increase in frequency for all kinds of reasons:  maybe winning strategies are sought out and adopted by increasing numbers of players.  Or maybe they’d be passed down generationally from father to son (more successful fathers would have more children).  Or maybe strategies would be passed from one generation to the next at the molecular level via genes.  Whatever the reason, it seems pretty basic to assume that a successful strategy would be employed by a larger portion of the players as time goes on.

As an aside, note that this means that a successful strategy, in order to truly be robust, must be unusually good at competing against itself or against other rival successful strategies.  Another way to look at it: if a strategy is truly robust, there is no other strategy a player could switch to that would increase her success.  But how best to model this?


Survival of the Fittest

Biology suggests an elegant answer.  Natural selection, sometimes also referred to as “survival of the fittest” refers to the way that nature shapes species over time.  Any trait (or “strategy”) that improves an individual organism’s success, and thus relative number of healthy offspring, will become more prevalent in the population over time.  Any detrimental “strategy” or trait, by decreasing the success of the organism, will leave it with relatively fewer healthy offspring, and thus the trait becomes less and less prevalent in the population over time.

If I treated each strategy as a living organism and rewarded success and failure, not with money, but with relative reproductive success, I would fundamentally bring natural selection to bear in my model.  Under the pressure of this unnatural natural selection, the weaker strategies would be eliminated while the stronger strategies continue to fight generation after generation for domination.  Essentially, my “soup” of strategies would “evolve” towards the most successful and stable outcome.
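That conversion from winnings to reproductive success might be sketched as below; the strategy names and winnings are hypothetical:

```python
# Convert each strategy's winnings into its share of the next
# generation: reproductive success proportional to score. The names
# and dollar amounts here are hypothetical.
def next_generation(winnings, population_size):
    total = sum(winnings.values())
    return {name: round(population_size * score / total)
            for name, score in winnings.items()}

winnings = {"forgiving": 6000, "selfish": 3000, "naive": 1000}
print(next_generation(winnings, 100))
# {'forgiving': 60, 'selfish': 30, 'naive': 10}
```

A strategy that earns 60% of the round’s winnings makes up 60% of the next round’s population, so its own behavior increasingly shapes the environment it must survive in.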


Iterations and Generations

I start round one (or the first generation) of the simulation by “combining” equal portions of all the strategies being considered.  The computer then matches each strategy against an opponent strategy chosen at random to begin the game.  In this first generation, all strategies have an equal chance of competing against any other strategy, including itself.  The simulated opponents play a series of games, after which the computer ends the round and totals the score for each player/strategy.  The number of games played per round is not important, save that it be a relatively large number (to help eliminate spurious results) and that the number of games per round be unknown to the strategies.  I’m not going to discuss the reasons for this here, but would encourage anyone who can see the problems a known number of games creates to write it up in the comments below.  The reason turns out to be very similar to the reason a single game is not interesting.

At the end of each round (or generation) the computer converts each strategy’s winnings to a relative amount of reproductive success.  In each round past round one, more successful strategies are present in a greater proportion than less successful strategies.  As a result, strategies find themselves matched against these successful strategies more often than they would in round one.  By the same reasoning, each strategy will see relatively fewer and fewer matches against strategies that have not been so successful.  As the rounds progress, the population is made up of greater percentages of the most successful strategies, with the frequency of less successful strategies thinning, possibly even to the point of extinction.
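Putting the pieces together, the generational loop might look like the sketch below. The strategy names, the payoffs (apart from the $300 mutual-cooperation value), and the population sizes are all illustrative assumptions, not the original code:

```python
import random

# Illustrative payoffs ("C" = Cooperate, "D" = Defect); only the
# $300/$300 mutual-cooperation entry comes from the posts.
PAYOFF = {("C", "C"): (300, 300), ("C", "D"): (0, 500),
          ("D", "C"): (500, 0),   ("D", "D"): (100, 100)}

STRATEGIES = {
    "naive":     lambda opponent: "C",
    "selfish":   lambda opponent: "D",
    "forgiving": lambda opponent: opponent[-1] if opponent else "C",
}

def play_match(name_a, name_b, num_games):
    """Banker: run one match and return each player's total winnings."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(num_games):
        a = STRATEGIES[name_a](hist_b)
        b = STRATEGIES[name_b](hist_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

def run_generation(population, num_games=50):
    """Match every individual against a prevalence-weighted random
    opponent, then resize the population by relative winnings."""
    winnings = {name: 0 for name in population}
    names = list(population)
    weights = [population[name] for name in names]
    for name in names:
        for _ in range(population[name]):
            opponent = random.choices(names, weights=weights)[0]
            score, opp_score = play_match(name, opponent, num_games)
            winnings[name] += score
            winnings[opponent] += opp_score
    total = sum(winnings.values())
    size = sum(population.values())
    return {name: round(size * winnings[name] / total) for name in names}

population = {"naive": 10, "selfish": 10, "forgiving": 10}
for _ in range(5):
    population = run_generation(population)
print(population)   # population shares after five generations
```

Because opponents are drawn in proportion to prevalence, a strategy’s fate in later rounds depends increasingly on how it fares against whichever strategies are currently winning, including copies of itself.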