0 Replies - 3994 Views - Last Post: 31 December 2015 - 12:15 AM Rate Topic: -----

#1 macosxnerd101  Icon User is offline

  • Games, Graphs, and Auctions
  • member icon




Reputation: 12177
  • View blog
  • Posts: 45,245
  • Joined: 27-December 08

Game Theory: Mixed-Strategies and Zero-Sum Games

Posted 31 December 2015 - 12:15 AM

I. Introduction
In my previous tutorial, the normal form game and Nash equilibrium were introduced. Attention was restricted to pure strategies. As will be illustrated below, not every finite, normal form game has a pure strategies Nash equilibrium. The notion of mixed strategies extends the notion of pure strategies, allowing players to assign probabilities to each pure strategy. This extension provides for the existence of a mixed strategies Nash equilibrium in every finite, normal form game. Additionally, zero sum games and prudent strategies will be discussed.

I have included a LaTeX typset version of this tutorial.
Attached File  Game_Theory_II.pdf (206.82K)
Number of downloads: 189

II. Mixed-Strategies
In this section, we introduce the notion of mixed-strategies. One important motivator for mixed-strategies is that not every game has a pure strategies Nash equilibrium. Consider the matching pennies game:

Posted Image

In any pure strategy profile, one player incurs utility $1$ and the other player incurs utility -1. The player incurring utility -1 can unilaterally deviate by switching its choice to improve its utility. This inverts the payoffs- the first player incurs utility -1 while the second player incurs utility 1. Iterating on the above argument, we see that no Nash equilibrium exists in pure strategies.

Mixed Strategies: Let Γ be a normal form game. Let i \in N. A mixed strategy is a sequence (s1, ..., sk) \in Si and a probability distribution σ = (σ1, ..., σk) where player i selects strategy sj with probability σj. Note that σ1 + ... + σk = 1. The set of mixed strategies for player i is denoted Σi := Δ(Si), where Δ(Si) is the simplex in R|Si|. That is: Δ(Si) = { x \in |Si| : x_{i} \geq 0 for all i \in {1, ..., |S_{i}|}, x1 + ... + xSi = 1 }.

Note that pure strategies are a special case of mixed strategies. The mixed extension will now be defined, to formalize the notion of games with mixed strategies. A mixed strategies Nash equilibrium in a normal form game is equivalent to a pure strategies Nash equilibrium in a mixed extension.


Mixed Extension: Let Γ = [N, (Si)i \in N, (ui)i \in N] be a normal form game. The mixed extension of Γ is the three-tuple [N, (Σi)i \in N, (ui)i \in N], where Σi := Δ(Si).

The notion of mixed strategies is rather unintuitive from a behavioral perspective, as a normal form game is played simultaneously. So how is a mixed strategies Nash equilibrium formulated? Recall that each player is a rational, utility maximizing agent that is aware of the structure of the game. Each player still seeks to mix its strategies in such a way to maximize its utility. In mixing strategies, a player runs the risk that another player can take advantage of a given mixing. Thus, in a Nash equilibrium, each player i mixes strategies such that -i is indifferent to whichever pure strategy ends up being played. That is, -i's expected utility for each of i's pure strategies in the mixing is the same. This is formalized as follows.


Theorem 2.1: Let Γ be a normal form game. A mixed strategy profile σ* is a mixed strategy Nash equilibrium if and only if, for each player i, the following two conditions are satisfied:
  • Every pure strategy si \in Si which is given positive probability by σi* yields the same expected payoff against σ-i*; that is, ui(si, σ-i*) = ui*).

  • Every pure strategy si \in Si which is given probability zero by σi* yields no more than the pure strategies that are assigned positive probability: ui(si, σ-i*) <= ui*).


Proof: Suppose first that the mixed-strategy profile σ* satisfies conditions (1) and (2). Let i \in N. If i unilaterally deviates by shifting positive probability to a strategy si \in Si given zero probability in σ*, then i's utility does not increase by condition (2). Let Si' = { si \in Si : σi*(si) > 0 } be the set of strategies given positive probability by \sigmai*. By condition (1), each pure strategy in Si' results in the same expected utility \overline{u}. Thus, any mixing γ of strategies in Si' results in expected utility:

Attached Image

Thus, player i cannot unilaterally deviate and improve its outcome, so σ* is a mixed strategies Nash equilibrium.

Conversely, suppose σ* is a mixed-strategies Nash equilibrium. As no player can unilaterally deviate and improve its outcome, condition (2) follows immediately. Suppose to the contrary that condition (1) does not hold. Let i \in N and si \in Si such that the ui(si, σ-i*) != ui*). If ui(si, σ-i*) > ui*), then player i could assign more weight to si in σi* and improve its outcome. Similarly, if ui(si, σ-i*) < ui*), then player i could assign less weight to si in σi* and improve its outcome. Either occurrence contradicts the assumption that σ* is a mixed-strategies Nash equilibrium. QED.


Example 1: Let's now use Theorem 2.1 to find a mixed-strategies Nash equilibrium for the Matching Pennies game. Player 1 mixes strategies such that Player 2 is indifferent to H and T. Suppose Player 1 plays H with probability p and T with probability 1-p. Player 2's payoff from playing H is -p + (1-p) = 1 - 2p. Player 2's payoff from playing T is p - (1 - p) = 1-2p. In equilibrium, Player 2 is indifferent between playing H and T. Setting 1 - 2p = 2p - 1 => p* = 1/2. By symmetry, we have Player 2 mixing between H and T with frequencies (1/2, 1/2) as well.

In addition to guaranteeing the existence of a Nash equilibrium, mixed strategies are also useful in selecting realistic Nash equilibria. Consider the following example.

Example 2: Consider a traffic routing game on the following network. The weight of each edge denotes the latency cost of traversing that edge. The variable x denotes the number of players traversing the edge (A, B), and the variable y denotes the number of players using the edge (C, D). So for example, if x = 50, then the latency cost of (A, B) is 1.5 for every each of the 50 players. Each player starts at A and ends at D, seeking to minimize latency.

Posted Image

Suppose there are 100 players in the game. Denote n1 as the number of players choosing the path ABD, n2 as the number of players choosing the path ACD, and n3 as the number of players choosing ABCD. Consider first the pure strategies Nash equilibrium of n1 = 25, n2 = 25, n3 = 50. Both the edges (A, B) and (C, D) have 75 players traversing them, and so have latency costs 1.75. Players of each type incur latency cost 3.75. If a player of type n1 unilaterally deviates, he increases the latency cost of the edge (C, D) to 1.76, resulting in a total latency cost of 3.76. By similar argument, players of type n2 and n3 cannot unilaterally deviate and decrease their costs as well.

While n1 = n2 = 25 and n3 = 50 is a pure strategies Nash equilibrium, it is unlikely the 100 players will end up playing this strategy profile. However, this pure strategies equilibrium does provide the probabilities for a mixed strategies equilibrium. As the game is symmetric, there exists a Nash equilibrium in which each player selects the same strategy. Suppose each player selects the mixed strategy (ABD, ACD, ABCD) with probabilities (1/4, 1/4, 1/2). We apply Theorem 2.1 to verify this mixed strategy profile, denoted σ*, is a mixed-strategies Nash equilibrium.

First, observe that E[ui*)] = 3.75. Consider each of the pure strategies ABD, ACD, ABCD.
  • Suppose player i plays the pure strategy ABD. Under σ-i*, n1 = 24, n2 = 25, and n3 = 50. So E[ui(ABD, σ-i*] = 1.75 + 2 = 3.75 = E[u_{i}(σ*)].

  • Suppose player i plays the pure strategy ACD. Under σσ-i*, n1 = 25, n2 = 24, and n3 = 50. So E[ui(ACD, σ-i*] = 1.75 + 2 = 3.75 = E[ui*)].

  • Suppose player i plays the pure strategy ABCD. Under σ-i*, n1 = n2 = 25 and n3 = 49. So E[ui(ABCD, σ-i*] = 1.75 + 0.25 + 1.75 = 3.75 = E[ui*)].


Thus, σ* is a mixed-strategies Nash equilibrium.



III. Zero-Sum Games
In this section, we examine zero-sum games. Intuitively, in a zero sum game, each player's gain (or loss) is exactly balanced with those of the other players. For example, cutting a larger slice of cake for one person leaves less cake for the others. This notion is formalized as follows:

Zero-Sum Game: Let Γ be a normal form game. Γ is said to be a Zero-Sum Game if $\sum_{i \in N} u_{i}(\sigma) = 0$ for every strategy profile σ.

Recall the Matching Pennies game from the previous section. In any strategy profile, one player earns utility 1 while the other player earns utility -1. Thus, the Matching Pennies game is a zero-sum game. Another example of a zero-sum game is Rock-Paper-Scissors. The winner of the game earns utility 1 while the loser earns utility -1. In the event of a tie, each player earns utility 0.

So how are zero-sum games solved? We can solve zero-sum games in the same manner as any other normal form game. Of particular interest, however, are prudent strategies. Intuitively, players who play prudently seek to minimize potential losses. We define prudent strategies as follows:

Prudent Strategies: Let Γ be a normal form game. A prudent strategy of player i is a mixed strategy σi* that satisfies maxσi minσ-i uii, σ-i).

Note that the definition of prudent strategies did not restrict to zero-sum games. We discuss them in the context of zero-sum games; however, as they are most useful here. In the case of a two-player game, saddle points are equivalent to prudent strategies. We define a saddle point as follows.

Saddle Point: Let Γ be a two-player zero-sum game. A saddle point is a strategy profile (σ1*, σ2*) \in S1 x S2 satisfying ui1, σ2*) <= ui1*, σ2*) <= ui1*, σ2) for all σ1 \in S1, each σ2 \in S2, and each i \in { 1, 2 }.

In order to test for saddle points in finite games, we convert the payoff matrix into a mathematical matrix from linear algebra. As we are considering zero sum games, define the matrix A to be a real-valued |S1| x |S2| matrix where [Aij] = u1(si, sj), si \in S1, sj \in S2. That is, [Aij] represents the amount Player 1 wins and Player 2 loses when the pure strategy profile (si, sj) is played. A saddle point Aij is a minimum of row i and a maximum of column j. Let's consider an example of finding a saddle point.


Example 3: Consider the two-player, zero-sum game given by the following payoff matrix.

Posted Image

We begin by converting the payoff matrix to an algebraic matrix:
------------
4 | 1 | -3 |
3 | 2 |  5 |
0 | 1 |  6 |
------------



The row minima are -3, 2, and 0; and the column maxima are 4, 2, and 6. Observe that row two and column two have the same value: 2. So (M, M) is a saddle point with payoff (2, -2).


Example 4: Consider the two-player, zero-sum game given by the following payoff matrix.

Posted Image

We begin by converting the payoff matrix to an algebraic matrix:

----------
 2  | -1 |
-1  |  1 |
----------



The row minima are -1 for row one, and -1 for column one. The column maxima are 2 for column one, and 1 for column two. So there are no pure strategy saddle points for this game.

We solve this game by determining the mixed strategies Nash equilibria. Suppose Player 2 plays L with probability p and R with probability (1-p). If Player 1 plays T, his expected payoff is 2p - (1-p) = 3p - 1. If Player 1 plays B, his expected payoff is -p + (1-p) = 1 - 2p. Setting 3p - 1 = 1 - 2p => p = 2/5. So Player 2 plays L with probability 2/5 and R with probability 3/5 in equilibrium.

We now solve for Player 1's mixed equilibrium strategies. Suppose Player 1 plays T with probability q and B with probability 1-q. Then Player 2's expected payoff from playing L is -2q + 1-q = 1 - 3q. Player 2's expected payoff from playing R is q - (1-q) = 2q - 1. Setting 2q - 1 = 1 - 3q yields q = 2/5. So Player 1 plays T with probability 2/5 and B with probability 3/5 in equilibrium.


When considering mixed strategies or infinite games with compact (closed and bounded) strategy sets (such as mixed extensions of normal form games), saddle points are guaranteed to exist. This is due to a result in real analysis known as the Weierstrass Extreme Value Theorem, which states that a continuous function over a compact set achieves both a maximum and a minimum.

It is easy to verify that a pure-strategy saddle point is a Nash equilibrium in a finite zero-sum games. This verification is presented in the following theorem to build intuition. The logic is analogous in the case of infinite games (such as mixed extensions of finite zero-sum games).


Theorem 3.1: Let Γ be a two-player zero-sum game and let A be the associated matrix. Suppose (si, sj) \in S1 x S2 is a saddle point. Then (s1, s2) constitutes a Nash equilibrium of Γ.

Proof: As Aij is the maximum of column j, a unilateral deviation from Player 1 will result in payoff Akj for some k. It follows that Akj <= Aij, so Player 1 cannot unilaterally deviate and improve its outcome. For Player 2, u2(si, sj) = -Aij as Γ is a zero-sum game and by construction of the matrix A. So if Player 2 unilaterally deviates, the payoff will be u2(si, sm) = -Aim <= u2(si, sj) since Aim >= Aij. And so Player 2 cannot unilaterally deviate and improve its outcome. Thus, (si, sj) is a Nash equilibrium. QED.


The Nash equilibria of zero-sum games can be characterized in terms of prudent strategies. Intuitively, each player seeks to minimize its opponent's payoff. As a zero-sum game is being considered, this leaves more for the individual. This notion is formalized as follows:


Theorem 3.2: In any finite, two-person zero-sum game, the following conditions hold:
  • If (σ1*, σ2*) is a mixed strategies Nash equilibrium, then σi* is a prudent strategy of player i \in {1, 2} and:

    maxσ1 minσ2 u1i, σ2) = minσ2 maxσ1 u11, σ2) (2)

  • If σi* is prudent for each i \in { 1, 2 }, then (σ1*, σ2*) is a mixed-strategies Nash equilibrium.


Proof: Suppose first that (σ1*, σ2*) is a mixed strategies Nash equilibrium. Now suppose to the contrary that for some i \in { 1, 2 }, σi* is not prudent. It follows that there exists a strategy σi' guaranteeing a better result than σi*. Consider the mixed strategy profile (σi', σ-i'), where σ-i' is a best response to σi'. We thus have uii', σ-i') > uii*, σ-i*). As σi* is a best response to σ-i*, we have uii*, σ-i*) >= uii', σ-i*). As σ-i' is a best response to σi' and the game is zero-sum, it follows that uii', σ-i*) >= uii', σ-i'). Chaining the inequalities together implies uii', σ-i') > uii', σ-i), a contradiction. It follows that σi* is prudent for both players.

It will now be shown that (2) holds. Note that for any function f(x, y) and fixed x, y, we have miny' f(x, y') <= f(x, y) <= maxx' f(x', y). Taking the max of both sides maintains this inequality. Thus:


maxσ1 minσ2 u1i, σ2) <= minσ2 maxσ1 u11, σ2).


Similarly, as (σ1*, σ2*) is a saddle point, we have:


maxσ1 u11, σ2*) <= u11*, σ2*) <= minσ2 u11*, σ2)


And so:


minσ2 maxσ1 u11, σ2) <=
maxσ1 u11, σ2*) <=
minσ2 u11*, σ2) <=
maxσ1 minσ2 u11, σ2)



Conversely, suppose σ2* is prudent for each i \in { 1, 2 }. As σi* is prudent, it solves maxσi uii, σ-i*). So player i cannot unilaterally deviate and improve its payoff. Thus, (σ1*, σ2*) is a mixed-strategies Nash equilibrium. QED.

Is This A Good Question/Topic? 1
  • +

Page 1 of 1