Here is a derivation of a Lagrangian-relaxation algorithm for computing approximately optimal mixed strategies for two-player zero-sum matrix games. It illustrates the basic idea of deriving Lagrangian-relaxation algorithms.
The algorithm and the derivation are from [3]. The approximation guarantee is comparable to those for the packing algorithms of Plotkin, Shmoys, and Tardos [2]. The mixed strategies also happen to be sparse. The existence proof for sparse mixed strategies is from [1].
Fix a payoff matrix M for a two-player zero sum game, so that M_{ij} is the payoff from MIN to MAX if MAX plays pure strategy i\in [m] and MIN plays pure strategy j\in [n].
A mixed strategy q for MIN is a probability distribution on [n]. The value of q is \max _{i\in [m]} \sum _{j\in [n]} M_{ij} q_ j (the maximum expected payoff that MAX can force if MIN plays a pure strategy randomly from q).
We say a mixed strategy q is k-sparse if it chooses uniformly from a multiset of k pure strategies. Equivalently, each probability q_ j is an integer multiple of 1/k.
The goal is to compute some k-sparse mixed strategy q for MIN whose value is at most (1+\varepsilon ) times the minimum possible.
(Although we will not use them in this note, of course symmetrical definitions hold for MAX. A mixed strategy p for MAX is a probability distribution on [m]; it has value \min _{j\in [n]} \sum _{i\in [m]} M_{ij} p_ i. If q^* denotes an optimal (minimum value) mixed strategy for MAX, then Von Neumann’s min-max theorem states that, for any payoff matrix M, the value of p^* equals the value of q^*.)
Computing an approximately optimal k-sparse mixed strategy
We will derive the following algorithm and performance guarantee:
1. | Initialize J to the empty sequence. |
2. | For t=1,2,\ldots , k: |
3. | Choose a pure strategy j_ t = j for MIN minimizing \sum _{i\in [n]} M_{ij} (1+\varepsilon )^{P_ i}, where P_ i = \sum _{s=1}^{t-1} M_{ij_ s} is the total payoff so far if MAX played i every time. |
4. | Add j_ t to the end of J. |
5. | Return \tilde q, where \tilde q_ j equals \frac{1}{k} times the number of times j occurs in J. |
Let m be the number of pure strategies available to MAX.
For any \varepsilon \in (0,1) such that k \ge 2\ln (m)/\, \mu \varepsilon ^2, given the instance, k, and \varepsilon , if each payoff M_{ij} is in [0,1], then the algorithm returns a k-sparse strategy \tilde q of value at most (1+\varepsilon )\mu ^*, where \mu ^* is the minimum value of any mixed strategy for MIN.
To prove the theorem, we apply the method of conditional probabilities to the following rounding scheme. Fix any integer k and let q be any mixed strategy for MIN.
Randomly sample k pure strategies independently from the distribution q on [n].
Return the mixed strategy \tilde q such that, for each pure strategy j\in [n], the probability \tilde q_ j equals the number of times j was sampled.
First we analyze the rounding scheme. Let \mu be the value of the mixed strategy q.
Assume each payoff M_{ij} is in [0,1]. For any \varepsilon \in (0,1), if k \ge 2\ln (m)\, /\, \mu \varepsilon ^2, then, with positive probability, the rounding scheme returns a k-sparse mixed strategy \tilde q of value at most (1+\varepsilon )\mu .
Now we prove the theorem by showing that applying the method of conditional probabilities to the lemma yields the algorithm.
Running time
The algorithm takes k iterations. Each iteration requires selecting the best response for MIN for a given mixed strategy for MAX.
If the goal is to find an approximately optimal mixed strategy (independent of k), then one can just take k=\lceil 2\ln (m)/\mu \varepsilon ^2\rceil . If \mu is unknown, then one can simply run the algorithm until the resulting mixed strategy \tilde q for MIN has value at most 1+\varepsilon times the mixed strategy p for MAX defined by p_ i \propto (1+\varepsilon )^{P_ i(t)}. (Note that the main loop of the algorithm is independent of \mu .) This will take O(\ln (m)/\mu \varepsilon ^2) iterations.
In many cases of interest, \mu is exponentially small, so the number of iterations can be exponentially large. In this case, Garg and Könemann’s technique of non-uniform increments can be used to modify the algorithm to have O(m\log (m)/\varepsilon ^2) iterations.