Here is a derivation of a Lagrangian-relaxation algorithm for computing approximately optimal mixed strategies for two-player zero-sum matrix games. It illustrates the basic idea of deriving Lagrangian-relaxation algorithms.

The algorithm and the derivation are from [3]. The approximation guarantee is comparable to those for the packing algorithms of Plotkin, Shmoys, and Tardos [2]. The mixed strategies also happen to be sparse. The existence proof for sparse mixed strategies is from [1].

Problem statement.

Fix a payoff matrix $M$ for a two-player zero-sum game, so that $M_{ij}$ is the payoff from MIN to MAX if MAX plays pure strategy $i\in [m]$ and MIN plays pure strategy $j\in [n]$ .

A mixed strategy $q$ for MIN is a probability distribution on $[n]$ . The value of $q$ is $\max _{i\in [m]} \sum _{j\in [n]} M_{ij} q_ j$ (the maximum expected payoff that MAX can force if MIN plays a pure strategy randomly from $q$ ).

We say a mixed strategy $q$ is $k$ -sparse if it chooses uniformly from a multiset of $k$ pure strategies. Equivalently, each probability $q_ j$ is an integer multiple of $1/k$ .
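As a small illustration of these two definitions, here is a minimal Python sketch. The function names `strategy_value` and `sparse_strategy` and the example matrix are illustrative assumptions, not part of the original notes; exact fractions are used so that the multiples of $1/k$ are visible.

```python
from collections import Counter
from fractions import Fraction


def strategy_value(M, q):
    """Value of MIN's mixed strategy q: the largest expected payoff over MAX's rows."""
    return max(sum(M_ij * q_j for M_ij, q_j in zip(row, q)) for row in M)


def sparse_strategy(J, n):
    """Uniform distribution over the multiset J of pure strategies in range(n)."""
    counts = Counter(J)
    k = len(J)
    return [Fraction(counts[j], k) for j in range(n)]


# Hypothetical 2-by-3 payoff matrix with entries in [0, 1].
M = [[1, 0, 0],
     [0, 1, 0]]

# The multiset {0, 1, 1, 2} gives a 4-sparse strategy: every q_j is a multiple of 1/4.
q = sparse_strategy([0, 1, 1, 2], n=3)
value = strategy_value(M, q)
```

Here `q` is $(1/4, 1/2, 1/4)$ and its value is $1/2$, attained by MAX's second pure strategy.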

The goal is to compute some $k$ -sparse mixed strategy $q$ for MIN whose value is at most $(1+\varepsilon )$ times the minimum possible.

(Although we will not use them in this note, symmetric definitions hold for MAX. A mixed strategy $p$ for MAX is a probability distribution on $[m]$ ; it has value $\min _{j\in [n]} \sum _{i\in [m]} M_{ij} p_ i$ . If $p^*$ denotes an optimal (maximum-value) mixed strategy for MAX and $q^*$ denotes an optimal (minimum-value) mixed strategy for MIN, then von Neumann’s min-max theorem states that, for any payoff matrix $M$ , the value of $p^*$ equals the value of $q^*$ .)

Computing an approximately optimal $k$ -sparse mixed strategy

We will derive the following algorithm and performance guarantee:

1.	Initialize $J$ to the empty sequence.
2.	For $t=1,2,\ldots , k$ :
3.	Choose a pure strategy $j_ t = j$ for MIN minimizing $\sum _{i\in [m]} M_{ij} (1+\varepsilon )^{P_ i}$ , where $P_ i = \sum _{s=1}^{t-1} M_{ij_ s}$ is the total payoff so far if MAX played $i$ every time.
4.	Add $j_ t$ to the end of $J$ .
5.	Return $\tilde q$ , where $\tilde q_ j$ equals $\frac{1}{k}$ times the number of times $j$ occurs in $J$ .
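The steps above can be sketched in Python as follows. This is a minimal sketch rather than a tuned implementation: the name `sparse_strategy_greedy` is an assumption, the matrix is a plain list of rows, ties are broken toward the lowest column index (the notes do not specify a tie rule), and no care is taken to avoid floating-point overflow when the exponents $P_i$ become large.

```python
def sparse_strategy_greedy(M, k, eps):
    """Return a k-sparse mixed strategy for MIN, following the steps listed above.

    M is an m-by-n list of rows with entries in [0, 1]; rows belong to MAX,
    columns to MIN.
    """
    m, n = len(M), len(M[0])
    P = [0.0] * m      # P[i]: total payoff so far if MAX had played row i every time
    counts = [0] * n   # counts[j]: number of times column j occurs in J
    for _ in range(k):
        weights = [(1.0 + eps) ** P_i for P_i in P]
        # Choose the column minimizing sum_i M[i][j] * (1 + eps)^P[i].
        # min() returns the first minimizer, so ties go to the lowest index.
        j_t = min(range(n),
                  key=lambda j: sum(M[i][j] * weights[i] for i in range(m)))
        counts[j_t] += 1
        for i in range(m):
            P[i] += M[i][j_t]
    return [c / k for c in counts]


# Hypothetical example: MAX wins 1 exactly when both players pick the same index.
M = [[1.0, 0.0],
     [0.0, 1.0]]
q_tilde = sparse_strategy_greedy(M, k=4, eps=0.5)
```

On this example the algorithm alternates between the two columns, so `q_tilde` is the uniform strategy, which is optimal for this game.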

Let $m$ be the number of pure strategies available to MAX.

Theorem.

Assume each payoff $M_{ij}$ is in $[0,1]$ , and let $\mu ^*$ be the minimum value of any mixed strategy for MIN. Given the instance, $k$ , and $\varepsilon \in (0,1)$ such that $k \ge 2\ln (m)/\, \mu ^* \varepsilon ^2$ , the algorithm returns a $k$ -sparse strategy $\tilde q$ of value at most $(1+\varepsilon )\mu ^*$ .

To prove the theorem, we apply the method of conditional probabilities to the following rounding scheme. Fix any integer $k$ and let $q$ be any mixed strategy for MIN.

Randomly sample $k$ pure strategies independently from the distribution $q$ on $[n]$ .
Return the mixed strategy $\tilde q$ such that, for each pure strategy $j\in [n]$ , the probability $\tilde q_ j$ equals $\frac{1}{k}$ times the number of times $j$ was sampled.
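A minimal Python sketch of this rounding scheme is given below. The name `round_strategy` and the optional `rng` parameter are assumptions made for illustration; the sample counts are divided by $k$ so that $\tilde q$ is a probability distribution.

```python
import random
from collections import Counter


def round_strategy(q, k, rng=random):
    """Sample k pure strategies i.i.d. from q and return the empirical distribution."""
    n = len(q)
    J = rng.choices(range(n), weights=q, k=k)   # k independent draws from q
    counts = Counter(J)
    return [counts[j] / k for j in range(n)]


# Hypothetical input distribution; a seeded generator makes the run repeatable.
q = [0.2, 0.3, 0.5]
q_tilde = round_strategy(q, k=10, rng=random.Random(0))
```

Whatever the random draws are, every entry of `q_tilde` is a multiple of $1/k$ and the entries sum to 1.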

First we analyze the rounding scheme. Let $\mu$ be the value of the mixed strategy $q$ .

Lemma ([1]).

Assume each payoff $M_{ij}$ is in $[0,1]$ . For any $\varepsilon \in (0,1)$ , if $k \ge 2\ln (m)\, /\, \mu \varepsilon ^2$ , then, with positive probability, the rounding scheme returns a $k$ -sparse mixed strategy $\tilde q$ of value at most $(1+\varepsilon )\mu$ .

Proof of lemma.

The mixed strategy $\tilde q$ is $k$ -sparse by construction.

Recall that the value of $\tilde q$ is $\max _{i\in [m]} \sum _{j\in [n]} \tilde q_ j M_{ij}$ .

Fix any pure strategy $i\in [m]$ for MAX. Say that $i$ is “bad” (for MIN) if $\sum _{j\in [n]} \tilde q_ j M_{ij} \gt (1+\varepsilon )\mu$ . We will show that the probability of this event ( $i$ being bad) is less than $1/m$ . This proves the lemma, because it implies that the expected number of bad strategies is less than 1 (so with positive probability there are none).

Let $J$ be the sequence of $k$ pure strategies sampled from $q$ in the randomized-rounding scheme. The value of $\tilde q$ against MAX’s pure strategy $i$ is

\[ \sum _{j\in [n]} \tilde q_ j M_{ij} ~ =~ \sum _{j\in J} {\textstyle \frac{1}{k}} M_{ij}. \]

Multiplying both sides of the badness condition by $k$ , $i$ is bad if and only if

\begin{equation} \label{eqn} \sum _{j\in J} M_{ij} ~ \gt ~ (1+\varepsilon )\mu k. \end{equation}

Think of the left-hand side of (\ref{eqn}) as the total payoff from MIN to MAX in $k$ independent plays of the game, where in each play MAX plays $i$ and MIN plays randomly from $q$ (as each element in $J$ is in fact randomly sampled from $q$ ). Then, $i$ is bad iff MIN’s total payout to MAX is more than $(1+\varepsilon )\mu k$ .

For each of the $k$ plays individually, since MAX plays $i$ and MIN plays $j$ randomly from $q$ , the expected payout is at most $\mu$ . So the total expected payout for all $k$ plays is at most $\mu k$ . Since each payoff $M_{ij}$ is in $[0,1]$ , by a standard Chernoff bound, the probability that this total payout exceeds $(1+\varepsilon )\mu k$ is less than $\exp (-\varepsilon ^2 \mu k / 2)$ . By the assumption on $k$ in the lemma this is at most $1/m$ .

Now we prove the theorem by showing that applying the method of conditional probabilities to the lemma yields the algorithm.

Proof of theorem.

The algorithm will emulate the rounding scheme, but instead of choosing each pure strategy $j_ t$ for MIN randomly from $q$ , it will choose $j_ t$ so as to keep the conditional expectation of the number of bad pure strategies $i$ for MAX below 1.

Let $J=(j_1,j_2,\ldots ,j_ k)$ be the random sequence of pure strategies for MIN from the rounding scheme. Let $P_ i(t) = \sum _{s\le t} M_{ij_ s}$ denote the cumulative payoff to MAX in the first $t$ rounds, assuming, in each round $s=1,\ldots ,t$ , MAX and MIN play pure strategies $i$ and $j_ s$ , respectively. A pure strategy $i$ for MAX is bad (when the rounding scheme finishes) if $P_ i(k) \gt (1+\varepsilon )\mu k$ .

In using the Chernoff bound for each $i$ , the analysis of the rounding scheme implicitly bounds the number of bad $i$ ’s by

\[ \Phi ~ =~ \sum _{i\in [m]} \frac{(1+\varepsilon )^{\textstyle P_ i(k)} }{ (1+\varepsilon )^{(1+\varepsilon ) \mu k } } ~ =~ \sum _{i\in [m]} \frac{ \prod _{t=1}^ k (1+\varepsilon )^{\textstyle M_{ij_ t}} }{ (1+\varepsilon )^{(1+\varepsilon ) \mu k } }, \]

and it shows that the expectation of $\Phi$ is less than 1.

(To understand where $\Phi$ comes from requires a good understanding of the proof of the Chernoff bound. Here are the essentials. For each bad $i$ , the term for $i$ in the sum is at least 1, so the number of bad $i$ ’s is at most $\Phi$ . The proof of the Chernoff bound shows that the expectation of the term for $i$ in $\Phi$ is less than $\exp (-\varepsilon ^2 \mu k/2)$ . By the assumption on $k$ in the analysis of the rounding scheme, this is at most $1/m$ , so the expectation of $\Phi$ is less than 1.)

The algorithm should choose each $j_ t$ to keep the conditional expectation of $\Phi$ , given the choices so far, below 1. Fix any $t\in \{ 0,1,\ldots ,k\}$ . Recall that $\textrm{E}[M_{ij_ t}] \le \mu$ for all $i$ . One pessimistic estimator for the conditional expectation of $\Phi$ , given the first $t$ choices, is

\[ \phi _ t ~ =~ \sum _{i\in [m]} \frac{ \prod _{s=1}^ t (1+\varepsilon )^{\textstyle M_{ij_ s}} \times \prod _{s=t+1}^ k (1+\varepsilon \mu ) }{ (1+\varepsilon )^{(1+\varepsilon ) \mu k} } ~ =~ \sum _{i\in [m]} \frac{ (1+\varepsilon )^{P_ i(t)} \times (1+\varepsilon \mu )^{k-t} }{ (1+\varepsilon )^{(1+\varepsilon ) \mu k} }. \]

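To reach a successful outcome, it suffices for the algorithm to choose each $j_ t$ so that $\phi _ t$ does not increase, giving $\phi _ k \le \phi _{k-1} \le \cdots \le \phi _0 \lt 1$ . Since $\phi _ k = \Phi$ is an upper bound on the number of bad strategies, no strategy is bad at the end.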

The algorithm will choose each $j_{t+1}$ to ensure $\phi _{t+1} \le \phi _ t$ . Letting $\omega ^ i_ t$ denote the term for $i$ in $\phi _ t$ (so $\phi _ t = \sum _ i \omega ^ i_ t$ ), if the algorithm chooses some $j$ to be $j_{t+1}$ , then

\begin{equation} \label{eq:1} \phi _{t+1} ~ \le ~ \sum _{i\in [m]} \frac{ 1+\varepsilon M_{ij} }{ 1+\varepsilon \mu } \omega ^ i_ t. \end{equation}

If we choose $j$ randomly as in the sampling scheme, then $\textrm{E}[M_{ij}] \le \mu$ , so (by inspection) the expectation of the right-hand side of (\ref{eq:1}) is at most $\sum _ i \omega ^ i_ t = \phi _ t$ .
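The two facts used here can be spelled out as follows (a sketch; the variable $x$ is introduced only for this remark). Since $x \mapsto (1+\varepsilon )^ x$ is convex and agrees with $1+\varepsilon x$ at $x=0$ and $x=1$ , we have $(1+\varepsilon )^ x \le 1+\varepsilon x$ for all $x\in [0,1]$ . Choosing $j_{t+1} = j$ multiplies the term for $i$ by $(1+\varepsilon )^{M_{ij}}$ and removes one factor of $1+\varepsilon \mu$ , so

\[ \omega ^ i_{t+1} ~ =~ \omega ^ i_ t \, \frac{ (1+\varepsilon )^{M_{ij}} }{ 1+\varepsilon \mu } ~ \le ~ \omega ^ i_ t \, \frac{ 1+\varepsilon M_{ij} }{ 1+\varepsilon \mu }. \]

Summing over $i$ gives the bound on $\phi _{t+1}$ . If $j$ is sampled from $q$ (with the first $t$ choices, and hence each $\omega ^ i_ t$ , held fixed), linearity of expectation gives

\[ \textrm{E}\Big[ \sum _{i\in [m]} \frac{ 1+\varepsilon M_{ij} }{ 1+\varepsilon \mu } \, \omega ^ i_ t \Big] ~ =~ \sum _{i\in [m]} \frac{ 1+\varepsilon \, \textrm{E}[M_{ij}] }{ 1+\varepsilon \mu } \, \omega ^ i_ t ~ \le ~ \sum _{i\in [m]} \omega ^ i_ t ~ =~ \phi _ t. \]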

So, to ensure $\phi _{t+1}\le \phi _ t$ , it suffices to choose $j$ to minimize the right-hand side of (\ref{eq:1}). Since the denominator $1+\varepsilon \mu$ and the sum $\sum _ i \omega ^ i_ t$ do not depend on $j$ , this is the same as choosing $j$ to minimize

\[ \sum _{i\in [m]} M_{ij}\, \omega ^ i_ t. \]

In the definition of $\omega ^ i_ t$ (via $\phi _ t$ ), the factors of $(1+\varepsilon \mu )^{k-t}$ and $(1+\varepsilon )^{(1+\varepsilon )\mu k}$ are independent of $i$ and independent of all choices so far. Factoring these irrelevant terms out, $j$ should minimize

\[ \sum _{i\in [m]} M_{ij} \big (1+\varepsilon \big )^{\textstyle P_ i(t)}. \]

(Recall $P_ i(t) = \sum _{s=1}^ t M_{ij_ s}$ .) Thus, the algorithm chooses each $j_{t+1}$ to minimize the upper bound (\ref{eq:1}) on the pessimistic estimator. This keeps the pessimistic estimator from increasing, ensuring a successful outcome.
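The monotonicity of the pessimistic estimator can be checked numerically on a toy instance. The sketch below is only an illustration: the function names `phi` and `greedy_phi_trace` are assumptions, and $\mu$ is supplied by hand. It is set to $0.5$ , which is the optimal value of this particular $2\times 2$ game. With so small a $k$ the starting value $\phi _0$ need not be below 1, so only the fact that $\phi _ t$ never increases is illustrated.

```python
def phi(P, t, k, eps, mu):
    """Pessimistic estimator phi_t, given the cumulative payoffs P[i] = P_i(t)."""
    denom = (1.0 + eps) ** ((1.0 + eps) * mu * k)
    numer = sum((1.0 + eps) ** P_i for P_i in P) * (1.0 + eps * mu) ** (k - t)
    return numer / denom


def greedy_phi_trace(M, k, eps, mu):
    """Run the greedy choice rule for k steps and record phi_0, ..., phi_k."""
    m, n = len(M), len(M[0])
    P = [0.0] * m
    trace = [phi(P, 0, k, eps, mu)]
    for t in range(1, k + 1):
        weights = [(1.0 + eps) ** P_i for P_i in P]
        # Greedy rule: minimize sum_i M[i][j] * (1 + eps)^P[i] over columns j.
        j_t = min(range(n),
                  key=lambda j: sum(M[i][j] * weights[i] for i in range(m)))
        for i in range(m):
            P[i] += M[i][j_t]
        trace.append(phi(P, t, k, eps, mu))
    return trace


# Hypothetical game whose optimal value is 0.5 (MIN plays uniformly).
M = [[1.0, 0.0],
     [0.0, 1.0]]
trace = greedy_phi_trace(M, k=6, eps=0.5, mu=0.5)
```

Consecutive entries of `trace` never increase, up to floating-point rounding.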

Finally, note that the algorithm does not need to know $q$ , yet it is guaranteed to do as well as the rounding scheme for any $q$ , in particular for an optimal mixed strategy for MIN, whose value is $\mu ^*$ . This proves the theorem.

Running time

The algorithm takes $k$ iterations. Each iteration requires selecting the best response for MIN for a given mixed strategy for MAX.

If the goal is to find an approximately optimal mixed strategy (independent of $k$ ), then one can just take $k=\lceil 2\ln (m)/\mu \varepsilon ^2\rceil$ . If $\mu$ is unknown, then one can simply run the algorithm until the value of the resulting mixed strategy $\tilde q$ for MIN is at most $1+\varepsilon$ times the value of the mixed strategy $p$ for MAX defined by $p_ i \propto (1+\varepsilon )^{P_ i(t)}$ . (Note that the main loop of the algorithm is independent of $\mu$ .) This will take $O(\ln (m)/\mu \varepsilon ^2)$ iterations.
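The stopping rule for unknown $\mu$ might be sketched as follows. This is an illustration under stated assumptions, not the notes' definitive procedure: the name `solve_until_certified` and the `max_iters` safety cap are hypothetical, ties go to the lowest column index, and $p$ is recomputed from the cumulative payoffs after each step. The test is sound because the value of any mixed strategy for MAX is at most the optimal value, which in turn is at most the value of $\tilde q$ .

```python
def solve_until_certified(M, eps, max_iters=10_000):
    """Run the greedy loop until value(q_tilde) <= (1 + eps) * value(p).

    Returns (q_tilde, p, t), where t is the number of iterations used.
    """
    m, n = len(M), len(M[0])
    P = [0.0] * m
    counts = [0] * n
    for t in range(1, max_iters + 1):
        weights = [(1.0 + eps) ** P_i for P_i in P]
        j_t = min(range(n),
                  key=lambda j: sum(M[i][j] * weights[i] for i in range(m)))
        counts[j_t] += 1
        for i in range(m):
            P[i] += M[i][j_t]
        # Value of q_tilde = counts / t is the largest average payoff over rows.
        q_value = max(P) / t
        # MAX's strategy p with p_i proportional to (1 + eps)^P_i(t).
        weights = [(1.0 + eps) ** P_i for P_i in P]
        total = sum(weights)
        p = [w / total for w in weights]
        p_value = min(sum(M[i][j] * p[i] for i in range(m)) for j in range(n))
        if q_value <= (1.0 + eps) * p_value:
            return [c / t for c in counts], p, t
    raise RuntimeError("no certificate found within max_iters iterations")


# Hypothetical 2-by-2 example with optimal value 0.5.
M = [[1.0, 0.0],
     [0.0, 1.0]]
q_tilde, p, t = solve_until_certified(M, eps=0.5)
```

On this example the test fails after the first iteration (value 1 for $\tilde q$ against value 0.4 for $p$ ) and succeeds after the second, when both strategies are uniform.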

In many cases of interest, $\mu$ is exponentially small, so the number of iterations can be exponentially large. In this case, Garg and Könemann’s technique of non-uniform increments can be used to modify the algorithm to have $O(m\log (m)/\varepsilon ^2)$ iterations.

Bibliography

[1]	R. Lipton and N. E. Young. Simple strategies for large zero-sum games with applications to complexity theory. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, pages 734–740, Montréal, Québec, Canada, 23–25 May 1994.
[2]	S. A. Plotkin, D. B. Shmoys, and E. Tardos. Fast approximation algorithms for fractional packing and covering problems. Math. Operations Research, 20(2):257–301, 1995. Preliminary version in FOCS’91.
[3]	N. E. Young. Randomized rounding without solving the linear program. In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 170–178, San Francisco, California, 22–24 Jan. 1995.
