For any non-negative super-martingale, the probability that its maximum $\max _ t X_ t$ ever exceeds a given value $c$ is at most $\textrm{E}[X_0]/c$ .

The Markov bound plays a fundamental role in the following sense: many probabilistic proofs, including, for example, the proof of the Chernoff bound, rely ultimately on the Markov bound. This note discusses a bound that plays a role similar to the Markov bound in a particular important scenario: when analyzing the maximum value achieved by a given non-negative super-martingale.

Here’s a simple example. Alice goes to the casino with $1. At the casino, she plays the following game repeatedly: she bets half her current balance on a fair coin flip. (For example, on the first flip, she bets 50 cents, so she wins 50 cents with probability 1/2 and loses 50 cents with probability 1/2.) Will Alice’s winnings ever reach $10 or more? The bound here says this happens with probability at most 1/10.
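The example can be checked empirically with a small Monte Carlo simulation. The sketch below is only illustrative: it assumes the quantity of interest is Alice's balance (starting at $1), it truncates each run at a finite number of flips, and the names `ever_reaches` and `estimate_hit_probability` are hypothetical helpers, not part of the original note.

```python
import random


def ever_reaches(target, steps, rng):
    """Play up to `steps` flips; return True if the balance ever reaches `target`."""
    balance = 1.0  # assumption: Alice starts with $1
    for _ in range(steps):
        if balance >= target:
            return True
        bet = balance / 2  # she bets half her current balance
        balance += bet if rng.random() < 0.5 else -bet
    return balance >= target


def estimate_hit_probability(target=10.0, steps=200, trials=20000, seed=0):
    """Fraction of simulated runs in which the balance reaches `target`."""
    rng = random.Random(seed)
    hits = sum(ever_reaches(target, steps, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(estimate_hit_probability())
```

Up to sampling noise, the printed estimate should come out no larger than about 1/10, in line with the bound.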


Markov bound for super-martingale maxima

Let $X_0,X_1,X_2,\ldots$ be a non-negative super-martingale — a sequence of non-negative random variables that is non-increasing in expectation: $\textrm{E}[X_ t\, |\, X_{t-1}] \le X_{t-1}$ , and $X_ t \ge 0$ for each $t$ . The sequence may be finite or infinite.

Consider the event that $\max _ t X_ t \ge c$ , for some given $c$ . If we had a bound on the expectation of $\max _ t X_ t$ , we could bound the probability of this event using the Markov bound. For example, in the ideal case, if it happened that $\textrm{E}[\max _ t X_ t]$ were at most $\textrm{E}[X_0]$ , then the Markov bound would imply that the event happens with probability at most $\textrm{E}[X_0]/c$ . Although $\textrm{E}[\max _ t X_ t]$ can in fact be much larger than $\textrm{E}[X_0]$ , the desired bound holds in any case:

Lemma (Markov for super-martingale maxima).

Fix any $c > 0$ .
(a) $\Pr [\max _ t X_ t \ge c] \le \textrm{E}[X_0]/c$ .
(b) $\Pr [\max _ t X_ t > c] < \textrm{E}[X_0]/c$ .

In short, this bound substitutes for the Markov bound to give us a natural bound on the probability of the event $\max _ t X_ t\ge c$ . Note that in most applications $X_0$ will be a fixed value independent of the outcome.
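As a worked instance, the lemma applies to the casino example above, assuming $X_ t$ denotes Alice's balance in dollars after $t$ flips (so $X_0 = 1$ and $c = 10$ ). Each flip multiplies the balance by $3/2$ or $1/2$ with equal probability, so

\begin{align*} \textrm{E}[X_ t\, |\, X_{t-1}] & ~ =~ \tfrac {1}{2}\cdot \tfrac {3}{2}\, X_{t-1} + \tfrac {1}{2}\cdot \tfrac {1}{2}\, X_{t-1} ~ =~ X_{t-1}, \\ \Pr [\max _ t X_ t \ge 10] & ~ \le ~ \frac{\textrm{E}[X_0]}{10} ~ =~ \frac{1}{10}. \end{align*}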

Proof idea

For our purposes, knowing how to use this bound is more important than knowing how to prove it. Here is the proof just for the sake of completeness.

To get the intuition, consider the following seemingly weaker bound. If $T$ is any stopping time with finite expectation, then by Wald’s equation $\textrm{E}[X_ T]$ is at most $\textrm{E}[X_0]$ , so by Markov $\Pr [X_ T \ge c]$ is at most $\textrm{E}[X_0]/c$ . That is, the desired bound holds for the single value $X_ T$ .

The proof of the lemma uses this argument, with $T$ specifically defined to be the first time such that $X_ T \ge c$ (if any, else $\infty$ ). (In the example, this is analogous to Alice quitting as soon as her winnings reach $\$10$ .) This $T$ is indeed a stopping time, and, crucially, the event $\max _ t X_ t \ge c$ occurs only if $X_ T \ge c$ . So the bound on $\Pr [X_ T \ge c]$ from the previous paragraph implies the result. A technical obstacle is that $T$ might not have finite expectation, but this is easily overcome via a limit argument.

Proof


Assume without loss of generality that $X_0$ is a fixed value (non-random). (If not, then condition first on any particular value of $X_0$ .)

Part (a). Let $T=\min \{ t \, |\, X_ t \ge c\}$ . The probability in question is $\Pr [T\lt \infty ]$ , which equals $\lim _{n\rightarrow \infty } \Pr [T \le n]$ . To prove (a), we prove that $\forall n. \, \Pr [T\le n] \le X_0/c$ .

Define $T_ n = \min (n, T)$ , so that $T\le n \, \Leftrightarrow X_{T_ n} \ge c$ . Then $T_ n$ is a stopping time with finite expectation, and, for $t\le T_ n$ , each difference $X_{t}-X_{t-1}$ is non-positive in expectation and uniformly bounded below (by $0 - c$ ), so, by Wald’s, $\textrm{E}[X_{T_ n}]\le X_0$ .

Finally, $\Pr [T\le n] = \Pr [X_{T_ n} \ge c]$ , which by Markov is at most $\textrm{E}[X_{T_ n}]/\, c \le X_0/c$ .
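The quantities in part (a) can be computed exactly for the casino example. The following is a minimal sketch, assuming Alice's game (balance multiplied by 3/2 or 1/2 with equal probability, $X_0 = 1$); the function name `stopped_distribution` is a hypothetical helper introduced for illustration. It builds the exact distribution of $X_{T_ n}$ with rational arithmetic, so that $\textrm{E}[X_{T_ n}]$ and $\Pr [T\le n]$ can be compared against $X_0$ and $X_0/c$ .

```python
from fractions import Fraction


def stopped_distribution(n, c, x0=Fraction(1)):
    """Exact distribution of X_{T_n} in Alice's game, as {value: probability}.

    T is the first time the balance is >= c, and T_n = min(n, T).
    """
    live = {Fraction(x0): Fraction(1)}  # paths that have not stopped yet
    stopped = {}                        # paths frozen at their value X_T
    for _ in range(n):
        nxt = {}
        for x, p in live.items():
            if x >= c:
                # T has occurred: the path stays at X_T from now on.
                stopped[x] = stopped.get(x, 0) + p
                continue
            for y in (x * Fraction(3, 2), x * Fraction(1, 2)):
                nxt[y] = nxt.get(y, 0) + p / 2
        live = nxt
    # Paths still live after n flips contribute their time-n value X_n.
    for x, p in live.items():
        stopped[x] = stopped.get(x, 0) + p
    return stopped


if __name__ == "__main__":
    c = 10
    dist = stopped_distribution(30, c)
    mean = sum(x * p for x, p in dist.items())
    hit = sum(p for x, p in dist.items() if x >= c)
    print(mean, float(hit))
```

Because Alice's game is a martingale (not just a super-martingale), the computed mean equals $X_0$ exactly, and the probability mass at values $\ge c$ is $\Pr [T\le n]$ .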

Part (b). Let $T=\min \{ t \, |\, X_ t \gt c\}$ and $T_ n = \min (n,T)$ . The probability in question is $\Pr [T\lt \infty ]$ , which equals $\lim _{n\rightarrow \infty } \Pr [T \le n]$ , which equals

\begin{align*} \lim _{n\rightarrow \infty } \Pr [X_{T_ n} \gt c] & ~ \le ~ \lim _{n\rightarrow \infty } \frac{\textrm{E}[X_{T_ n}]}{\textrm{E}[X_{T_ n}\, |\, X_{T_ n} \gt c ]} \\ & ~ \le ~ \frac{X_0}{\lim _{n\rightarrow \infty } \textrm{E}[X_{T_ n}\, |\, X_{T_ n} \gt c]} \\ & ~ =~ \frac{X_0}{\textrm{E}[X_{T}\, |\, T \lt \infty ]}. \end{align*}

(The first inequality follows from $\textrm{E}[X_{T_ n}]\, \ge \, \Pr [X_{T_ n} \gt c]\, \textrm{E}[X_{T_ n} \, |\, X_{T_ n} \gt c]$ , using the non-negativity of $X_{T_ n}$ . The second follows, using Wald’s as in the proof of part (a), from $\textrm{E}[X_{T_ n}] \le X_0$ . The third follows by calculation using that $\lim _{n\rightarrow \infty }\Pr [T\ge n \, |\, T < \infty ] = 0$ .)

To conclude, note (assuming $\Pr [T\lt \infty ] \gt 0$ , for otherwise we are done) that by definition $T\lt \infty \Rightarrow X_ T \gt c$ , so the expectation in the denominator on the right-hand side is a weighted average of values each of which exceeds $c$ , and so must itself exceed $c$ (that is, $\textrm{E}[X_{T}\, |\, T \lt \infty ] = \textrm{E}[X_{T}\, |\, X_ T > c] > c$ ).

Pessimistic estimator

(a) Given the value $X_{t}$ at the current time $t$ , as long as the bad event has not yet happened (that is, as long as $\forall s \lt t. ~ X_ s \lt c$ ), the value $\phi _ t = X_ t/c$ is a pessimistic estimator for the conditional probability of the event $\exists s.~ X_ s \ge c$ . The value is initially $\textrm{E}[X_0]/c$ , it is non-increasing in expectation with each step, and, as long as it remains less than 1, the event in question doesn’t happen.

(b) The same $\phi _ t$ is a pessimistic estimator for the event in part (b): the value is initially $\textrm{E}[X_0]/c$ , it is non-increasing in expectation with each step, and, as long as it remains less than or equal to 1, the event in question doesn’t happen.
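One common use of a pessimistic estimator is to guide choices greedily: since $\phi _ t$ is non-increasing in expectation, at each step some outcome does not increase it. The sketch below illustrates this on the casino example, under the assumption that the two outcomes of each flip can be inspected and chosen between; `phi` and `derandomized_path` are hypothetical names used only for illustration.

```python
from fractions import Fraction


def phi(x, c):
    """Pessimistic estimator X_t / c."""
    return Fraction(x) / c


def derandomized_path(x0, c, steps):
    """Follow, at each flip, an outcome that does not increase phi."""
    x = Fraction(x0)
    path = [x]
    for _ in range(steps):
        outcomes = (x * Fraction(3, 2), x * Fraction(1, 2))
        # The average of phi over the outcomes is phi(x, c) here,
        # so the minimizing outcome has phi <= phi(x, c).
        x = min(outcomes, key=lambda y: phi(y, c))
        path.append(x)
    return path


if __name__ == "__main__":
    path = derandomized_path(1, 10, 5)
    print([str(phi(x, 10)) for x in path])
```

Along such a path $\phi _ t$ never increases, so if it starts below 1 it stays below 1 and the event $\exists s.~ X_ s \ge c$ never occurs.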
