The statement and proof of a typical Chernoff bound.


A Chernoff bound, and proof

Lemma (Chernoff bound [1, 2]).

Let $Y=\sum _{t=1}^ T y_ t$ be a sum of independent random variables, where each $y_ t$ is in $[0,1]$ and $T\gt 0$ is fixed. Let $\varepsilon , \mu \gt 0$ with $\varepsilon \le 1$ .

(a) If $\textrm{E}[y_{t}] \le \mu$ for all $t\le T$ then $\Pr [Y \ge (1+\varepsilon )\mu T] \, \lt \, \exp ({-\varepsilon ^2}\mu T/3).$

(b) If $\textrm{E}[y_{t}] \ge \mu$ for all $t\le T$ then $\Pr [Y \le (1-\varepsilon )\mu T] \, \lt \, \exp ({-\varepsilon ^2}\mu T/2).$

Proof.

Here is a proof for part (a).

\begin{equation} \label{eqn} \Pr [ Y \ge (1+\varepsilon )\mu T] ~ =~ \Pr \Big[ (1+\varepsilon )^ Y \ge (1+\varepsilon )^{(1+\varepsilon )\mu T}\Big] ~ \le ~ \frac{\textrm{E}[ (1+\varepsilon )^ Y ]}{(1+\varepsilon )^{(1+\varepsilon )\mu T}}. \end{equation}

(The inequality above follows from the Markov bound.)

Next we bound the above expectation. The independence of the $y_ t$ ’s implies

\[ \textrm{E}\big [ (1+\varepsilon )^ Y \big ] ~ =~ \prod _{t=1}^ T \textrm{E}[ (1+\varepsilon )^{y_ t} ] ~ \le ~ \prod _{t=1}^ T \textrm{E}[ 1+\varepsilon y_ t ] ~ \le ~ \prod _{t=1}^ T (1+\varepsilon \mu ) ~ \lt ~ e^{\varepsilon \mu T}. \]

(The second step uses the convexity of $(1+\varepsilon )^ z$ as a function of $z$: on $[0,1]$ the function lies below its chord from $(0,1)$ to $(1,1+\varepsilon )$, so $(1+\varepsilon )^ z \le 1+\varepsilon z$ for $z\in [0,1]$. The third step uses $\textrm{E}[y_ t] \le \mu$. The fourth step uses $1+z\lt e^ z$ for $z\ne 0$.)
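To make the chain concrete, the following Python sketch evaluates each quantity exactly when every $y_t$ has a small finite distribution on $[0,1]$. The function name `expectation_chain` and the example distributions are illustrative assumptions, not part of the notes. For $\{0,1\}$-valued $y_t$ the convexity step holds with equality, since $(1+\varepsilon)^z = 1+\varepsilon z$ at $z \in \{0,1\}$.

```python
import math

def expectation_chain(dists, eps, mu):
    """Return the four quantities in the chain
    E[(1+eps)^Y] <= prod E[1+eps*y_t] <= (1+eps*mu)^T < e^(eps*mu*T).

    dists: one (values, probs) pair per independent y_t, with values in [0, 1].
    """
    mgf = 1.0     # E[(1+eps)^Y]; by independence, a product of per-term expectations
    linear = 1.0  # product of E[1 + eps*y_t], from the convexity step
    for values, probs in dists:
        mgf *= sum(p * (1 + eps) ** v for v, p in zip(values, probs))
        linear *= sum(p * (1 + eps * v) for v, p in zip(values, probs))
    T = len(dists)
    return mgf, linear, (1 + eps * mu) ** T, math.exp(eps * mu * T)

# Two alternating distributions with means 0.35 and 0.3, so mu = 0.35 is valid.
dists = [([0.0, 0.5, 1.0], [0.5, 0.3, 0.2]), ([0.2, 0.4], [0.5, 0.5])] * 5
print(expectation_chain(dists, eps=0.5, mu=0.35))
```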

The rest is algebra. Combining the above bound with (\ref{eqn}), the probability in question is less than

\[ \frac{e^{\varepsilon \mu T}}{(1+\varepsilon )^{(1+\varepsilon )\mu T}} ~ =~ \Big( \frac{e^{\varepsilon }}{(1+\varepsilon )^{1+\varepsilon }} \Big)^{\mu T} ~ \lt ~ \exp (-\varepsilon ^2 \mu T/3). \]

(The last inequality follows from $e^{\varepsilon }/(1+\varepsilon )^{(1+\varepsilon )} \lt \exp ({-\varepsilon ^2}/3)$ for $\varepsilon \in (0,1]$ .)
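One way to verify the last inequality is to take logarithms and use $\ln (1+\varepsilon ) \gt 2\varepsilon /(2+\varepsilon )$ for $\varepsilon \gt 0$ (the difference of the two sides vanishes at $\varepsilon = 0$ and has derivative $\varepsilon ^2/((1+\varepsilon )(2+\varepsilon )^2) \gt 0$). Then, for $\varepsilon \in (0,1]$,

\[ \varepsilon - (1+\varepsilon )\ln (1+\varepsilon ) ~ \lt ~ \varepsilon - \frac{2\varepsilon (1+\varepsilon )}{2+\varepsilon } ~ =~ -\frac{\varepsilon ^2}{2+\varepsilon } ~ \le ~ -\frac{\varepsilon ^2}{3}, \]

where the last step uses $\varepsilon \le 1$.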

The proof for part (b) is similar: change the sign of $\varepsilon$ throughout, and use $e^{-\varepsilon }/{(1-\varepsilon )}^{(1-\varepsilon )} \lt \exp ({-\varepsilon ^2}/2)$.
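The two scalar inequalities used in the last step can also be checked numerically. The Python sketch below (function names are hypothetical) evaluates, on a grid of $\varepsilon \in (0,1]$, the logarithm of each left-hand side minus the logarithm of the right-hand side, so a negative value means the inequality holds there. For part (b) it uses $e^{-\varepsilon}$ in the numerator, which is what changing the sign of $\varepsilon$ in $e^{\varepsilon}/(1+\varepsilon)^{1+\varepsilon}$ produces, together with the convention $0^0 = 1$ at $\varepsilon = 1$. A grid check is evidence, not a proof.

```python
import math

def upper_gap(eps):
    """ln(e^eps / (1+eps)^(1+eps)) - (-eps^2/3); negative means part (a)'s inequality holds."""
    return eps - (1 + eps) * math.log1p(eps) + eps**2 / 3

def lower_gap(eps):
    """ln(e^-eps / (1-eps)^(1-eps)) - (-eps^2/2); uses 0 * ln 0 = 0 at eps = 1."""
    xlogx = 0.0 if eps == 1 else (1 - eps) * math.log1p(-eps)
    return -eps - xlogx + eps**2 / 2

grid = [k / 1000 for k in range(1, 1001)]  # eps in (0, 1]
print(max(upper_gap(e) for e in grid), max(lower_gap(e) for e in grid))
```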

Exercise.

Prove part (b) of the Chernoff bound.
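As a numerical sanity check of the lemma (an illustration, not part of the proof), one can take every $y_t$ to be Bernoulli with mean exactly $\mu$, so that $Y$ is binomial and both tails can be computed exactly. The Python sketch below does this; the Bernoulli choice, the helper names, and the parameters are assumptions made for illustration.

```python
import math

def binom_pmf(T: int, p: float, k: int) -> float:
    """Pr[Y = k] for Y ~ Binomial(T, p), 0 < p < 1, computed in log space."""
    log_pmf = (math.lgamma(T + 1) - math.lgamma(k + 1) - math.lgamma(T - k + 1)
               + k * math.log(p) + (T - k) * math.log1p(-p))
    return math.exp(log_pmf)

def upper_tail(T: int, mu: float, eps: float) -> float:
    """Exact Pr[Y >= (1+eps) mu T] when each y_t is Bernoulli(mu)."""
    k0 = math.ceil((1 + eps) * mu * T - 1e-9)  # tolerance guards against float round-off
    return sum(binom_pmf(T, mu, k) for k in range(k0, T + 1))

def lower_tail(T: int, mu: float, eps: float) -> float:
    """Exact Pr[Y <= (1-eps) mu T] when each y_t is Bernoulli(mu)."""
    k1 = math.floor((1 - eps) * mu * T + 1e-9)
    return sum(binom_pmf(T, mu, k) for k in range(0, k1 + 1))

for T, mu, eps in [(50, 0.3, 0.5), (200, 0.1, 1.0), (200, 0.5, 0.2)]:
    print(T, mu, eps,
          upper_tail(T, mu, eps), math.exp(-eps**2 * mu * T / 3),  # part (a)
          lower_tail(T, mu, eps), math.exp(-eps**2 * mu * T / 2))  # part (b)
```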

Remark: the proof needs only first-order (linear) approximations.

The particular inequalities used in the proof above are elegant and convenient, but other inequalities could be used just as well. For example, we could change the base of the exponential in the proof from $1+\varepsilon$ to $\exp (\varepsilon )$ and then push the proof through using inequalities such as $\exp (\varepsilon ) \le 1+\varepsilon + \varepsilon ^2$. The proof will work with any inequalities in which the first-order terms (for small $\varepsilon$) are tight, although the resulting bound may have a smaller constant in the exponent.
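To see how the choice of base affects the constant, here is a small sketch (an illustration, not from the notes). With base $e^{\delta}$ for some $\delta \gt 0$ and the convexity bound $\textrm{E}[e^{\delta y_t}] \le 1+(e^{\delta}-1)\mu \le \exp ((e^{\delta}-1)\mu )$, the same Markov argument gives $\Pr [Y \ge (1+\varepsilon )\mu T] \le \exp (f(\delta )\,\mu T)$ with $f(\delta ) = e^{\delta }-1-\delta (1+\varepsilon )$. Since $f$ is convex with $f'(\delta ) = e^{\delta }-(1+\varepsilon )$, it is minimized at $\delta = \ln (1+\varepsilon )$, i.e. base $1+\varepsilon$ as in the proof. The hypothetical helper `exponent_rate` evaluates $f$; the grid search only confirms this numerically.

```python
import math

def exponent_rate(delta: float, eps: float) -> float:
    """f(delta) = e^delta - 1 - delta*(1+eps); the tail bound is exp(f(delta) * mu * T)."""
    return math.expm1(delta) - delta * (1 + eps)

eps = 0.5
deltas = [k / 10_000 for k in range(1, 20_001)]         # candidate log-bases in (0, 2]
best = min(deltas, key=lambda d: exponent_rate(d, eps))
print(best, math.log1p(eps))                             # grid optimum vs ln(1 + eps)
print(exponent_rate(math.log1p(eps), eps), exponent_rate(eps / 2, eps))
```

Any other $\delta$ with $f(\delta) \lt 0$ still yields a valid, weaker bound, which is the sense in which other choices work "just as well" up to the constant.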

Remark: upper bounds when $\varepsilon$ is large.

Sometimes one wants an upper bound on the probability of a large upper deviation: $\Pr [Y \ge \lambda \mu T]$ where $\lambda$ is larger than 2. Setting $1+\varepsilon = \lambda$, the proof above for part (a) applies up to the last step (the only step that uses $\varepsilon \le 1$), showing an upper bound of $\big ((e/\lambda )^\lambda /e\big )^{\mu T} = \exp \big (-(\lambda \ln \lambda - \lambda + 1)\mu T\big )$. For example, if you throw $n$ balls into $n$ bins, the number of balls in a given bin has $\mu T = 1$; taking $\lambda = c\ln (n)/\ln \ln n$ for a constant $c$, the probability that a given bin receives more than $c\ln (n)/\ln \ln n$ balls is at most $n^{-\Theta (c)}$.
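For the balls-in-bins example, the load of a fixed bin is binomial with $T = n$ trials of probability $1/n$, so $\mu T = 1$ and the bound is simply $(e/\lambda )^\lambda /e$. The Python sketch below (hypothetical helper names; the choices $n = 10{,}000$ and $c \in \{1,2,3\}$ are arbitrary) compares it with the exact binomial tail. Since the load is an integer, $\Pr [Y \ge \lambda ] = \Pr [Y \ge \lceil \lambda \rceil ]$.

```python
import math

def binom_upper_tail(n: int, p: float, k0: int) -> float:
    """Exact Pr[Y >= k0] for Y ~ Binomial(n, p), 0 < p < 1, summed in log space."""
    total = 0.0
    for k in range(max(k0, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

def large_deviation_bound(lam: float, mu_T: float = 1.0) -> float:
    """((e/lam)^lam / e)^(mu T) = exp(-(lam ln lam - lam + 1) mu T), for lam > 1."""
    return math.exp(-(lam * math.log(lam) - lam + 1) * mu_T)

n = 10_000
results = []
for c in (1, 2, 3):
    lam = c * math.log(n) / math.log(math.log(n))
    exact = binom_upper_tail(n, 1 / n, math.ceil(lam))  # the load of one bin is integral
    results.append((c, lam, exact, large_deviation_bound(lam)))
    print(c, round(lam, 2), exact, large_deviation_bound(lam))
```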

Pessimistic estimators.

When the time comes to apply the method of conditional probabilities to an existence proof that uses the Chernoff bound, you’ll need a pessimistic estimator for the bound.


Consider first the bound on $\Pr [ Y \ge (1+\varepsilon )\mu T]$. A careful look at the proof of the Chernoff bound shows that it bounds this probability by the expectation of

\[ \phi ~ =~ \frac{(1+\varepsilon )^ Y}{(1+\varepsilon )^{(1+\varepsilon )\mu \, T}} ~ =~ \prod _{t=1}^ T \frac{(1+\varepsilon )^{y_ t}}{(1+\varepsilon )^{(1+\varepsilon )\mu }}. \]

The proof shows the expectation is less than $\exp (-\varepsilon ^2 \mu T/ 3)$.

Now suppose that, for each $t \le T$, we need a bound on the conditional probability of the event $Y \ge (1+\varepsilon )\mu T$ given $y_1,y_2,\ldots ,y_ t$. Following the proof, this conditional probability is at most the conditional expectation of $\phi$.

The rest is just calculation. We calculate the conditional expectation of $\phi$, given $y_1,y_2,\ldots ,y_ t$. The first $t$ terms in the product defining $\phi$ are determined, while the rest are still independent of each other and of the conditioning. Thus, bounding each remaining term as in the proof gives

\begin{equation} \label{eqn:pess} \textrm{E}[\phi ~ |~ y_1,y_2,\ldots ,y_ t] ~ \le ~ \prod _{s=1}^ t \frac{(1+\varepsilon )^{y_ s}}{(1+\varepsilon )^{(1+\varepsilon )\mu }} \times \prod _{s=t+1}^ T \frac{1+\varepsilon \mu }{(1+\varepsilon )^{(1+\varepsilon )\mu }}. \end{equation}

Let $\psi _ t$ denote the value of the expression on the right-hand side of \eqref{eqn:pess}. Then $\psi _ t$ is one appropriate pessimistic estimator for the conditional probability of the event $Y\ge (1+\varepsilon )\mu T$, given $y_1,\ldots ,y_ t$.
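A direct transcription of $\psi_t$ into Python might look as follows; `psi_upper` and `average_next` are hypothetical names, and the checks assume $\{0,1\}$-valued $y_t$ that equal 1 with probability $p$. They test the properties a pessimistic estimator needs here: $\psi_0$ is below the unconditional bound, averaging $\psi_{t+1}$ over the next $y$ with mean at most $\mu$ does not exceed $\psi_t$ (with equality when $p = \mu$, because the convexity step is tight on $\{0,1\}$), and $\psi_T \ge 1$ once $Y$ reaches the threshold.

```python
import math

def psi_upper(prefix, T, mu, eps):
    """psi_t for the event Y >= (1+eps) mu T, given y_1..y_t = prefix (t = len(prefix))."""
    denom = (1 + eps) ** ((1 + eps) * mu)                       # (1+eps)^((1+eps)mu)
    fixed = math.prod((1 + eps) ** y / denom for y in prefix)   # terms already determined
    rest = ((1 + eps * mu) / denom) ** (T - len(prefix))        # bound for the T - t others
    return fixed * rest

def average_next(prefix, p, T, mu, eps):
    """E[psi_{t+1} | prefix] when the next y is 1 with probability p and 0 otherwise."""
    return (p * psi_upper(prefix + [1], T, mu, eps)
            + (1 - p) * psi_upper(prefix + [0], T, mu, eps))

T, mu, eps = 20, 0.3, 0.5
prefix = [1, 0, 1]
print(psi_upper([], T, mu, eps), math.exp(-eps**2 * mu * T / 3))
print(average_next(prefix, mu, T, mu, eps), psi_upper(prefix, T, mu, eps))
```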

Many pessimistic estimators are possible. Recall that the proof of the Chernoff bound uses a sequence of inequalities. Essentially, for each term, the proof uses

\[ \textrm{E}[(1+\varepsilon )^{y_ t}] ~ \le ~ \textrm{E}[1+\varepsilon y_ t] ~ \le ~ 1+\varepsilon \mu ~ \le ~ \exp (\varepsilon \mu ) ~ \le ~ \frac{(1+\varepsilon )^{(1+\varepsilon )\mu }}{\exp (\varepsilon ^2\mu /3)}. \]

(The last step on the right is implicit in the last step of the proof.)

Correspondingly, one can modify $\psi _ t$ to get other valid pessimistic estimators:

Each term $(1+\varepsilon )^{y_ s}$ could be replaced by $1+\varepsilon y_ s$ .
Each term $1+\varepsilon \mu$ could be replaced by $\exp (\varepsilon \mu )$ .
Each term $\frac{e^{\varepsilon \mu }}{(1+\varepsilon )^{(1+\varepsilon )\mu }}$ could be replaced by $\exp (-\varepsilon ^2\mu /3)$ .
etc…

In a sense $\psi _ t$ as defined above is the “least pessimistic” (smallest) estimator. In comparison, I think the “most pessimistic” (largest) turns out to be

\[ \prod _{s=1}^ t \frac{1+\varepsilon y_ s}{1+\varepsilon \mu } \times \exp (-\varepsilon ^2\mu T/3). \]

The various modifications described above give pessimistic estimators between $\psi _ t$ and the one above.
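To compare the two extremes numerically, the Python sketch below (hypothetical names, arbitrary parameters with $\varepsilon \le 1$) evaluates $\psi_t$ and the "most pessimistic" estimator above on a few prefixes with values in $[0,1]$. Under the lemma's assumptions the first should never exceed the second.

```python
import math

def psi_least(prefix, T, mu, eps):
    """psi_t as defined in the notes (the smallest of the variants)."""
    denom = (1 + eps) ** ((1 + eps) * mu)
    fixed = math.prod((1 + eps) ** y / denom for y in prefix)
    return fixed * ((1 + eps * mu) / denom) ** (T - len(prefix))

def psi_most(prefix, T, mu, eps):
    """The "most pessimistic" variant: prod (1+eps*y_s)/(1+eps*mu) * exp(-eps^2 mu T/3)."""
    fixed = math.prod((1 + eps * y) / (1 + eps * mu) for y in prefix)
    return fixed * math.exp(-eps**2 * mu * T / 3)

T, mu, eps = 50, 0.4, 0.8
prefixes = [[], [0.0] * 10, [1.0] * 10, [k / 20 for k in range(21)]]
for prefix in prefixes:
    print(len(prefix), psi_least(prefix, T, mu, eps), psi_most(prefix, T, mu, eps))
```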

Lower tail bound. Next consider the bound on $\Pr [Y \le (1-\varepsilon )\mu T]$ . Given $y_1,y_2,\ldots ,y_ t$ , the conditional probability of the event $Y \le (1-\varepsilon )\mu T$ is at most the pessimistic estimator

\[ \psi _ t ~ =~ \prod _{s=1}^ t \frac{(1-\varepsilon )^{y_ s}}{(1-\varepsilon )^{(1-\varepsilon )\mu }} \times \prod _{s=t+1}^ T \frac{1-\varepsilon \mu }{(1-\varepsilon )^{(1-\varepsilon )\mu }}. \]

As discussed above, many modifications are possible. For example, $(1-\varepsilon )^{y_ s}$ could be replaced by $1-\varepsilon y_ s$ , and so on, to obtain a more convenient form for a given application.
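A similar sketch for the lower tail, again with hypothetical names and assuming $0 \lt \varepsilon \lt 1$. The flag `linearize` applies the suggested replacement of $(1-\varepsilon )^{y_ s}$ by $1-\varepsilon y_ s$, which can only increase the estimator because $(1-\varepsilon )^{z}$ is convex in $z$ and so lies below its chord $1-\varepsilon z$ on $[0,1]$.

```python
import math

def psi_lower(prefix, T, mu, eps, linearize=False):
    """Pessimistic estimator for Pr[Y <= (1-eps) mu T] given y_1..y_t = prefix."""
    denom = (1 - eps) ** ((1 - eps) * mu)   # per-term denominator (1-eps)^((1-eps)mu)

    def term(y):
        # Either the exact factor (1-eps)^y or its linear upper bound 1 - eps*y.
        return 1 - eps * y if linearize else (1 - eps) ** y

    fixed = math.prod(term(y) / denom for y in prefix)
    return fixed * ((1 - eps * mu) / denom) ** (T - len(prefix))

T, mu, eps = 30, 0.5, 0.5
prefix = [0.2, 0.9, 0.5]
print(psi_lower([], T, mu, eps), math.exp(-eps**2 * mu * T / 2))
print(psi_lower(prefix, T, mu, eps), psi_lower(prefix, T, mu, eps, linearize=True))
```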

Bounding the increase with each step. Generally, the $t$th step of an algorithm will determine $y_ t$. To make sure the algorithm’s action in the $t$th step keeps the overall pessimistic estimator from increasing, we’ll need a bound on the value of $\psi _ t$ relative to $\psi _{t-1}$. We will almost always prefer a bound such as the following

\[ \psi _ t~ \le ~ \frac{1+\varepsilon y_ t}{1+\varepsilon \mu } \psi _{t-1}, \]

which is linear in $y_ t$. Essentially the same bound works for all the variants of $\psi _ t$ described above.
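For the upper-tail estimator $\psi_t$ the per-term denominators cancel, so $\psi _ t/\psi _{t-1} = (1+\varepsilon )^{y_ t}/(1+\varepsilon \mu )$ exactly, and the linear bound follows from $(1+\varepsilon )^{z} \le 1+\varepsilon z$ on $[0,1]$. The Python sketch below (hypothetical names, arbitrary parameters) checks this on a grid and illustrates why linearity is convenient: the expected factor is $(1+\varepsilon \,\textrm{E}[y_ t])/(1+\varepsilon \mu )$, which is at most 1 whenever $\textrm{E}[y_ t] \le \mu$.

```python
def exact_ratio(y_t, mu, eps):
    """psi_t / psi_{t-1} for the upper-tail estimator psi_t."""
    return (1 + eps) ** y_t / (1 + eps * mu)

def linear_ratio_bound(y_t, mu, eps):
    """The bound from the text; it is linear in y_t."""
    return (1 + eps * y_t) / (1 + eps * mu)

mu, eps = 0.3, 0.7
grid = [k / 100 for k in range(101)]
worst = max(exact_ratio(y, mu, eps) - linear_ratio_bound(y, mu, eps) for y in grid)

# A distribution for y_t with mean 0.275 <= mu: by linearity, the expected factor
# equals linear_ratio_bound(0.275, mu, eps).
values, probs = [0.0, 0.25, 1.0], [0.5, 0.3, 0.2]
expected_factor = sum(p * linear_ratio_bound(v, mu, eps) for v, p in zip(values, probs))
print(worst, expected_factor)
```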

Example (conditioned balls in bins).

Suppose that $T$ balls are thrown into $n$ bins. Let $Y$ be the number of balls landing in the first bin, so $\textrm{E}[Y] = T/n$. According to part (a) of the Chernoff bound (with $\mu = 1/n$), the probability that $Y \ge (1+\varepsilon )T/n$ is less than $\exp (-\varepsilon ^2T/(3n))$. Given that $b$ of the first $t$ balls thrown go into the first bin, the conditional probability is at most $\exp (-\varepsilon ^2T/(3n))\, {(1+\varepsilon )^{b}}/{(1+\varepsilon /n)^{t}}$, which is the “most pessimistic” estimator above with each $y_ s \in \{0,1\}$.
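Summing this conditional bound over all $n$ bins gives a pessimistic estimator for the event that some bin receives at least $(1+\varepsilon )T/n$ balls, which is the setting for the method of conditional probabilities. The Python sketch below (hypothetical names; it assumes $0 \lt \varepsilon \le 1$ and uses the factor $\exp (-\varepsilon ^2T/(3n))$ from part (a) with $\mu = 1/n$) places the balls one at a time, each in the bin whose current estimator is smallest. Putting a ball in bin $i$ multiplies that bin's estimator by $(1+\varepsilon )/(1+\varepsilon /n)$ and every other bin's by $1/(1+\varepsilon /n)$, so a uniformly random choice leaves the expected sum unchanged and the minimizing choice cannot increase it. With this particular estimator the rule reduces to "least-loaded bin".

```python
import math

def bin_estimator(b, t, T, n, eps):
    """Bound on Pr[a bin ends with >= (1+eps)T/n balls | it holds b of the first t balls]."""
    return math.exp(-eps**2 * T / (3 * n)) * (1 + eps) ** b / (1 + eps / n) ** t

def place_balls(T, n, eps):
    """Derandomized placement: each ball goes to the bin with the smallest estimator.
    Returns the final loads and the sum of the estimators after each step."""
    loads = [0] * n
    totals = [n * bin_estimator(0, 0, T, n, eps)]   # initial sum over all bins
    for t in range(1, T + 1):
        # The estimator is increasing in b, so its minimum is at a least-loaded bin.
        i = min(range(n), key=lambda j: bin_estimator(loads[j], t - 1, T, n, eps))
        loads[i] += 1
        totals.append(sum(bin_estimator(b, t, T, n, eps) for b in loads))
    return loads, totals

loads, totals = place_balls(T=2000, n=10, eps=0.5)
print(max(loads), totals[0], totals[-1])
```

If the initial sum is below 1, it stays below 1, so at the end no bin can have reached $(1+\varepsilon )T/n$ balls.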

Bibliography

[1]	H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Math. Stat., 23:493–509, 1952.
[2]	W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
