Chernoff bound

A brief introduction to Chernoff bounds.

If you’re already familiar with Chernoff bounds, you may prefer to skip directly to the statement and proof of a typical Chernoff bound.

Chernoff bounds (a.k.a. tail bounds, Hoeffding/Azuma/Talagrand inequalities, the method of bounded differences, etc. [1, 2]) are used to bound the probability that some function (typically a sum) of many “small” random variables falls in the tail of its distribution (far from its expectation).



The distribution of a sum of “small” random variables

[Figure: the distribution of a sum of small random variables, concentrated around its expectation \mu .]

Let Y be a sum of many small random variables. For example, imagine that Y is the number of heads in n coin flips. The figure shows the typical distribution of such a sum around its expectation \mu . For \varepsilon \in (0,1), the sum has probability about e^{-\varepsilon ^2\mu } of deviating from \mu by more than a factor of 1+\varepsilon (that is, an additive deviation of \varepsilon \mu ).

For example, taking \varepsilon =\Theta (1/\sqrt \mu ), the sum has constant probability of deviating from \mu by an additive \Theta (\sqrt \mu ).

Or, taking \varepsilon =\Theta (\sqrt {\log (n)/\mu }), the sum has probability n^{-\Theta (1)} of deviating from \mu by more than \Theta (\sqrt {\mu \log n}).

A typical Chernoff bound quantifies this behavior. It says that the event that Y exceeds (1+\varepsilon ) times its expectation \mu is rare: it happens with probability at most e^{-\varepsilon ^2\mu /3}. (A similar bound holds for \Pr [ Y \le (1-\varepsilon )\mu ].)
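
To get a feel for the bound, here is a minimal simulation sketch in Python (the values of n, \varepsilon , and the number of trials below are arbitrary choices for illustration; the empirical frequency should come out below the bound):

\begin{verbatim}
import math
import random

# Y = number of heads in n fair coin flips, so mu = E[Y] = n/2.
# Estimate Pr[Y >= (1+eps)*mu] by simulation and compare it to the
# Chernoff bound exp(-eps^2 * mu / 3).
n, eps, trials = 100, 0.2, 100_000
mu = n / 2
threshold = (1 + eps) * mu

hits = sum(
    bin(random.getrandbits(n)).count("1") >= threshold  # heads in n flips
    for _ in range(trials)
)

print("empirical tail frequency:       ", hits / trials)
print("Chernoff bound exp(-eps^2*mu/3):", math.exp(-eps**2 * mu / 3))
\end{verbatim}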

Example applications

Here are a couple of examples using a Chernoff bound.

Example (balls in bins / max bin).

Throw 6 n\ln n balls randomly into n bins. Let Y be the number of balls thrown into the first bin (any fixed bin would do). Then Y is a sum of 6 n\ln n independent random \{ 0,1\} -variables:

\[ Y=\sum _{t} [\text {ball } t \text { goes in the first bin}]. \]

The expectation of Y is \mu = 6\ln n. By the Chernoff bound, for any \varepsilon \in (0,1],

\[ \Pr [Y \ge (1+\varepsilon )\mu ] \; <\; \exp (- \varepsilon ^2\mu /3 ) \; =\; 1/n^{2\varepsilon ^2}. \]

Taking \varepsilon = 1 gives \Pr [Y \ge 12\ln n] < 1/n^2: the probability that a given bin receives at least 12\ln n balls is less than 1/n^2.

The naive union bound then implies that the probability that any of the n bins receives at least 12\ln n balls is less than 1/n.
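
For a quick empirical check of this example, here is a small simulation sketch in Python (n is an arbitrary illustrative choice):

\begin{verbatim}
import math
import random
from collections import Counter

# Throw 6*n*ln(n) balls into n bins uniformly at random and compare the
# maximum bin load to the 12*ln(n) threshold from the argument above.
n = 1000
balls = int(6 * n * math.log(n))

loads = Counter(random.randrange(n) for _ in range(balls))
max_load = max(loads.values())

print("expected load per bin, 6 ln n:", round(6 * math.log(n), 1))
print("maximum observed load:        ", max_load)
print("threshold 12 ln n:            ", round(12 * math.log(n), 1))
\end{verbatim}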

Exercise (more balls in bins).

Show that if n^2 balls are thrown, then the probability that any bin has more than n+O(\sqrt {n\ln n}) balls is at most 1/n.
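
(If you want to check your parameter choice afterwards: one choice that works, stated here without the verification, is \varepsilon = \sqrt {6\ln (n)/n} with \mu = n, so that \exp (-\varepsilon ^2\mu /3) = 1/n^2 and (1+\varepsilon )\mu = n+\sqrt {6n\ln n}, followed by a union bound over the n bins.)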

Expectations of exponentials

The proof of the Chernoff bound for a sum Y works by reasoning about the expectation of (1+\varepsilon )^{Y}. Here is some intuition for this approach. (For the full proof, go here.)

Recall the humble Markov bound: for any non-negative random variable X and any c>0, the probability that X is at least c is at most \textrm{E}[X]/c. This is useful but somewhat weak. Can we strengthen it by transforming X?

To take a concrete example, let X_t be the number of heads in t random coin flips. Consider \textrm{E}[3^{X_t}]. By direct calculation (conditioning on the outcome of flip t+1, which is independent of the first t flips),

\[ \textrm{E}\big [3^{X_{t+1}} \big ] \; =\; {\textstyle \frac{1}{2}} \textrm{E}\big [3^{X_t}\big ] + {\textstyle \frac{1}{2}}\, 3 \cdot \textrm{E}\big [3^{X_t}\big ] \; =\; 2\cdot \textrm{E}\big [3^{X_t}\big ]. \]

Since \textrm{E}\big [3^{X_0}\big ] = 1, it follows by induction that \textrm{E}\big [3^{X_t}\big ] = 2^t.
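
As a sanity check, the identity is easy to verify exactly for small t (a Python sketch; X_t is the Binomial(t,1/2) count of heads, as above):

\begin{verbatim}
import math

# Verify E[3^(X_t)] = 2^t exactly for small t, where X_t ~ Binomial(t, 1/2):
# E[3^(X_t)] = sum over k of C(t, k) * 3^k / 2^t.
for t in range(11):
    expectation = sum(math.comb(t, k) * 3**k for k in range(t + 1)) / 2**t
    assert expectation == 2**t
    print("t =", t, " E[3^(X_t)] =", expectation)
\end{verbatim}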

Does this give anything useful? Applying the Markov bound to 3^{X_t} gives

\[ \Pr [X_t \ge c] \; =\; \Pr \big [3^{X_t} \ge 3^c\big ] \; \le \; \frac{\textrm{E}\big [3^{X_t}\big ]}{3^{c}} \; =\; \frac{2^t}{3^{c}} \; \approx \; 2^{t - 1.58\, c}. \]

For example, taking t=100 and c=70, we can conclude that the probability of getting 70 or more heads in 100 flips is at most about 5/10000. Choosing a base other than 3 may give a better bound.
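
For comparison, both the bound and the exact tail probability can be computed directly (a Python sketch for t=100 and c=70; the exact probability turns out to be noticeably smaller than the bound):

\begin{verbatim}
import math

# Compare the bound 2^t / 3^c (Markov applied to 3^(X_t)) with the exact
# tail probability Pr[X_t >= c] for t = 100 flips and c = 70 heads.
t, c = 100, 70

markov_bound = 2**t / 3**c
exact_tail = sum(math.comb(t, k) for k in range(c, t + 1)) / 2**t

print("Markov bound 2^t / 3^c:", markov_bound)   # roughly 5/10000
print("exact Pr[X_t >= 70]:   ", exact_tail)
\end{verbatim}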

The proof of the Chernoff bound generalizes this idea, and optimizes the base of the exponent. Typically, in order to bound the probability of deviating by a 1+\varepsilon factor from the mean, the best base for the exponent is something like 1+\varepsilon \approx e^\varepsilon . To prove bounds on deviations below (as opposed to above) the mean, one uses a base less than 1. For most settings, these bounds are tight up to small constant factors in the exponent.
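
For reference, here is a compressed sketch of where that optimization leads when Y is a sum of independent \{ 0,1\} -valued random variables with total expectation \mu (the full proof fills in each inequality):

\[ \Pr [Y \ge (1+\varepsilon )\mu ] \; \le \; \frac{\textrm{E}\big [(1+\varepsilon )^{Y}\big ]}{(1+\varepsilon )^{(1+\varepsilon )\mu }} \; \le \; \frac{e^{\varepsilon \mu }}{(1+\varepsilon )^{(1+\varepsilon )\mu }} \; =\; \left(\frac{e^{\varepsilon }}{(1+\varepsilon )^{1+\varepsilon }}\right)^{\mu } \; \le \; e^{-\varepsilon ^2\mu /3} \qquad (\varepsilon \in (0,1]). \]

The first step is Markov's bound applied to (1+\varepsilon )^{Y}, the second uses \textrm{E}[(1+\varepsilon )^{X_i}] = 1+\varepsilon p_i \le e^{\varepsilon p_i} for each \{ 0,1\} -variable X_i with \Pr [X_i=1]=p_i (together with independence), and the final inequality is a calculus fact valid for \varepsilon \in (0,1].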