# Chernoff bound

A brief introduction to Chernoff bounds.

If you’re already familiar with Chernoff bounds, you may prefer to skip directly to the statement and proof of a typical Chernoff bound.

Chernoff bounds (a.k.a. tail bounds, Hoeffding/Azuma/Talagrand inequalities, the method of bounded differences, etc. [1, 2]) are used to bound the probability that some function (typically a sum) of many “small” random variables falls in the tail of its distribution (far from its expectation).

# The distribution of a sum of “small” random variables

Let $X = X_1 + X_2 + \cdots + X_n$ be a sum of many small random variables. For example, imagine that $X$ is the number of heads in $n$ coin flips. The figure shows the typical distribution of such a sum around its expectation $\mu = \textrm{E}[X]$. For $\varepsilon \le 1$, the sum has probability about $\exp(-\varepsilon^2\mu/3)$ of deviating from $\mu$ by more than a factor of $1+\varepsilon$ (that is, an additive deviation of $\varepsilon\mu$).

For example (taking $\varepsilon = 1/\sqrt{\mu}$) the sum has constant probability of deviating from $\mu$ by an additive $\sqrt{\mu}$.

Or (taking $\varepsilon = \sqrt{3\ln(\mu)/\mu}$), the sum has probability $1/\mu$ of deviating from $\mu$ by more than $\sqrt{3\mu\ln \mu}$.

A typical Chernoff bound quantifies this behavior. It says that the event that $X$ exceeds $1+\varepsilon$ times its expectation $\mu$ is rare: it happens with probability at most $\exp(-\varepsilon^2\mu/3)$. (A similar bound holds for the event that $X$ is below $1-\varepsilon$ times $\mu$.)
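As a sanity check, the bound can be compared against a quick simulation. The sketch below (the function name `tail_vs_bound` and the parameter values are my own, chosen for illustration) estimates the tail probability for coin flips and compares it with $\exp(-\varepsilon^2\mu/3)$:

```python
import math
import random

def tail_vs_bound(n=100, eps=0.2, trials=2000, seed=1):
    """Estimate Pr[X >= (1+eps)*mu] for X = heads in n fair coin flips,
    and return it alongside the Chernoff bound exp(-eps^2 * mu / 3)."""
    random.seed(seed)
    mu = n / 2
    threshold = (1 + eps) * mu
    hits = sum(
        1
        for _ in range(trials)
        if sum(random.getrandbits(1) for _ in range(n)) >= threshold
    )
    return hits / trials, math.exp(-eps * eps * mu / 3)
```

With these parameters the empirical tail frequency comes out well below the bound; Chernoff bounds trade tightness for generality, so some slack is expected.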

# Example applications

Here are a couple of examples using a Chernoff bound.

Example (balls in bins / max bin).

Throw $6\,n\ln n$ balls randomly into $n$ bins. Let $Y$ be the number of balls thrown into a given bin, say the first. Then $Y$ is a sum of independent random $\{0,1\}$-variables:

$Y=\sum _{t} [\text {ball } t \text { goes in the first bin}].$

The expectation of $Y$ is $\mu = \textrm{E}[Y] = 6\ln n$. By the Chernoff bound, for any $\varepsilon \le 1$,

$\Pr [Y \ge (1+\varepsilon )\mu ] \;<\; \exp (- \varepsilon ^2\textrm{E}[Y]/3 ) \;=\; 1/n^{2\varepsilon ^2}.$

Taking $\varepsilon = 1$ gives $\Pr[Y \ge 12\ln n] < 1/n^2$: the probability that a given bin receives at least $12\ln n$ balls is less than $1/n^2$.

The naive union bound then implies that the probability that any of the $n$ bins has more than $12\ln n$ balls is at most $n \cdot 1/n^2 = 1/n$.
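The example can be simulated directly. This sketch (assuming the concrete parameters of the example: $6n\ln n$ balls into $n$ bins, so a per-bin mean of $6\ln n$; the function name is my own) reports the observed maximum load next to the $12\ln n$ threshold:

```python
import math
import random

def max_load(n=200, seed=7):
    """Throw 6*n*ln(n) balls into n bins uniformly at random; return the
    maximum bin load and the threshold 12*ln(n) from the union bound."""
    random.seed(seed)
    counts = [0] * n
    for _ in range(int(6 * n * math.log(n))):
        counts[random.randrange(n)] += 1
    return max(counts), 12 * math.log(n)
```

By the union-bound calculation, the maximum load exceeds the threshold with probability at most $1/n$, so a typical run stays below it.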

Exercise (more balls in bins).

Show that for any constant $c \ge 6$, if $c\,n\ln n$ balls are thrown, then the probability that any bin has more than $2c\ln n$ balls is at most $1/n$.

# Expectations of exponentials

The proof of the Chernoff bound for a sum $X$ works by reasoning about the expectation of an exponential of $X$, that is, $\textrm{E}[c^X]$ for an appropriately chosen base $c$. Here is some intuition for this approach. (For the full proof, go here.)

Recall the humble Markov bound: for any non-negative random variable $Y$ and any $z > 0$, the probability that $Y$ is at least $z$ is at most $\textrm{E}[Y]/z$. This is useful but somewhat weak. Can we strengthen it by transforming $Y$?

To take a concrete example, let $X_t$ be the number of heads in $t$ random coin flips. Consider $\textrm{E}[3^{X_t}]$. Conditioning on the $(t+1)$-st flip (tails leaves $3^{X_{t+1}} = 3^{X_t}$; heads triples it), by direct calculation,

$\textrm{E}\big [3^{X_{t+1}} \big ] \;=\; {\textstyle \frac{1}{2}} \textrm{E}[3^{X_ t}] + {\textstyle \frac{1}{2}} 3 \cdot \textrm{E}[3^{X_ t}] \;=\; 2\cdot \textrm{E}[3^{X_ t}].$

Since $\textrm{E}[3^{X_0}] = 1$, by induction $\textrm{E}[3^{X_t}] = 2^t$.
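The identity can also be checked directly: $X_t$ is Binomial$(t, 1/2)$, so $\textrm{E}[3^{X_t}]$ can be summed over the binomial distribution (a small sketch; the function name is my own):

```python
import math

def expect_3_pow(t):
    """E[3^(X_t)] for X_t ~ Binomial(t, 1/2), summed directly.
    Each term: Pr[X_t = k] * 3^k = C(t, k) / 2^t * 3^k."""
    return sum(math.comb(t, k) * 3**k for k in range(t + 1)) / 2**t
```

By the binomial theorem the sum is $((1+3)/2)^t = 2^t$, matching the induction.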

Does this give anything useful? Applying the Markov bound to $3^{X_t}$ gives

$\Pr [X_ t \ge c] \;=\; \Pr [3^{X_ t} \ge 3^ c] \;\le\; \frac{\textrm{E}[3^{X_ t}]}{3^{c}} \;=\; \frac{2^ t}{3^{c}} \;\approx\; 2^{\,t - 1.58\, c}.$

For example, taking $t = 100$ and $c = 70$, we can conclude that the probability of getting 70 or more heads in 100 flips is at most about 5/10000. Choosing a base other than 3 may give a better bound.
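The arithmetic is easy to verify, and the bound can be compared with the exact binomial tail (a sketch; the variable names are mine):

```python
import math

t, c = 100, 70
# The Markov-on-exponentials bound 2^t / 3^c for c or more heads in t flips.
bound = 2**t / 3**c
# The exact tail probability Pr[X_t >= c] for X_t ~ Binomial(t, 1/2).
exact = sum(math.comb(t, k) for k in range(c, t + 1)) / 2**t
# bound comes out to roughly 5/10000; the exact tail probability is
# smaller still, so the bound is valid but not tight.
```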

The proof of the Chernoff bound generalizes this idea, and optimizes the base of the exponent. Typically, in order to bound the probability of deviating by a factor of $1+\varepsilon$ from the mean, the best base for the exponent is something like $1+\varepsilon$. To prove bounds on deviations below (as opposed to above) the mean, one uses a base less than 1. For most settings, these bounds are tight up to small constant factors in the exponent.
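For intuition about the base optimization, one can scan bases numerically. The sketch below (my own construction, for $X$ = heads in $n$ fair coin flips, where $\textrm{E}[b^X] = ((1+b)/2)^n$) finds the base $b$ minimizing the bound $\textrm{E}[b^X]/b^{(1+\varepsilon)n/2}$ by grid search:

```python
def best_base(n=100, eps=0.2):
    """Find the base b in (1, 3) minimizing the tail bound
    E[b^X] / b^((1+eps)*n/2) for X = heads in n fair coin flips,
    using E[b^X] = ((1+b)/2)^n and a simple grid search."""
    def bound(b):
        return ((1 + b) / 2) ** n / b ** ((1 + eps) * n / 2)
    return min((1 + i / 1000 for i in range(1, 2000)), key=bound)
```

In this fair-coin case a little calculus shows the optimal base is $(1+\varepsilon)/(1-\varepsilon)$, which is $1 + 2\varepsilon + O(\varepsilon^2)$ for small $\varepsilon$, consistent with a base of roughly $1+\varepsilon$.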