A brief introduction to Chernoff bounds.
If you’re already familiar with Chernoff bounds, you may prefer to skip directly to the statement and proof of a typical Chernoff bound.
Chernoff bounds (a.k.a. tail bounds, Hoeffding/Azuma/Talagrand inequalities, the method of bounded differences, etc. [1, 2]) are used to bound the probability that some function (typically a sum) of many “small” random variables falls in the tail of its distribution (far from its expectation).
The distribution of a sum of “small” random variables
Let Y be a sum of many small random variables. For example, imagine that Y is the number of heads in n coin flips. The figure shows the typical distribution of such a sum around its expectation \mu . For \varepsilon \in (0,1), the sum has probability about e^{-\varepsilon ^2\mu } of deviating from \mu by more than a factor of 1+\varepsilon (that is, an additive deviation of \varepsilon \mu ).
For example (taking \varepsilon =\Theta (1/\sqrt \mu )), the sum has constant probability of deviating from \mu by an additive \Theta (\sqrt \mu ).
Or (taking \varepsilon =\Theta (\sqrt {\log (n)/\mu })), the sum has probability n^{-\Theta (1)} of deviating from \mu by more than \Theta (\sqrt { \mu \log n}).
A typical Chernoff bound quantifies this behavior. It says that the event that Y exceeds 1+\varepsilon times its expectation \mu is rare: it happens with probability at most e^{-\varepsilon ^2\mu /3}. (A similar bound, e^{-\varepsilon ^2\mu /2}, holds for \Pr [ Y \le (1-\varepsilon )\mu ].)
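To see the tail behavior concretely, here is a minimal Monte Carlo sketch in Python (parameters chosen for illustration, not from the original text). It estimates the upper-tail probability of a fair-coin sum and compares it to the standard upper-tail bound e^{-\varepsilon ^2\mu /3}:

```python
import math
import random

def empirical_upper_tail(n_flips, eps, trials=20000, seed=1):
    """Estimate Pr[Y >= (1+eps)*mu], where Y = number of heads in n_flips fair flips."""
    mu = n_flips / 2
    rng = random.Random(seed)
    hits = sum(
        sum(rng.random() < 0.5 for _ in range(n_flips)) >= (1 + eps) * mu
        for _ in range(trials)
    )
    return hits / trials

n_flips, eps = 200, 0.3              # illustrative parameters
mu = n_flips / 2
bound = math.exp(-eps ** 2 * mu / 3)  # Chernoff-style upper-tail bound
print(empirical_upper_tail(n_flips, eps), "<=", bound)
```

The empirical tail frequency is typically far below the bound, reflecting that the stated constants are not tight.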
Example applications
Here are a couple of examples using a Chernoff bound.
Throw 6 n\ln n balls randomly into n bins. Let Y be the number of balls thrown into a given bin. Then Y is a sum of 6 n\ln n independent random \{ 0,1\} variables: Y = \sum _{b=1}^{6n\ln n} Y_ b, where Y_ b = 1 if ball b lands in the given bin (which happens with probability 1/n).
The expectation of Y is 6\ln n. By the Chernoff bound, for any \varepsilon \in (0,1], \Pr [Y \ge (1+\varepsilon )\, 6\ln n] \le e^{-\varepsilon ^2 (6\ln n)/3} = n^{-2\varepsilon ^2}.
Taking \varepsilon = 1 gives \Pr [Y \ge 12\ln n] < 1/n^2 — the probability that a given bin receives at least 12\ln n balls is less than 1/n^2.
The naive union bound implies that the probability that any of the n bins has at least 12\ln n balls is at most n \cdot 1/n^2 = 1/n.
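As a sanity check, the following Python sketch (with an illustrative choice of n) throws 6n\ln n balls into n bins and reports the maximum load, which with probability at least 1-1/n stays below 12\ln n:

```python
import math
import random

def max_bin_load(n, seed=0):
    """Throw 6*n*ln(n) balls uniformly into n bins; return the largest bin load."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(int(6 * n * math.log(n))):
        loads[rng.randrange(n)] += 1
    return max(loads)

n = 1000
print(max_bin_load(n), "vs threshold", 12 * math.log(n))  # threshold ~ 82.9
```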
Show that if n^2 balls are thrown, then the probability that any bin has more than n+O(\sqrt {n\ln n}) balls is at most 1/n.
Expectations of exponentials
The proof of the Chernoff bound for a sum Y works by reasoning about the expectation of (1+\varepsilon )^{Y}. Here is some intuition for this approach. (For the full proof, go here.)
Recall the humble Markov bound: for any nonnegative random variable X, the probability that X exceeds c is at most E[X]/c. This is useful but somewhat weak. Can we strengthen it by transforming X?
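For a sense of scale (a minimal sketch with illustrative numbers): applied directly to the number of heads X in 100 fair flips, where E[X] = 50, Markov gives only \Pr [X \ge 70] \le 50/70 \approx 0.71.

```python
def markov_bound(expectation, c):
    """Markov's inequality: Pr[X >= c] <= E[X]/c for any nonnegative X."""
    return expectation / c

# Number of heads in 100 fair flips has expectation 50.
print(markov_bound(50, 70))  # 0.714..., a weak bound on Pr[X >= 70]
```

The exponential transformation developed next turns this weak bound into a much stronger one.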
To take a concrete example, let X_ t be the number of heads in t random coin flips. Consider \textrm{E}[3^{X_ t}]. By direct calculation (conditioning on the outcome of the t-th flip), \textrm{E}[3^{X_ t}] = \tfrac 12 \cdot 3\, \textrm{E}[3^{X_ {t-1}}] + \tfrac 12\, \textrm{E}[3^{X_ {t-1}}] = 2\, \textrm{E}[3^{X_ {t-1}}], and \textrm{E}[3^{X_ 0}] = 1.
By induction, \textrm{E}[3^{X_ t}] = 2^ t.
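This identity is easy to check numerically. A brief Python check, enumerating all 2^t flip sequences for small t:

```python
from itertools import product

def expected_3_pow_heads(t):
    """Compute E[3^{X_t}] exactly by enumerating all 2^t sequences of fair flips."""
    return sum(3 ** sum(flips) for flips in product((0, 1), repeat=t)) / 2 ** t

for t in range(1, 11):
    assert expected_3_pow_heads(t) == 2 ** t  # E[3^{X_t}] = 2^t
```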
Does this give anything useful? Applying the Markov bound to 3^{X_ t} gives \Pr [X_ t \ge c] = \Pr [3^{X_ t} \ge 3^ c] \le \textrm{E}[3^{X_ t}]/3^ c = 2^ t/3^ c.
For example, taking t=100 and c=70, we can conclude that the probability of getting 70 or more heads in 100 flips is at most about 5/10000. Choosing a base other than 3 may give a better bound.
The proof of the Chernoff bound generalizes this idea, and optimizes the base of the exponent. Typically, in order to bound the probability of deviating by a 1+\varepsilon factor from the mean, the best base for the exponent is something like 1+\varepsilon \approx e^\varepsilon . To prove bounds on deviations below (as opposed to above) the mean, one uses a base less than 1. For most settings, these bounds are tight up to small constant factors in the exponent.
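For the coin-flip example above, the optimization over the base can be sketched directly (a hypothetical helper, not from the original text): for base b > 1, each flip contributes a factor (1+b)/2, so \textrm{E}[b^{X_ t}] = ((1+b)/2)^ t, and minimizing ((1+b)/2)^ t/b^ c over b by calculus gives b = c/(t-c).

```python
def base_bound(b, t, c):
    """Markov bound on Pr[X_t >= c] via base b: E[b^{X_t}] / b^c = ((1+b)/2)^t / b^c."""
    return ((1 + b) / 2) ** t / b ** c

t, c = 100, 70
b_opt = c / (t - c)             # minimizer of the bound (solves t/(1+b) = c/b)
print(base_bound(3, t, c))      # base 3: about 5/10000, as computed above
print(base_bound(b_opt, t, c))  # optimal base 7/3: somewhat smaller
```

Here the optimal base 7/3 improves the bound only by a constant factor, but for small deviations (small \varepsilon ) choosing the base near e^\varepsilon  is what yields the e^{-\Theta (\varepsilon ^2\mu )} form.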
Related

Using randomized rounding to derive Lagrangian-relaxation algorithms

Chernoff bound (wikipedia)

Azuma’s inequality (wikipedia)

Binomial distribution (wikipedia)

Normal distribution (wikipedia)