The statement and proof of a typical Chernoff bound.

# A Chernoff bound, and proof

**Lemma**(Chernoff bound [1, 2]).

Let Y=\sum _{t=1}^ T y_ t be a sum of independent random variables, where each y_ t is in [0,1] and T\gt 0 is fixed. Let \varepsilon , \mu \gt 0 with \varepsilon \le 1.

(a) If \textrm{E}[y_{t}] \le \mu for all t\le T then \Pr [Y \ge (1+\varepsilon )\mu T] \, \lt \, \exp ({-\varepsilon ^2}\mu T/3).

(b) If \textrm{E}[y_{t}] \ge \mu for all t\le T then \Pr [Y \le (1-\varepsilon )\mu T] \, \lt \, \exp ({-\varepsilon ^2}\mu T/2).

**Proof.**

Here is a proof for part (a).

(The inequality above follows from the Markov bound.)

Next we bound the above expectation. The independence of the y_ t’s implies

(The second step uses the convexity of (1+\varepsilon )^ z as a function of z, specifically, (1+\varepsilon )^ z \le 1+\varepsilon z for z\in [0,1]. The third step uses \textrm{E}[y_ t] \le \mu . The fourth step uses 1+z\lt e^ z for z\ne 0.)

The rest is algebra. Combining the above bound with (\ref{eqn}), the probability in question is less than

(The last inequality follows from e^{\varepsilon }/(1+\varepsilon )^{(1+\varepsilon )} \lt \exp ({-\varepsilon ^2}/3) for \varepsilon \in (0,1].)

The proof for part (b) is similar — change the sign of \varepsilon throughout, and use e^{\varepsilon }/{(1-\varepsilon )}^{(1-\varepsilon )} \lt \exp ({-\varepsilon ^2}/2).

**Exercise.**

Prove part (b) of the Chernoff bound.

**Remark: the proof needs only first-order (linear) approximations.**

The particular inequalities used in the proof above are elegant and convenient, but other inequalities could be used just as well. For example, we could change the base of the exponent in the proof from 1+\varepsilon to \exp (\varepsilon ) and then push the proof through using inequalities such as \exp (\varepsilon ) \le 1+\varepsilon + \varepsilon ^2. The proof will work with any inequalities in which the first-order terms (for small \varepsilon ) are tight, although the resulting bound may have a smaller constant in the exponent.

**Remark: upper bounds when \varepsilon is large.**

Sometimes one wants an upper bound on the probability of a large upper deviation: \Pr [Y \ge \lambda \mu T] where \lambda is larger than 2. The proof above for part (a) applies up to the last step, showing an upper bound of (e/\lambda )^\lambda /e = \exp (-\lambda \ln \lambda + \lambda – 1). For example, if you throw n balls in n bins, taking \lambda = c\ln (n)/\ln \ln n for some c, the event that a given bin has more than c\ln (n)/\ln \ln n balls is less than n^{-\Theta (c)}.

**Pessimistic estimators.**

When the time comes to apply the method of conditional probabilities to an existence proof that uses the Chernoff bound, you’ll need a pessimistic estimator for the bound.