Expected deviation of a sum

Bounds on the expected deviation of a sum from a threshold.

Here are bounds on the expected deviation of a sum of random variables taking values in [0,1] above or below some threshold (typically near its mean).

For example, suppose Alice flips a fair coin n times. She pays Bob $1 for each head after the first (1+\varepsilon)n/2 heads (if any). What is her expected payment to Bob? The bounds here (Lemma 1(a) with \mu = 1/2 and T = n) say: at most \varepsilon^{-1} \exp(-\varepsilon^2 n/6). For example, if \varepsilon = \Omega(1/\sqrt n), the expected payment is O(\sqrt n). If \varepsilon = \sqrt{6c\ln(n)/n}, the expected payment is O\big(1/(n^{c-1/2}\sqrt{\ln n})\big). In general the expectation is about \varepsilon^{-1} times the probability (according to Chernoff) that the sum exceeds its mean by a factor of 1+\varepsilon.
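To see the first regime concretely, here is a minimal Monte Carlo sketch (the values of n and trials are arbitrary choices, not from the note) that estimates Alice's expected payment with \varepsilon = 1/\sqrt n and compares it to the bound \varepsilon^{-1}\exp(-\varepsilon^2 n/6):

```python
# Monte Carlo sketch of the coin-flip example (parameters arbitrary).
# Alice pays $1 per head beyond the first (1+eps)*n/2 heads.
import math
import random

n = 10_000
eps = 1 / math.sqrt(n)          # the eps = Omega(1/sqrt(n)) regime
trials = 1_000

threshold = (1 + eps) * n / 2   # pay $1 per head beyond this
total = 0.0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n))
    total += max(0.0, heads - threshold)

print(f"empirical expected payment ~ {total / trials:.3f}")
print(f"bound eps^-1 * exp(-eps^2*n/6) = {math.exp(-eps**2 * n / 6) / eps:.3f}")
```

Both the empirical estimate and the bound grow as \Theta(\sqrt n) in this regime, matching the first claim above.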



Sums with a fixed number of terms

First, bounds that are directly analogous to standard Chernoff bounds:

Lemma 1 (expected deviation from threshold).

Let Y=\sum_{t=1}^T y_t be a sum of independent random variables, where each y_t is in [0,1] and T\gt 0 is fixed. Let \varepsilon, \mu \gt 0 with \varepsilon \le 1.

(a) If \textrm{E}[y_t] \le \mu for all t\le T, then \textrm{E}[\max\{0,\, Y - (1+\varepsilon)\mu T\}] ~\lt~ \varepsilon^{-1}\exp(-\varepsilon^2\mu T/3).

(b) If \textrm{E}[y_t] \ge \mu for all t\le T, then \textrm{E}[\max\{0,\, (1-\varepsilon)\mu T - Y\}] ~\lt~ \varepsilon^{-1}\exp(-\varepsilon^2\mu T/2).
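Before the proof, a small Monte Carlo sanity check of part (a). This is only a sketch: the uniform distribution on [0, 2\mu] and all parameter values are arbitrary choices satisfying the hypotheses, not anything prescribed by the lemma.

```python
# Sanity check of Lemma 1(a): each y_t is uniform on [0, 2*mu], so
# y_t is in [0, 1] (since 2*mu <= 1 here) with E[y_t] = mu exactly.
import math
import random

T, mu, eps = 1_000, 0.3, 0.02
trials = 2_000

threshold = (1 + eps) * mu * T
total = 0.0
for _ in range(trials):
    Y = sum(random.uniform(0, 2 * mu) for _ in range(T))
    total += max(0.0, Y - threshold)

print(f"empirical E[max(0, Y - (1+eps)*mu*T)] ~ {total / trials:.3f}")
print(f"Lemma 1(a) bound = {math.exp(-eps**2 * mu * T / 3) / eps:.3f}")
```

The bound is loose for this particular distribution; its value is that it holds uniformly over all distributions meeting the hypotheses.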

Proof.

(a) Using \max\{0,z\} \le \varepsilon^{-1}(1+\varepsilon)^z (footnote 1), then following the proof of the Chernoff bound, the expectation in question is at most

\begin{equation} \label{eqn} \textrm{E}\left[\frac{\varepsilon^{-1}\,(1+\varepsilon)^Y}{(1+\varepsilon)^{(1+\varepsilon)\mu T}}\right] ~\le~ \varepsilon^{-1} \Big(\frac{e^{\varepsilon}}{(1+\varepsilon)^{1+\varepsilon}}\Big)^{\mu T} ~\lt~ \varepsilon^{-1}\exp(-\varepsilon^2\mu T/3). \end{equation}

(b) Using \max\{0,z\} \le \varepsilon^{-1}(1-\varepsilon)^{-z} (footnote 2), then following the proof of the Chernoff bound, the expectation in question is at most

\begin{equation} \label{eqn2} \textrm{E}\left[\frac{\varepsilon^{-1}\,(1-\varepsilon)^Y}{(1-\varepsilon)^{(1-\varepsilon)\mu T}}\right] ~\le~ \varepsilon^{-1} \Big(\frac{e^{-\varepsilon}}{(1-\varepsilon)^{1-\varepsilon}}\Big)^{\mu T} ~\lt~ \varepsilon^{-1}\exp(-\varepsilon^2\mu T/2). \end{equation}
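The final strict inequality in each display is the standard estimate from Chernoff-bound proofs: e^{\varepsilon}/(1+\varepsilon)^{1+\varepsilon} \lt e^{-\varepsilon^2/3} and e^{-\varepsilon}/(1-\varepsilon)^{1-\varepsilon} \lt e^{-\varepsilon^2/2} for 0 \lt \varepsilon \le 1. A quick grid check (a sketch; the grid resolution is arbitrary):

```python
# Grid check of the two standard estimates, for 0 < eps <= 1:
#   e^eps  / (1+eps)^(1+eps)  <  exp(-eps^2 / 3)
#   e^-eps / (1-eps)^(1-eps)  <  exp(-eps^2 / 2)
import math

for k in range(1, 1001):
    eps = k / 1000
    a = math.exp(eps) / (1 + eps) ** (1 + eps)
    b = math.exp(-eps) / (1 - eps) ** (1 - eps)  # 0.0**0.0 == 1.0 at eps == 1
    assert a < math.exp(-eps**2 / 3), eps
    assert b < math.exp(-eps**2 / 2), eps
print("both estimates hold on the grid")
```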
 
 

Sums with randomly many terms

Assume a system goes through a random sequence of states S_1,S_2,\ldots,S_T, where each state S_t determines two values x_t and y_t, each in [0,1], such that, conditioned on the preceding state, the expectation of x_t is at most that of y_t:

\[ \textrm{E}[x_t \,|\, S_{t-1}] ~\le~ \textrm{E}[y_t \,|\, S_{t-1}]. \]

Let X_t=\sum_{i=1}^t x_i and Y_t=\sum_{i=1}^t y_i.

Lemma 2 (expected deviation from threshold).

For any stopping time T with finite expectation, and any \varepsilon,\mu \gt 0 with \varepsilon \le 1, if X_t - Y_t is uniformly bounded from above (for all t\lt T, by some finite bound independent of the outcome of the experiment), then

\[ \textstyle \textrm{E}\big[\max\big\{0, ~ X_T/(1+\varepsilon) ~-~ Y_T/(1-\varepsilon) ~-~ \varepsilon\mu \big\}\big] ~\le~ \varepsilon^{-1}\exp(-\varepsilon^2\mu). \]
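Here is a simulation sketch of Lemma 2. The instance is an arbitrary construction satisfying the hypotheses, not part of the lemma: x_t and y_t are independent fair coin flips (so \textrm{E}[x_t\,|\,S_{t-1}] = \textrm{E}[y_t\,|\,S_{t-1}]), and T is the first time X_t - Y_t reaches a barrier, capped at t_max (so X_t - Y_t is uniformly bounded above before T, and \textrm{E}[T] is finite).

```python
# Simulation sketch of Lemma 2 on one arbitrary instance:
# x_t, y_t independent fair coin flips; T = first t with
# X_t - Y_t >= barrier, capped at t_max.  Before T, X_t - Y_t < barrier.
import math
import random

eps, mu = 0.05, 50.0         # free parameters of the lemma (arbitrary here)
barrier, t_max = 20, 5_000
trials = 1_000

total = 0.0
for _ in range(trials):
    X = Y = 0
    for _ in range(t_max):
        X += random.randint(0, 1)   # x_t
        Y += random.randint(0, 1)   # y_t
        if X - Y >= barrier:        # stopping condition
            break
    total += max(0.0, X / (1 + eps) - Y / (1 - eps) - eps * mu)

print(f"empirical expectation ~ {total / trials:.4f}")
print(f"Lemma 2 bound         = {math.exp(-eps**2 * mu) / eps:.4f}")
```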

Proof.

The proof is similar to that of Lemma 1, but follows the proof of the stopping-time Chernoff bound instead of the proof of the standard Chernoff bound.

Using \max\{0,a-b-c\} \le \varepsilon^{-1}(1+\varepsilon)^{(1+\varepsilon)a}(1-\varepsilon)^{(1-\varepsilon)b}e^{-\varepsilon c} for a,b,c\ge 0 (footnote 3), the expectation in question is at most \textrm{E}[\phi_T] where

\begin{equation} \label{eqn3} \phi_t = \varepsilon^{-1}\,(1+\varepsilon)^{X_t}(1-\varepsilon)^{Y_t} e^{-\varepsilon^2\mu}. \end{equation}

To finish, note that \phi is a super-martingale: since (1+\varepsilon)^x \le 1+\varepsilon x and (1-\varepsilon)^y \le 1-\varepsilon y for x,y\in[0,1], we have (1+\varepsilon)^{x_t}(1-\varepsilon)^{y_t} \le 1+\varepsilon(x_t-y_t), so \textrm{E}[\phi_t \,|\, S_{t-1}] \le \phi_{t-1}\big(1+\varepsilon\,\textrm{E}[x_t-y_t \,|\, S_{t-1}]\big) \le \phi_{t-1}. Also, \phi_t is uniformly bounded from below (by 0) and from above (as (1+\varepsilon)^{X_t}(1-\varepsilon)^{Y_t} \le \exp(\varepsilon(X_t-Y_t)) and X_t-Y_t is uniformly bounded from above). Hence, by Wald's (the optional-stopping theorem for super-martingales), the expectation of \phi_T is at most \phi_0 = \varepsilon^{-1}\exp(-\varepsilon^2\mu).
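The pointwise inequality behind the super-martingale step can be spot-checked numerically (a sketch; the grid resolution is arbitrary):

```python
# Grid check of the pointwise fact behind the super-martingale step:
#   (1+eps)^x * (1-eps)^y  <=  1 + eps*(x - y)   for x, y in [0, 1].
# Taking conditional expectations and using E[x_t|S_{t-1}] <= E[y_t|S_{t-1}]
# then gives E[phi_t | S_{t-1}] <= phi_{t-1}.
for eps in (0.01, 0.1, 0.5, 0.9, 1.0):
    for i in range(101):
        for j in range(101):
            x, y = i / 100, j / 100
            lhs = (1 + eps) ** x * (1 - eps) ** y
            assert lhs <= 1 + eps * (x - y) + 1e-12, (eps, x, y)
print("pointwise super-martingale inequality holds on the grid")
```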

 
 

Footnotes

  1. ~ z\le e^{z-1} ~\Rightarrow~ \alpha z \le e^{\alpha z}/e ~\Rightarrow~ z \le e^{\alpha z}/(e\alpha) ~\Rightarrow~ (taking \alpha =\ln(1+\varepsilon)) ~ z \le (1+\varepsilon)^z/(e\ln(1+\varepsilon)) ~\Rightarrow~ z \le (1+\varepsilon)^z/\varepsilon, using e\ln(1+\varepsilon) \ge \varepsilon for \varepsilon \le 1. Since the right-hand side is positive, \max\{0,z\} \le \varepsilon^{-1}(1+\varepsilon)^z also holds.
  2. WLOG z\ge 0. That, 1+\varepsilon \lt 1/(1-\varepsilon), and z\le (1+\varepsilon)^z/\varepsilon (footnote 1) imply z \le (1+\varepsilon)^z/\varepsilon \lt (1-\varepsilon)^{-z}/\varepsilon.
  3. Using (1+\varepsilon)^{1+\varepsilon}(1-\varepsilon)^{1-\varepsilon} \ge 1 and (1+\varepsilon)^{1+\varepsilon}e^{-\varepsilon} \ge 1, reduce to the case a=0 (trivial) or b=c=0 (follows from footnote 1).
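The footnote inequalities can also be spot-checked numerically; a sketch (grids arbitrary; \varepsilon = 1 is skipped so that (1-\varepsilon)^{-z} stays finite in footnote 2):

```python
# Grid sanity check of footnotes 1-3 (grids are arbitrary choices).
import math

for k in range(1, 10):
    eps = k / 10
    for zi in range(201):
        z = zi / 10                                   # z in [0, 20]
        assert z <= (1 + eps) ** z / eps              # footnote 1
        assert z <= (1 - eps) ** (-z) / eps           # footnote 2
    for ai in range(11):
        for bi in range(11):
            for ci in range(11):
                a, b, c = ai / 2, bi / 2, ci / 2      # each in [0, 5]
                rhs = ((1 + eps) ** ((1 + eps) * a)
                       * (1 - eps) ** ((1 - eps) * b)
                       * math.exp(-eps * c) / eps)
                assert max(0.0, a - b - c) <= rhs, (eps, a, b, c)  # footnote 3
print("footnote inequalities hold on all grid points")
```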