Expected maximum (or minimum) of many sums

Bounds on the expected maximum (or minimum) among a collection of sums.

It can be technically convenient to work with expectations directly, instead of working with probabilities. Here, given a collection of sums of random variables taking values in [0,1], we bound the expected maximum (or minimum) sum in the collection.

For example, suppose Alice throws balls randomly into n bins just until the first bin (bin 1) contains n balls. The bound implies that the expected maximum number of balls in any bin is at most n+2\sqrt {n\ln n}+\ln n. Similarly, the expected minimum number of balls in any bin is at least n-2\sqrt {n\ln n}.
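
As a concrete illustration (not part of the argument), here is a minimal Monte Carlo sketch of Alice's process; the parameter values, the use of NumPy, and the choice of index 0 for the first bin are all assumptions of the sketch.

```python
# A hedged sketch: simulate throwing balls until bin 0 (playing the role of
# the first bin) holds n balls, then compare the average maximum and minimum
# loads against the bounds n + 2*sqrt(n ln n) + ln n and n - 2*sqrt(n ln n).
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 200
max_loads, min_loads = [], []

for _ in range(trials):
    loads = np.zeros(n, dtype=int)
    while loads[0] < n:                 # stop once bin 0 has n balls
        loads[rng.integers(n)] += 1     # each ball picks a uniform random bin
    max_loads.append(loads.max())
    min_loads.append(loads.min())

upper = n + 2 * np.sqrt(n * np.log(n)) + np.log(n)
lower = n - 2 * np.sqrt(n * np.log(n))
print(f"average max load {np.mean(max_loads):.1f}  (bound: at most {upper:.1f})")
print(f"average min load {np.mean(min_loads):.1f}  (bound: at least {lower:.1f})")
```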

The bound differs from the standard Chernoff bound in two ways:

  • it bounds the expected maximum or minimum (as opposed to the probability of a large deviation),

  • the sums can have a random number of terms, so they need not be concentrated around their means.



Warm-up: sums with a fixed number of terms

As a warm-up, here are bounds directly analogous to standard Chernoff bounds, in that they concern sums with a fixed number of terms.

Lemma 1 (expected maximum or minimum of many sums).

For each j\in [m], let X^ j=\sum _{t=1}^ T x^ j_ t be a sum of independent random variables, where each x^ j_ t is in [0,1] and T\gt 0 is fixed. Let \varepsilon , \mu \gt 0.

(a) If \textrm{E}[x^ j_ t] \le \mu for all t\le T and all j\in [m], then

\[ \textstyle \textrm{E}\left[\max _{j\in [m]} X^ j\right] ~ \lt ~ (1+\varepsilon )(\mu T + \varepsilon ^{-1}\ln m). \]

Taking \varepsilon = \sqrt {\ln (m)/\mu T}, the expected maximum is less than \mu T + 2\sqrt {\mu T \ln m} + \ln m.

(b) If \textrm{E}[x^ j_ t] \ge \mu for all t\le T and all j\in [m], then

\[ \textstyle \textrm{E}\left[\min _{j\in [m]} X^ j\right] ~ \gt ~ (1-\varepsilon )\mu T - \varepsilon ^{-1}\ln m. \]

Taking \varepsilon = \sqrt {\ln (m)/\mu T}, the expected minimum is more than \mu T - 2\sqrt {\mu T\ln m}.
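
As a quick numerical sanity check (not part of the original text), the following sketch estimates \textrm{E}[\max _{j\in [m]} X^ j] for m independent sums of T Bernoulli(\mu ) terms and compares it with the bound in part (a); NumPy and the particular parameter values are assumptions of the sketch.

```python
# A hedged sketch: empirical E[max_j X^j] for m sums of T Bernoulli(mu) terms,
# compared with the Lemma 1(a) bound mu*T + 2*sqrt(mu*T*ln m) + ln m.
import numpy as np

rng = np.random.default_rng(1)
m, T, mu, trials = 100, 1000, 0.3, 200

x = rng.random((trials, m, T)) < mu           # x^j_t ~ Bernoulli(mu)
emp_max = x.sum(axis=2).max(axis=1).mean()    # empirical E[max_j X^j]
bound = mu * T + 2 * np.sqrt(mu * T * np.log(m)) + np.log(m)
print(f"empirical E[max_j X^j] = {emp_max:.1f},  bound = {bound:.1f}")
```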

Proof of Lemma 1

Sums with a random number of terms

The next bound applies to a system that goes through a random sequence of states S_1,S_2,S_3,\ldots ,S_ T, where T is a stopping time with finite expectation. At each time t, the current state S_ t determines m+1 values: x^ j_ t for j\in [m] and y_ t, where each x^ j_ t is in [0,1] and \textrm{E}[x^ j_ t] \le \textrm{E}[y_ t] for j\in [m] (or \textrm{E}[x^ j_ t] \ge \textrm{E}[y_ t] for j\in [m]). The bound shows that the expectation of \max _ j \sum _{t=1}^ T x^ j_ t (or \min _ j \sum _{t=1}^ T x^ j_ t) is close to the expectation of \sum _{t=1}^ T y_ t.

Lemma 2 (expected maximum or minimum of many sums).

Given the conditions above, let \varepsilon \in (0,1].

(c) If \textrm{E}[x^ j_ t - y_ t\, |\, S_{t-1}]\le 0 for all j\in [m] and all t\le T, then

\[ \textstyle \textrm{E}\Big[\, {\max _{j\in [m]}}\, \sum _{t=1}^ T x^ j_ t\, \Big] ~ \le ~ (1+\varepsilon ) \big (\textrm{E}\big [\sum _{t=1}^ T y_ t\big ] \, +\, \varepsilon ^{-1}\ln m\big ). \]

(d) If \textrm{E}[x^ j_ t - y_ t\, |\, S_{t-1}]\ge 0 for all j\in [m] and all t\le T, then

\[ \textstyle \textrm{E}\Big[\, {\min _{j\in [m]}}\, \sum _{t=1}^ T x^ j_ t\, \Big] ~ \ge ~ (1-\varepsilon )\textrm{E}\big [\sum _{t=1}^ T y_ t\big ] \, -\, \varepsilon ^{-1}\ln m. ~ ~ \]

The constants in the bounds are not optimized.
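
To see how the balls-in-bins example from the introduction fits this setting (a worked instantiation, not in the original), take m=n bins, let x^ j_ t be the indicator that ball t lands in bin j, let y_ t = x^ 1_ t, and let T be the (random) time at which bin 1 receives its n-th ball. Then \textrm{E}[x^ j_ t - y_ t\, |\, S_{t-1}] = 1/n - 1/n = 0 and \sum _{t=1}^ T y_ t = n, so taking \varepsilon = \sqrt {\ln (n)/n} in (c) gives

\[ \textstyle \textrm{E}\Big [\max _{j\in [n]} \sum _{t=1}^ T x^ j_ t\Big ] ~ \le ~ (1+\varepsilon )(n + \varepsilon ^{-1}\ln n) ~ =~ n + 2\sqrt {n\ln n} + \ln n, \]

while (d) gives \textrm{E}\big [\min _{j\in [n]} \sum _{t=1}^ T x^ j_ t\big ] \ge (1-\varepsilon )n - \varepsilon ^{-1}\ln n = n - 2\sqrt {n\ln n}.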

Proof of Lemma 2

Some pessimistic estimators for the bound (c)

(Right now we’ve just recorded pessimistic estimators for (c). We’ll add the rest here as needed.)

The bound (c) relates \max _{j\in [m]} \sum _{t=1}^ T x^ j_ t and \sum _{t=1}^ T y_ t. In a typical use of bound (c), we know a bound on one of the two quantities, and we use (c) to bound the expectation of the other. We start with pessimistic estimators for these two cases. Let X^ j_ t = \sum _{i=1}^ t x^ j_ i. Let Y_ t = \sum _{i=1}^ t y_ i.

  1. Consider first the case that we start with an upper bound Y_ T \le U, and we use (c) to upper bound \textrm{E}[\max _{j\in [m]} X^ j_ T] by (1+\varepsilon )(U + \varepsilon ^{-1}\ln m).

    Given the outcomes of the first t iterations (in particular, values of Y_ t and X^ j_ t for all j) the following pessimistic estimator for the conditional expectation of \textrm{E}[\max _{j\in [m]} X^ j_ T] is implicit in the proof of (c):

    \begin{equation} \label{pe:c:U}\textstyle \psi _ t ~ =~ \phi _ t + (1+\varepsilon )(U - Y_ t) ~ =~ \log _{1+\varepsilon }\sum _{j=1}^ m (1+\varepsilon )^{X^ j_ t} ~ ~ +~ (1+\varepsilon )(U - Y_ t). \end{equation}

    The initial value \psi _0 is less than (1+\varepsilon )(U + \varepsilon ^{-1}\ln m), as desired. The final value \psi _ T is an upper bound on the maximum in question (given Y_ T\le U). In a given iteration, letting \omega _ j = (1+\varepsilon )^{X^ j_{t-1}}, the increase \psi _ t-\psi _{t-1} is at most

    \begin{equation} \label{pe:c:U:delta}\textstyle (1+\varepsilon ) \sum _ j x^ j_ t \omega _ j/\sum _ j \omega _ j ~ -~ (1+\varepsilon )y_ t, \end{equation}

    which, in expectation, is non-positive.

  2. Consider next the case that we start with a lower bound L \le \max _{j\in [m]} X^ j_ T on the maximum, and we use (c) to lower bound \textrm{E}[Y_ T] by (1+\varepsilon )^{-1} L - \varepsilon ^{-1}\ln m.

    Given the outcomes of the first t iterations (in particular, values of Y_ t and X^ j_ t for all j), the following pessimistic estimator for the conditional expectation of \textrm{E}[Y_ T] is implicit in the proof of (c):

    \begin{equation} \label{pe:c:L}\textstyle \psi _ t ~ =~ Y_ t + (1+\varepsilon )^{-1}(L - \log _{1+\varepsilon }\sum _{j=1}^ m (1+\varepsilon )^{X^ j_ t}). \end{equation}

    The initial value \psi _0 is more than (1+\varepsilon )^{-1}L - \varepsilon ^{-1}\ln m, as desired. The final value \psi _ T is a lower bound on Y_ T (given that the maximum is at least L). In a given iteration, letting \omega _ j = (1+\varepsilon )^{X^ j_{t-1}}, the increase \psi _ t-\psi _{t-1} is at least

    \begin{equation} \label{pe:c:L:delta}\textstyle y_ t - \sum _ j x^ j_ t \omega _ j/\sum _ j \omega _ j, \end{equation}

    which, in expectation, is non-negative.

  3. Or, it may be the case that we have no a priori upper bound U or lower bound L. Rather, we simply want an outcome where the inequality in (c) holds: \max _{j\in [m]} X^ j_ T \le (1+\varepsilon ) (Y_ T + \varepsilon ^{-1} \ln m). In that case, we can use the pessimistic estimator

    \begin{equation} \label{pe:c}\textstyle \psi _ t ~ =~ \log _{1+\varepsilon } \sum _ j (1+\varepsilon )^{X^ j_ t} ~ ~ -~ (1+\varepsilon ) (Y_ t + \varepsilon ^{-1}\ln m). \end{equation}

    It is initially negative. If its final value is at most 0, then the desired inequality holds. In a given iteration, the increase \psi _{t} - \psi _{t-1} is at most Expression \eqref{pe:c:U:delta}, which is non-positive in expectation. (A small numerical sketch of this estimator appears after the list.)
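
Here is a minimal numerical sketch (an illustration, not from the original text) of this last estimator on the balls-in-bins process from the introduction, with bin 0 playing the role of the first bin and y_ t the indicator that ball t lands in bin 0; NumPy and the parameter choices are assumptions of the sketch.

```python
# A hedged sketch: track psi_t from the last display on Alice's process
# (m = n bins, y_t = indicator that ball t lands in bin 0). The initial value
# psi_0 is negative, and whenever the final psi_T is at most 0 the inequality
# max_j X^j_T <= (1+eps)(Y_T + ln(m)/eps) indeed holds.
import numpy as np

rng = np.random.default_rng(2)
n = 30                                   # bins; stop when bin 0 holds n balls
eps = np.sqrt(np.log(n) / n)

def psi(X, Y):
    # psi = log_{1+eps} sum_j (1+eps)^{X[j]}  -  (1+eps) * (Y + ln(n)/eps)
    return (np.log(np.sum((1 + eps) ** X)) / np.log(1 + eps)
            - (1 + eps) * (Y + np.log(n) / eps))

def run_once():
    X, Y = np.zeros(n), 0
    while X[0] < n:
        j = rng.integers(n)              # ball t lands in a uniform random bin
        X[j] += 1
        Y += (j == 0)
    return psi(X, Y), X.max() <= (1 + eps) * (Y + np.log(n) / eps)

runs = [run_once() for _ in range(200)]
print(f"psi_0 = {psi(np.zeros(n), 0):.2f}  (negative, as claimed)")
print(f"average final psi_T = {np.mean([p for p, _ in runs]):.2f}")
print("every run with psi_T <= 0 satisfies the inequality:",
      all(ok for p, ok in runs if p <= 0))
```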