Wald’s equation

Wald’s equation, a form of linearity of expectation for sums with randomly many terms.

Consider a sum \sum _{t=1}^ T x_ t of random variables, where the number of terms T is itself a random variable. If each term x_ t has expectation at most (or at least) \mu , then the expectation of the sum is at most (or at least) \mu \, \textrm{E}[T] (the bound on the expectation of each term, times the expected number of terms). This holds provided the random variables in the sum are bounded above or below and T is a stopping time with finite expectation.

Formal requirements

For Wald’s equation to hold certain technical requirements must be met. Being aware of them can help avoid confusion in some corner cases. Here are the requirements:

  1. T must be a stopping time. That is, at each time t, the state S_ t that determines x_ t is also enough to determine whether T=t. (For example, if T is the number of rolls until an odd roll, then T is a stopping time. However, if T is the number of rolls until the roll before the first odd roll, that is not a stopping time — knowing whether T\ge t requires knowing whether the next roll will be odd.)

  2. T must have finite expectation.

  3. All T terms must be (uniformly) bounded on one side — that is, there must exist a constant c, independent of the outcome of the random experiment, such that either, in all outcomes, every x_ t is at most c, or, in all outcomes, each x_ t is at least c.

Wald’s equation


Let T be a stopping time with finite expectation. Assume that the values x_ t (for t\le T) are bounded from above, or are bounded from below.

If the expectation of each x_ t is at most \mu , then the sum’s expectation is at most \mu \, \textrm{E}[T] — that is, if \textrm{E}[x_ t\, |\, T\ge t] \le \mu for all t, then \textrm{E}\big [\sum _{t=1}^ T x_ t\big ] ~ \le ~ \mu \, \textrm{E}[T].

If the expectation of each x_ t is at least \mu , then the sum’s expectation is at least \mu \, \textrm{E}[T].


Exercise 1.

Alice rolls a fair six-sided die (that is, she randomly sample numbers i.i.d. uniformly from \{ 1,2,3,4,5,6\} ), until she rolls an odd number. What is the expected number of 5’s that she rolls?

Exercise 2.

Roll a fair six-sided die, stopping with the first 1. (i) Show that the expected sum of the rolls is 21. (ii) Show that if you stop with the first 6 (instead of the first 1) the expected sum is also 21.

Exercise 3.

Show that each of the three technical requirements above is necessary. For each requirement, describe a random sum meeting the other two requirements, but with \textrm{E}[x_ t] = 0 and \textrm{E}[\sum _{t=1}^ T x_ t] = 1.

Proof of Wald’s equation

This “one-sided” variant of Wald’s equation is a little non-standard. Here’s a proof.

A false proof

Here’s what seems to be a proof that Wald’s equation holds even without the last two technical requirements. Can you find the error?

MathJax is loading, equations are not yet displayed...


    1. Roll a six-sided die. Take the number on the die (call it $N$) and flip a fair coin that many times. Count the total number of heads, and call that $H$. By Wald’s equation, the expected value of $H$ (on average over many trials) is the expectation of $N$ times the probability of getting a head on each particular flip ($1/2$):
      \[\operatorname{E}[H] = \operatorname{E}[N] \frac{1}2 = \frac{1+2+3+4+5+6}6\cdot\frac{1}2 = \frac{3.5}{2} = 1.75.\]
      Does that help? (Or what is it that you would like simplified?) See also this paper by J. Hein at Dartmouth, which discusses the case when the $X_t$’s are independently and identically distributed.

      - Neal Post author