Extending the Chernoff bound to handle sums with randomly many terms.
Alice goes to the casino and plays bets on a sequence of fair coin flips. On the tth bet, Alice chooses an amount a_ t\in [0,1] to bet: she wins a_ t if this flip is heads, otherwise she loses a_ t. Since it’s Vegas, she never stops playing. Fix any \varepsilon \gt 0. Let W_ t be the sum of bets won after the tth bet. Let L_ t be the sum of bets lost after the tth bet. Will Alice ever reach a time t such that W_ t/(1+\varepsilon ) – L_ t/(1\varepsilon ) \ge \varepsilon \mu ? The bound below says that the probability that she does is less than \exp (\varepsilon ^2\mu ).
“Stoppingtime” Chernoff bound
The bound applies to a system that goes through a random sequence of states S_1,S_2,S_3,\ldots , where each state S_ t determines two values: x_ t and y_ t, each in [0,1], where the expectation of x_ t is at most that of y_ t:
The bound shows that \sum _{t=1}^ T x_ t is unlikely to ever significantly exceed \sum _{t=1}^ T y_ t (for any T).
Given the conditions above, let \varepsilon ,\mu \gt 0 with \varepsilon \le 1. The probability that
is less than \exp (\varepsilon ^2\mu ).
(Various bounds of this kind are possible. The constants in this bound are not optimized.)
Proof
The proof adapts the standard Chernoff proof.
Pessimistic estimator
Note that \phi _ t in the proof above also serves as a pessimistic estimator on the probability of event \eqref{event2}: that is, (a) its initial value, \phi _0, is \exp (\varepsilon ^2\mu ), and (b) for each t, given the current state S_ t, the conditional probability of \eqref{event2} is less than \phi _ t (so the event won’t happen as long as the algorithm keeps \phi _ t \le 1), and (c) \phi is a supermartingale.
Related

Azuma’s inequality (wikipedia)
Footnotes
 Let X=\sum _{s=1}^ t x_ s and Y=\sum _{s=1}^ t y_ s. \eqref{event2} implies \exp (X \varepsilon / (1+\varepsilon ) – Y \varepsilon /(1\varepsilon )) ~ \ge ~ \exp (\varepsilon ^2 \mu ) . Using e^{z/(1+z)} \lt 1+z for z\in \{ \varepsilon ,\varepsilon \} gives (1+\varepsilon )^ X(1\varepsilon )^ Y ~ \gt ~ \exp (\mu \varepsilon ^2).