# How I Remember the Chernoff Bound

The Chernoff bound is of critical importance in computer science and probability. Whenever I need it, though, I almost always have to look it up to make sure I get all the parameters right. Not any more. I’ll explain how the Chernoff bound is essentially the PDF of the “Gaussian you would expect” from the Central Limit Theorem.

The most common form of the Chernoff bound is the following. Suppose you have $n$ independent and identically distributed coin flips $X_1, X_2, \dots, X_n$, say the results of repeatedly flipping a coin that comes up Heads ($X_i = 1$) with probability $p$ and Tails ($X_i = 0$) otherwise. The number of Heads $S = X_1 + \dots + X_n$ has, by definition, a Binomial distribution. Then with all but exponentially small probability, $S$ is within $O(\sqrt{n})$ of its mean; precisely, for every $\delta > 0$,

$\text{Pr}[\left|S - np\right| > \delta \sqrt{n}] \leq 2\exp(-\delta^2 / (2p(1-p)))$
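To see the bound in action, here is a minimal Python sketch (the helper names `binom_pmf`, `two_sided_tail`, and `chernoff_form` are my own, not from any library) that computes the exact probability $\text{Pr}[\left|S - np\right| > \delta\sqrt{n}]$ by summing the Binomial PMF and compares it with the right-hand side above. It uses $p = 1/2$, where the right-hand side reduces to $2\exp(-2\delta^2)$, which is Hoeffding's inequality, so the comparison is guaranteed to favor the bound; for very small $p$ and large $\delta$ the variance-based form can fail, so there it is best read as a mnemonic.

```python
import math

def binom_pmf(n: int, p: float, k: int) -> float:
    """Pr[S = k] for S ~ Binomial(n, p), 0 < p < 1, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def two_sided_tail(n: int, p: float, delta: float) -> float:
    """Exact Pr[|S - np| > delta * sqrt(n)], summing the PMF over the outlying k."""
    threshold = delta * math.sqrt(n)
    return sum(binom_pmf(n, p, k) for k in range(n + 1)
               if abs(k - n * p) > threshold)

def chernoff_form(p: float, delta: float) -> float:
    """Right-hand side 2 * exp(-delta^2 / (2p(1-p))) of the bound as stated in the text."""
    return 2 * math.exp(-delta ** 2 / (2 * p * (1 - p)))

if __name__ == "__main__":
    n, p = 100, 0.5
    for delta in (0.5, 1.0, 1.5, 2.0):
        print(f"delta={delta}: exact={two_sided_tail(n, p, delta):.3e}  "
              f"bound={chernoff_form(p, delta):.3e}")
```

The exact tail is always comfortably below the bound here; the gap is the price of a bound that holds for every $n$.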

The Central Limit Theorem (CLT) tells us that as $n \to \infty$, the distribution of $(S - \mathbb{E}(S)) / \sqrt{\text{Var}(S)}$ approaches the standard normal distribution. Philosophically, then, $S$ itself approaches a normal distribution with the parameters you would expect: mean $\mathbb{E}(S)$ and variance $\text{Var}(S)$ (though the CLT alone does not guarantee that these unnormalized, “scaled up” distributions converge to anything).
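The CLT statement can be checked numerically. The sketch below (again with illustrative helper names of my own) computes the exact Kolmogorov distance $\sup_z \left|\text{Pr}[(S - np)/\sqrt{np(1-p)} \leq z] - \Phi(z)\right|$ between the standardized Binomial CDF and the standard normal CDF $\Phi$. Because the Binomial CDF is a step function, it suffices to compare both sides of each jump. The distance shrinks as $n$ grows (the Berry-Esseen theorem says at rate $O(1/\sqrt{n})$); $p = 0.3$ is an arbitrary choice.

```python
import math

def binom_pmf(n: int, p: float, k: int) -> float:
    """Pr[S = k] for S ~ Binomial(n, p), computed in log space so large n does not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kolmogorov_distance(n: int, p: float) -> float:
    """sup over z of |Pr[(S - np) / sqrt(np(1-p)) <= z] - Phi(z)|."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        phi = std_normal_cdf((k - mean) / sd)
        before = cdf                  # standardized CDF just left of the jump at k
        cdf += binom_pmf(n, p, k)     # standardized CDF at the jump itself
        worst = max(worst, abs(before - phi), abs(cdf - phi))
    return worst

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, kolmogorov_distance(n, 0.3))
```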

In our case, $\mathbb{E}(S) = np$ and $\text{Var}(S) = np(1-p)$. The PDF of the “Gaussian you would expect” is

$p(S = x) \approx C \cdot \exp(-(x - np)^2 / (2np(1-p)))$

for an appropriate normalizing constant $C = 1/\sqrt{2\pi n p(1-p)}$. Let’s rewrite that with $x = np + \delta\sqrt{n}$:

$p(S = np + \delta\sqrt{n}) \approx C \cdot \exp(-(\delta \sqrt{n})^2 / (2np(1-p))) = C \cdot \exp(-\delta^2/(2p(1-p)))$

This is (up to the factor of $C$) the expression appearing in the Chernoff bound! Thus we can think of the Chernoff bound as expressing an “even when $n$ is small” version of the CLT, with a little bit of loss from $1/\sqrt{2\pi}$ to $2$ in the multiplicative factor.
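Here is a small numerical check of the approximation $p(S = np + \delta\sqrt{n}) \approx C \cdot \exp(-\delta^2/(2p(1-p)))$; this pointwise version of the CLT for coin flips is the de Moivre-Laplace theorem. The sketch (hypothetical helper names; $p = 0.3$ and $\delta = 0.5$ are arbitrary) evaluates the exact Binomial PMF at the integer $k$ nearest $np + \delta\sqrt{n}$ and divides by the Gaussian guess, so a ratio near 1 means the guess is good.

```python
import math

def binom_pmf(n: int, p: float, k: int) -> float:
    """Pr[S = k] for S ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def gaussian_guess(n: int, p: float, delta: float) -> float:
    """C * exp(-delta^2 / (2p(1-p))) with C = 1 / sqrt(2 pi n p (1-p)), as in the text."""
    c = 1.0 / math.sqrt(2 * math.pi * n * p * (1 - p))
    return c * math.exp(-delta ** 2 / (2 * p * (1 - p)))

def pmf_ratio(n: int, p: float, delta: float) -> float:
    """Exact Pr[S = k] divided by the Gaussian guess, at the integer k nearest np + delta sqrt(n)."""
    k = round(n * p + delta * math.sqrt(n))
    actual_delta = (k - n * p) / math.sqrt(n)   # re-derive delta for the rounded k
    return binom_pmf(n, p, k) / gaussian_guess(n, p, actual_delta)

if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, pmf_ratio(n, 0.3, 0.5))
```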

### Using the CDF instead of the PDF

We lost some credibility at one point in the technique above: we actually should have been looking at the cumulative probability

$\text{Pr}[\left|S - np \right| \geq \delta \sqrt{n}]$

Instead we noticed that the Chernoff bound can be remembered by looking at the PDF (and ignoring a nonconstant factor)

$p(S = np+\delta\sqrt{n})$

But using the CDF instead of the PDF actually gives the same expression, and with a truly constant factor. Let’s compute, approximating $S$ by the Gaussian above (which is symmetric about $np$) and then shifting the variable of integration by $np$:

$\text{Pr}[\left|S - np \right| \geq \delta \sqrt{n}] \approx 2\int_{np+\delta\sqrt{n}}^\infty p(S = x) \, dx$

$= 2\int_{\delta\sqrt{n}}^\infty \frac{1}{\sqrt{2\pi np(1-p)}} \exp(-x^2 / (2np(1-p))) \, dx$
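A quick sanity check of this step: substituting $u = x / \sqrt{np(1-p)}$ turns the integral into $2(1 - \Phi(\delta/\sqrt{p(1-p)}))$, which equals $\operatorname{erfc}(\delta/\sqrt{2p(1-p)})$ and does not depend on $n$ at all. The sketch below (function names are illustrative) compares that closed form with a plain midpoint-rule evaluation of the integral for two very different values of $n$; the truncation point and step count are my own choices.

```python
import math

def tail_closed_form(p: float, delta: float) -> float:
    """2 * (1 - Phi(delta / sqrt(p(1-p)))), written with the complementary error function."""
    return math.erfc(delta / math.sqrt(2 * p * (1 - p)))

def tail_by_integration(n: int, p: float, delta: float, steps: int = 100_000) -> float:
    """Midpoint rule for 2 * int_{delta sqrt(n)}^inf exp(-x^2 / (2np(1-p))) / sqrt(2 pi np(1-p)) dx.

    The upper limit is truncated 12 standard deviations past the lower limit,
    where the remaining mass is negligible in double precision.
    """
    var = n * p * (1 - p)
    lo = delta * math.sqrt(n)
    hi = lo + 12 * math.sqrt(var)
    h = (hi - lo) / steps
    total = sum(math.exp(-(lo + (i + 0.5) * h) ** 2 / (2 * var)) for i in range(steps))
    return 2 * total * h / math.sqrt(2 * math.pi * var)

if __name__ == "__main__":
    p, delta = 0.3, 1.0
    print("closed form:", tail_closed_form(p, delta))
    for n in (100, 1_000_000):
        print(n, tail_by_integration(n, p, delta))
```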

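If we break up the integral into $\sqrt{n}$-sized pieces $[k\sqrt{n}, (k+1)\sqrt{n}]$ for $k = \delta, \delta + 1, \delta + 2, \dots$, the integral on one piece looks like the following (the integrand is decreasing for $x \geq 0$, so we bound it by its value at the left endpoint $x = k\sqrt{n}$, and the piece has length $\sqrt{n}$):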

$\int_{k\sqrt{n}}^{(k+1)\sqrt{n}} \frac{1}{\sqrt{2\pi np(1-p)}} \exp(-x^2 / (2np(1-p))) \, dx$

$\leq \frac{1}{\sqrt{2\pi np(1-p)}} \int_{k\sqrt{n}}^{(k+1)\sqrt{n}} \exp(-k^2 / (2p(1-p))) \, dx$

$= \frac{1}{\sqrt{2\pi p(1-p)}} \exp(-k^2 / (2p(1-p)))$

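The exponent grows faster than linearly in $k$, and it is the only thing that changes with $k$, so these contributions go to 0 faster than geometrically. Concretely, write $k = \delta + j$ for $j = 0, 1, 2, \dots$; since $(\delta + j)^2 \geq \delta^2 + j$, the $j$-th piece is at most $\frac{1}{\sqrt{2\pi p(1-p)}} \exp(-\delta^2 / (2p(1-p))) \cdot r^j$ with $r = \exp(-1/(2p(1-p))) \leq e^{-2}$ (because $p(1-p) \leq 1/4$). So the integral is bounded by an infinite geometric series whose initial term is proportional to $\exp(-\delta^2/(2p(1-p)))$, and the cumulative probability is (up to a constant) the expression in the Chernoff bound

$\exp(-\delta^2/(2p(1-p)))$

and this time the constant is actually constant: it no longer depends on $n$!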
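The piece-by-piece argument can also be checked numerically. The sketch below (hypothetical helper names; the $p$ and $\delta$ values are arbitrary) computes the exact Gaussian mass of each piece with `math.erfc`, the per-piece bound $\frac{1}{\sqrt{2\pi p(1-p)}}\exp(-k^2/(2p(1-p)))$, and the closed-form geometric-series bound $\frac{2}{\sqrt{2\pi p(1-p)}} \cdot \frac{\exp(-\delta^2/(2p(1-p)))}{1 - \exp(-1/(2p(1-p)))}$, which relies on $(\delta + j)^2 \geq \delta^2 + j$. None of these quantities involve $n$, and the ratio of the tail to $\exp(-\delta^2/(2p(1-p)))$ stays below a constant that depends only on $p$.

```python
import math

def piece_exact(p: float, k: float) -> float:
    """Gaussian mass of the piece [k sqrt(n), (k+1) sqrt(n)]; after rescaling it is free of n."""
    s = math.sqrt(2 * p * (1 - p))
    return 0.5 * (math.erfc(k / s) - math.erfc((k + 1) / s))

def piece_bound(p: float, k: float) -> float:
    """Integrand at the left endpoint times the piece length sqrt(n)."""
    return math.exp(-k ** 2 / (2 * p * (1 - p))) / math.sqrt(2 * math.pi * p * (1 - p))

def gaussian_tail(p: float, delta: float) -> float:
    """Exact two-sided Gaussian tail (twice the sum of all pieces): erfc(delta / sqrt(2p(1-p)))."""
    return math.erfc(delta / math.sqrt(2 * p * (1 - p)))

def sum_of_piece_bounds(p: float, delta: float, pieces: int = 200) -> float:
    """2 * sum of piece_bound over k = delta, delta + 1, ... (truncated; later terms are negligible)."""
    return 2 * sum(piece_bound(p, delta + j) for j in range(pieces))

def geometric_bound(p: float, delta: float) -> float:
    """Closed-form geometric series with first term piece_bound(p, delta), ratio exp(-1/(2p(1-p)))."""
    r = math.exp(-1.0 / (2 * p * (1 - p)))
    return 2 * piece_bound(p, delta) / (1 - r)

if __name__ == "__main__":
    p = 0.3
    for delta in (0.5, 1.0, 2.0, 3.0):
        tail = gaussian_tail(p, delta)
        ratio = tail / math.exp(-delta ** 2 / (2 * p * (1 - p)))
        print(delta, tail, sum_of_piece_bounds(p, delta), geometric_bound(p, delta), ratio)
```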