Approximating Binomial Coefficients with Entropy

This post recounts the story of one of my favorite “probabilistic method” proofs,

$\log \binom{n}{k} \leq n H(k/n)$

where $H(p)$ is the binary entropy function i.e. the entropy of a coin flipped with bias $p$. Maybe more properly the proof is just counting in two ways?

By a tiny tweak to our entropy argument as done here, one can get a better bound

$\log\displaystyle\sum_{i = 0}^{k} \binom{n}{k} \leq nH(k / n)$

This is what’s typically used in practice to bound e.g. the sum of the first $1/3$ of the binomial coefficients by $2^{n H(1/3)}$. The proof here gives all the essential ideas for that proof. With Stirling’s approximation used to give error bounds, both approximations are tight when $k = \Theta(n)$,

$\binom{n}{k} = (1+o(1))2^{n H(k / n)}$

The experiment: Let the random variable $X$ denote the result of flipping $n$ fair coins i.e. $X$ is binomially distributed. Now suppose we know $X$ has exactly $k$ Heads.

The question: How much entropy is in the conditional distribution of $X$ given $k$ Heads? I.e. what is $H(X \mid k\text{ Heads})$?

The left side: The conditional distribution of $X$ is uniform over $\binom{n}{k}$ possible results. The entropy of the uniform distribution on $N$ objects is $\log N$, therefore

$H(X \mid k\text{ Heads}) = \log \binom{n}{k}$.

The right side: On the other hand, $X$ is specified by its bits $X_i$, so

$H(X \mid k \text{ Heads}) = H(X_1, X_2, \dots, X_n \mid k \text{ Heads})$

$H(X_1, X_2, \dots, X_n \mid k \text{ Heads}) \leq H(X_1 \mid k \text{ Heads}) + H(X_2 \mid k \text{ Heads}) + \cdots + H(X_n \mid k \text{ Heads})$

This expression is easy to compute: when flipping $k$ Heads of $n$ flips, each $X_i$ looks like a coin with bias $k /n$, therefore its entropy is $H(k/n)$. So

$H(X_1 \mid k \text{ Heads}) + H(X_2 \mid k \text{ Heads}) + \cdots + H(X_n \mid k \text{ Heads})= n H(k/n)$

giving the right-hand side.

Further Generalization: Multinomial Coefficients

We can naturally generalize the above argument to provide an approximation for multinomial coefficients. Fix natural numbers $k_1, k_2, \dots, k_m$ with $\sum_i k_i = n$. Let $K$ be a distribution over $[m]$ taking $i$ with probability $k_i / n$. Then

$\log \binom{n}{k_1, k_2, \dots, k_m} \leq nH(K)$

The proof generalizes perfectly: the underlying set is $[m]^{n}$ (place a letter from $\{1, 2, \dots, m\}$ at each of $n$ indices), and we consider the entropy of a uniform choice of a string constrained to have the fixed letter histogram $k_1, k_2, \dots, k_m$. This is exactly the left-hand side, and subadditivity yields the right-hand side.

Moreover, by Stirling again the approximation is tight when $k_i = \Theta(n)$.