# Approximating Binomial Coefficients with Entropy

This post recounts the story of one of my favorite “probabilistic method” proofs,

$\log \binom{n}{k} \leq n H(k/n)$

where $H(p)$ is the binary entropy function, i.e., the entropy of a coin flipped with bias $p$. Maybe more properly the proof is just counting in two ways?

By a tiny tweak to our entropy argument as done here, one can get a better bound

$\log\displaystyle\sum_{i = 0}^{k} \binom{n}{i} \leq nH(k / n) \quad \text{for } k \leq n/2$

This is what’s typically used in practice, e.g. to bound the sum of the first $n/3$ binomial coefficients by $2^{n H(1/3)}$. The proof below gives all the essential ideas. With Stirling’s approximation supplying error bounds, both estimates are tight in the exponent when $k = \Theta(n)$:

$\binom{n}{k} = 2^{(1+o(1))\, n H(k / n)}$
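As a quick numerical sanity check (a Python sketch, not part of the original argument; the `binary_entropy` helper is our own), we can compare $\log_2 \binom{n}{k}$ against $n H(k/n)$ for growing $n$ with $k = n/3$:

```python
from math import comb, log2

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for n in (30, 300, 3000):
    k = n // 3
    lhs = log2(comb(n, k))           # log of the binomial coefficient
    rhs = n * binary_entropy(k / n)  # the entropy upper bound
    print(f"n={n}: {lhs:.1f} <= {rhs:.1f}, ratio {lhs/rhs:.4f}")
```

The bound holds at every $n$, and the ratio of the two sides tends to 1 as $n$ grows, reflecting the tightness claim above.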

The experiment: Let the random variable $X$ denote the sequence of outcomes of flipping $n$ fair coins (so the number of Heads in $X$ is binomially distributed). Now suppose we know $X$ has exactly $k$ Heads.

The question: How much entropy is in the conditional distribution of $X$ given $k$ Heads? I.e. what is $H(X \mid k\text{ Heads})$?

The left side: The conditional distribution of $X$ is uniform over the $\binom{n}{k}$ possible results. The entropy of the uniform distribution on $N$ objects is $\log N$; therefore

$H(X \mid k\text{ Heads}) = \log \binom{n}{k}$.

The right side: On the other hand, $X$ is specified by its bits $X_i$, so

$H(X \mid k \text{ Heads}) = H(X_1, X_2, \dots, X_n \mid k \text{ Heads})$

and by subadditivity of entropy,

$H(X_1, X_2, \dots, X_n \mid k \text{ Heads}) \leq H(X_1 \mid k \text{ Heads}) + H(X_2 \mid k \text{ Heads}) + \cdots + H(X_n \mid k \text{ Heads})$

This expression is easy to compute: conditioned on exactly $k$ Heads among the $n$ flips, each individual $X_i$ looks like a coin with bias $k/n$ (by symmetry over the indices), so its entropy is $H(k/n)$. Hence

$H(X_1 \mid k \text{ Heads}) + H(X_2 \mid k \text{ Heads}) + \cdots + H(X_n \mid k \text{ Heads})= n H(k/n)$

giving the right-hand side.
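For small $n$ the whole argument can be checked by brute force. The Python snippet below (our own illustration; `support` is our name for the conditional distribution's support) enumerates all sequences with exactly $k$ Heads and verifies both sides:

```python
from itertools import product
from math import comb, log2

n, k = 6, 2
# All length-n coin sequences (1 = Heads) with exactly k Heads:
# the support of the conditional distribution.
support = [s for s in product([0, 1], repeat=n) if sum(s) == k]

# Left side: the conditional distribution is uniform over C(n, k) outcomes.
assert len(support) == comb(n, k)
left = log2(len(support))

# Right side: each bit X_i is Heads with probability exactly k/n.
for i in range(n):
    assert sum(s[i] for s in support) / len(support) == k / n

def H(p):
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

right = n * H(k / n)
print(f"log C({n},{k}) = {left:.3f} <= nH(k/n) = {right:.3f}")
```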

## Further Generalization: Multinomial Coefficients

We can naturally generalize the above argument to approximate multinomial coefficients. Fix natural numbers $k_1, k_2, \dots, k_m$ with $\sum_i k_i = n$, and let $K$ be a random variable over $[m]$ taking value $i$ with probability $k_i / n$. Then

$\log \binom{n}{k_1, k_2, \dots, k_m} \leq nH(K)$

The proof generalizes perfectly: the underlying set is $[m]^{n}$ (place a letter from $\{1, 2, \dots, m\}$ at each of $n$ indices), and we consider the entropy of a uniform choice of a string constrained to have the fixed letter histogram $k_1, k_2, \dots, k_m$. This is exactly the left-hand side, and subadditivity yields the right-hand side.

Moreover, by Stirling’s approximation again, the bound is tight in the exponent when each $k_i = \Theta(n)$.
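The multinomial version is just as easy to check numerically. Here is a short Python sketch (the `multinomial` and `entropy` helpers are our own, not standard library functions):

```python
from math import factorial, log2, prod

def multinomial(n, ks):
    """n! / (k_1! k_2! ... k_m!), assuming sum(ks) == n."""
    assert sum(ks) == n
    return factorial(n) // prod(factorial(k) for k in ks)

def entropy(ps):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * log2(p) for p in ps if p > 0)

n, ks = 60, [30, 20, 10]
lhs = log2(multinomial(n, ks))
rhs = n * entropy([k / n for k in ks])
print(f"log multinomial = {lhs:.2f} <= nH(K) = {rhs:.2f}")
```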

# Calculus-Free Asymptotic for Harmonic Numbers

Here we prove that $H_n = \Theta(\log_2 n)$ without touching calculus (the most common proof interprets $H_n$ as a Riemann sum of $\int 1/x$). Seeing as this is probably the most fundamental asymptotic for computer scientists, I was surprised to see this Stack Exchange question didn’t mention the calculus-free approach. The proof here pretty much matches my own answer to the Stack Exchange post.

We first consider powers of 2, $n = 2^k$. For these $n$, we can break the harmonic number $H_n = \sum_{j=1}^{n} \frac{1}{j}$ into the “chunks”

$1$

$\frac{1}{2} + \frac{1}{3}$

$\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7}$

$\dots$

$\frac{1}{2^{k-1}} + \cdots + \frac{1}{2^k - 1}$

and we have one extra term $\frac{1}{2^k}$. There are $k = \log_2 n$ chunks, and of course $H_n$ is the sum of these chunks (plus the extra term), hence to show $H_n = \Theta(\log_2 n)$ we show each chunk is $\Theta(1)$. Inside each chunk we bound every term above by the chunk’s first term, a reciprocal of a power of 2:

$1 \leq 1$

$\frac{1}{2} + \frac{1}{3} \leq \frac{1}{2} + \frac{1}{2} = 1$

$\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} \leq \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} =1$

$\dots$

$\frac{1}{2^{k-1}} + \cdots + \frac{1}{2^k - 1} \leq \frac{1}{2^{k-1}} + \cdots + \frac{1}{2^{k - 1}} = 1$

Thus each chunk is at most 1, and taken together with the extra term $\frac{1}{2^k}$, we have $H_n \leq \log_2 n + \frac{1}{2^k} \leq \log_2 n + 1$.

On the other hand, if we lower bound each element by the reciprocal of the next power of 2,

$1 \geq 1$

$\frac{1}{2} + \frac{1}{3} \geq \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$

$\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} \geq \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} =\frac{1}{2}$

$\dots$

$\frac{1}{2^{k-1}} + \cdots + \frac{1}{2^k - 1} \geq \frac{1}{2^k} + \cdots + \frac{1}{2^k} = \frac{1}{2}$

Every chunk is at least $\frac{1}{2}$, hence $H_n \geq \frac{1}{2}\log_2 n$.
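The chunk bounds can be verified exactly with rational arithmetic. A short Python sketch (`chunk` is our own name for the block starting at $2^j$):

```python
from fractions import Fraction

def chunk(j):
    """Sum of 1/i for i in [2^j, 2^(j+1) - 1], computed exactly."""
    return sum(Fraction(1, i) for i in range(2**j, 2**(j + 1)))

# Every chunk lies in [1/2, 1], as the two bounding arguments show.
for j in range(10):
    assert Fraction(1, 2) <= chunk(j) <= 1

# And the resulting bounds on H_n hold for n = 2^k:
for k in range(1, 9):
    n = 2**k
    Hn = sum(Fraction(1, i) for i in range(1, n + 1))
    assert Fraction(k, 2) <= Hn <= k + 1

print("all chunk and harmonic bounds hold")
```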

This is essentially the proof. One technicality: we’ve only shown $H_n = \Theta (\log_2 n)$ for $n$ a power of 2. Monotonicity of $H_n$ completes the proof for all $n$: taking the nearest powers of 2 below and above $n$, call them $n_-$ and $n_+$,

$n/2 < n_- \leq n \leq n_+ < 2n$

Applying our bounds on $H_{n_-}$ and $H_{n_+}$,

$H_n \leq H_{n_+} \leq \log_2 n_+ + 1 < \log_2 (2n) +1= \log_2 n + 2$

$H_n \geq H_{n_-} \geq \frac{1}{2}\log_2 n_- > \frac{1}{2}\log_2 (n/2) = \frac{1}{2}\log_2 n - \frac{1}{2}$

Thus $H_n = \Theta(\log_2 n)$.
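As a final sanity check (a quick Python sketch, not part of the proof), the derived two-sided bound $\frac{1}{2}\log_2 n - \frac{1}{2} \leq H_n \leq \log_2 n + 2$ can be tested directly for all small $n$:

```python
from math import log2

Hn = 0.0
for n in range(1, 5001):
    Hn += 1 / n  # running harmonic number H_n
    assert 0.5 * log2(n) - 0.5 <= Hn <= log2(n) + 2
print("bounds verified for n up to 5000")
```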

(In general, for any monotone sequence, it suffices to check a target bound $\Theta(f(n))$ on powers of 2 alone whenever $f(2n) = \Theta(f(n))$, as is the case for $f(n) = \log_2 n$.)