This post recounts the story of one of my favorite "probabilistic method" proofs, the entropy proof of the bound

$$\binom{n}{k} \le 2^{n \cdot H(k/n)},$$

where $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ is the binary entropy function, i.e. the entropy of a coin flipped with bias $p$. Maybe more properly the proof is just counting in two ways?
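As a quick sanity check of the bound, here is a short numerical sketch (the helper name `H` is my own choice, not from the proof):

```python
from math import comb, log2

def H(p: float) -> float:
    """Binary entropy in bits, with the convention H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Verify C(n, k) <= 2^(n * H(k/n)) over a small range, allowing a tiny
# multiplicative slack for floating-point error.
for n in range(1, 61):
    for k in range(n + 1):
        assert comb(n, k) <= 2 ** (n * H(k / n)) * (1 + 1e-9)
```

Note that equality holds at $k = 0$ and $k = n$, which is why the check needs a floating-point tolerance rather than strict inequality.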

By a tiny tweak to our entropy argument as done here, one can get the better bound

$$\sum_{i=0}^{k} \binom{n}{i} \le 2^{n \cdot H(k/n)} \qquad \text{for } k \le n/2.$$

This is what's typically used in practice, e.g. to bound the sum of the first $k$ binomial coefficients by $2^{n \cdot H(k/n)}$. The proof here gives all the essential ideas for that proof. With Stirling's approximation used to give error bounds, both approximations are tight up to polynomial factors in $n$ when $k = \Theta(n)$.
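The stronger tail bound can be checked numerically in the same way; a sketch with my own helper name `H`:

```python
from math import comb, log2

def H(p: float) -> float:
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# For k <= n/2, the entire lower tail sum_{i=0}^{k} C(n, i) is at most
# 2^(n * H(k/n)), not just the single coefficient C(n, k).
for n in range(1, 61):
    for k in range(n // 2 + 1):
        tail = sum(comb(n, i) for i in range(k + 1))
        assert tail <= 2 ** (n * H(k / n)) * (1 + 1e-9)
```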

**The experiment:** Let the random variable $X = (X_1, \dots, X_n)$ denote the result of flipping $n$ fair coins, i.e. the number of Heads in $X$ is binomially distributed. Now suppose we know $X$ has exactly $k$ Heads.

**The question:** How much entropy is in the conditional distribution of $X$ given $k$ Heads? I.e. what is $H(X \mid k \text{ Heads})$?

**The left side:** The conditional distribution of $X$ is uniform over $\binom{n}{k}$ possible results. The entropy of the uniform distribution on $m$ objects is $\log_2 m$, therefore

$$H(X \mid k \text{ Heads}) = \log_2 \binom{n}{k}.$$

**The right side:** On the other hand, $X$ is specified by its bits $X_1, \dots, X_n$. By subadditivity of entropy,

$$H(X \mid k \text{ Heads}) = H(X_1, \dots, X_n \mid k \text{ Heads}) \le \sum_{i=1}^{n} H(X_i \mid k \text{ Heads}).$$

This expression is easy to compute: when flipping $k$ Heads out of $n$ flips, each $X_i$ marginally looks like a coin with bias $k/n$, therefore its entropy is $H(k/n)$. So

$$\sum_{i=1}^{n} H(X_i \mid k \text{ Heads}) = n \cdot H(k/n),$$

giving the right-hand side.
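The key marginal fact, that each $X_i$ given $k$ Heads is a $k/n$-biased coin, follows by symmetry; it can also be verified by direct counting (a sketch, variable names mine):

```python
from math import comb

# Of the C(n, k) equally likely strings with exactly k Heads, those with a
# Head at position i number C(n-1, k-1), so P(X_i = 1 | k Heads) = k/n.
n, k = 20, 7
p_head = comb(n - 1, k - 1) / comb(n, k)
assert abs(p_head - k / n) < 1e-12
```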

## Further Generalization: Multinomial Coefficients

We can naturally generalize the above argument to provide an approximation for multinomial coefficients. Fix natural numbers $k_1, \dots, k_m$ with $k_1 + \cdots + k_m = n$. Let $p$ be a distribution over $\{1, \dots, m\}$ taking value $i$ with probability $k_i/n$. Then

$$\binom{n}{k_1, \dots, k_m} \le 2^{n \cdot H(p)},$$

where $H(p) = -\sum_{i=1}^{m} \frac{k_i}{n} \log_2 \frac{k_i}{n}$ is the Shannon entropy of $p$.

The proof generalizes perfectly: the underlying set is $\{1, \dots, m\}^n$ (place a letter from $\{1, \dots, m\}$ at each of $n$ indices), and we consider the entropy of a uniformly random string constrained to have the fixed letter histogram $(k_1, \dots, k_m)$. This entropy is exactly the logarithm of the left-hand side, and subadditivity yields the right-hand side.
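A quick numerical check of the multinomial version (the helper names are mine, for illustration):

```python
from math import factorial, log2, prod

def multinomial(ks):
    """n! / (k_1! ... k_m!), where n is the sum of the parts."""
    return factorial(sum(ks)) // prod(factorial(k) for k in ks)

def shannon_entropy(ps):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in ps if p > 0)

# Check the bound on one example histogram, with floating-point slack.
ks = (3, 5, 2, 6)
n = sum(ks)
bound = 2 ** (n * shannon_entropy([k / n for k in ks]))
assert multinomial(ks) <= bound * (1 + 1e-9)
```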

Moreover, by Stirling again, the approximation is tight up to polynomial factors in $n$ when each $k_i = \Theta(n)$.