This post recounts the story of one of my favorite “probabilistic method” proofs, the entropy proof of the bound

$$\binom{n}{k} \le 2^{n \cdot H(k/n)},$$

where $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$ is the binary entropy function, i.e. the entropy of a coin flipped with bias $p$. Maybe more properly the proof is just counting in two ways?
By a tiny tweak to our entropy argument as done here, one can get a better bound:

$$\sum_{i=0}^{k} \binom{n}{i} \le 2^{n \cdot H(k/n)} \quad \text{for } k \le n/2.$$

This is what’s typically used in practice to bound e.g. the sum of the first $k$ of the binomial coefficients by $2^{n \cdot H(k/n)}$. The proof here gives all the essential ideas for that proof. With Stirling’s approximation used to give error bounds, both approximations are tight up to polynomial factors when $k = \Theta(n)$.
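Both inequalities are easy to check numerically. Here is a short Python sanity check (not from the original argument; the choice $n = 30$ is arbitrary) verifying the single-coefficient bound and the sharper prefix-sum bound for every $k \le n/2$:

```python
import math

def H(p):
    """Binary entropy in bits; H(0) = H(1) = 0 by the usual convention."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 30  # arbitrary example size
for k in range(n // 2 + 1):
    bound = 2 ** (n * H(k / n))
    # single-coefficient bound: C(n, k) <= 2^{n H(k/n)}
    assert math.comb(n, k) <= bound
    # sharper version: the whole prefix sum obeys the same bound for k <= n/2
    assert sum(math.comb(n, i) for i in range(k + 1)) <= bound
```

The slack in the bound is only polynomial in $n$, so floating-point error is nowhere near flipping these comparisons.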
The experiment: Let the random variable $X = (X_1, \dots, X_n)$ denote the result of flipping $n$ fair coins, so that the number of Heads in $X$ is binomially distributed. Now suppose we know $X$ has exactly $k$ Heads.
The question: How much entropy is in the conditional distribution of $X$ given $k$ Heads? I.e. what is $H(X \mid \#\text{Heads} = k)$?
The left side: The conditional distribution of $X$ is uniform over the $\binom{n}{k}$ possible results. The entropy of the uniform distribution on $m$ objects is $\log_2 m$, therefore

$$H(X \mid \#\text{Heads} = k) = \log_2 \binom{n}{k}.$$
The right side: On the other hand, $X$ is specified by its bits $X_1, \dots, X_n$, so

$$H(X \mid \#\text{Heads} = k) = H(X_1, \dots, X_n \mid \#\text{Heads} = k).$$

By subadditivity of entropy,

$$H(X_1, \dots, X_n \mid \#\text{Heads} = k) \le \sum_{i=1}^{n} H(X_i \mid \#\text{Heads} = k).$$

This expression is easy to compute: when flipping $k$ Heads out of $n$ flips, each $X_i$ looks like a coin with bias $k/n$, therefore its entropy is $H(k/n)$. So

$$\sum_{i=1}^{n} H(X_i \mid \#\text{Heads} = k) = n \cdot H(k/n),$$

giving the right-hand side.
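The two key facts in the derivation — that the conditional distribution is uniform over $\binom{n}{k}$ strings, and that each coordinate conditioned on $k$ Heads is a coin with bias $k/n$ — can be checked by brute-force enumeration. A small sketch (the values $n = 6$, $k = 2$ are arbitrary):

```python
from fractions import Fraction
from itertools import product
from math import comb, log2

def H(p):
    # binary entropy in bits
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

n, k = 6, 2  # arbitrary small example
# enumerate the conditional distribution: all outcomes with exactly k Heads (1 = Heads)
outcomes = [s for s in product((0, 1), repeat=n) if sum(s) == k]
assert len(outcomes) == comb(n, k)  # uniform over C(n, k) strings

# each coordinate X_i, conditioned on k Heads, is a coin with bias k/n
for i in range(n):
    assert Fraction(sum(s[i] for s in outcomes), len(outcomes)) == Fraction(k, n)

# conclusion of the proof: log2 C(n, k) <= n * H(k/n)
assert log2(comb(n, k)) <= n * H(k / n)
```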
Further Generalization: Multinomial Coefficients
We can naturally generalize the above argument to provide an approximation for multinomial coefficients. Fix natural numbers $k_1, \dots, k_m$ with $k_1 + \cdots + k_m = n$. Let $D$ be a distribution over $\{1, \dots, m\}$ taking $i$ with probability $k_i / n$. Then

$$\binom{n}{k_1, \dots, k_m} \le 2^{n \cdot H(D)}.$$
The proof generalizes perfectly: the underlying set is $[m]^n$ (place a letter from $[m]$ at each of $n$ indices), and we consider the entropy of a uniform choice of a string constrained to have the fixed letter histogram $(k_1, \dots, k_m)$. This is exactly the left-hand side, and subadditivity yields the right-hand side.
Moreover, by Stirling again the approximation is tight up to polynomial factors when each $k_i = \Theta(n)$.
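The multinomial bound can be sanity-checked the same way as the binomial one. A minimal sketch, with entropy measured in bits and an arbitrary example histogram $(5, 7, 8)$:

```python
from math import comb, log2

def entropy(ps):
    # Shannon entropy in bits of a probability vector
    return -sum(p * log2(p) for p in ps if p > 0)

def multinomial(ks):
    # n! / (k_1! ... k_m!) computed as a product of binomial coefficients
    result, remaining = 1, sum(ks)
    for k in ks:
        result *= comb(remaining, k)
        remaining -= k
    return result

ks = (5, 7, 8)  # arbitrary histogram; n = 20
n = sum(ks)
# the bound: multinomial coefficient <= 2^{n * H(D)}, where D_i = k_i / n
assert multinomial(ks) <= 2 ** (n * entropy(k / n for k in ks))
```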