Probably the most studied problem in probability theory is the following: suppose $X_1, X_2, \ldots, X_n$ are independent, identically distributed random variables from some distribution $\mathcal{D}$ on $\mathbb{R}$, and we look at the random variable

$$X = \sum_{i=1}^n X_i.$$
How is $X$ distributed?

Typically the answer is the central limit theorem (CLT) amplified with the Berry-Esséen theorem: “a sum of independent variables is approximately normally distributed”. This post explains two symbolic translations of that sentence which I find easier to write down at a moment’s notice. The first takes the viewpoint of an “asymptotic approximation of $X$”,

$$X \approx \mu n + \sigma \sqrt{n} \cdot \mathcal{N}(0, 1).$$
Here $\mu$ is the mean of $\mathcal{D}$ and $\sigma$ is the standard deviation of $\mathcal{D}$. The second is

$$\sum_{i=1}^n X_i \approx \sum_{i=1}^n Y_i,$$
where the $Y_i$ are i.i.d. Gaussian random variables with the same mean and variance as the $X_i$, i.e. $Y_i \sim \mathcal{N}(\mu, \sigma^2)$. This can be thought of as an “invariance principle” or “replacement lemma” (and this is the viewpoint taken by Lindeberg in his proof of the CLT). The invariance principle is now a tool used in Boolean Fourier analysis.

**Crucial remark:** Unfortunately, the “$\approx$” in these “syntactic forms” is NOT any convergence in distribution. We’ve “blown up” errors by a factor of $\sqrt{n}$ from the convergence guaranteed by the CLT. Dividing both sides by $n$ normalizes both sides to have finite variance and gives convergence in probability (or the stronger convergence guaranteed by an appropriate CLT).
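To see the correctly normalized statement in action, here is a quick Monte Carlo sketch; the choice of $\mathcal{D}$ as an Exponential(1) distribution (so $\mu = \sigma = 1$) and the sample sizes are my own illustrative assumptions, not part of the theorem.

```python
# Monte Carlo sketch: standardizing X recovers something N(0, 1)-like.
# D = Exponential(1) is an arbitrary illustrative choice (mu = sigma = 1).
import random
import statistics

random.seed(0)
n, trials = 400, 4000
mu, sigma = 1.0, 1.0

# Draw `trials` samples of X = X_1 + ... + X_n.
xs = [sum(random.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

# (X - mu*n) / (sigma*sqrt(n)) should be approximately N(0, 1).
zs = [(x - mu * n) / (sigma * n ** 0.5) for x in xs]
print(statistics.mean(zs), statistics.stdev(zs))  # roughly 0 and 1
```

Without the standardization, the raw samples of $X$ sit near $400$ with fluctuations on the order of $\sqrt{n} = 20$ — the “blown up” scale the remark warns about.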

## Asymptotics for $X$

The law of large numbers says that, if $\mu$ is finite, “the sample mean converges to the true mean”, i.e.

$$\frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{\text{a.s.}} \mu.$$
In our notation, this is a “linear approximation to $X$”,

$$X \approx \mu n.$$
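A two-line simulation of this convergence; the Uniform(0, 1) distribution (true mean $1/2$) and the sample sizes are arbitrary illustrative choices.

```python
# Law of large numbers sketch: the sample mean of Uniform(0, 1) draws
# (true mean mu = 0.5, an illustrative choice) approaches mu as n grows.
import random

random.seed(1)
mu = 0.5
errors = []
for n in (100, 10000):
    sample_mean = sum(random.random() for _ in range(n)) / n
    errors.append(abs(sample_mean - mu))
print(errors)  # the error shrinks as n grows
```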
The central limit theorem refines this by saying that, in fact, the sample mean is about $\sigma / \sqrt{n}$ away from the true mean. If $\mu$ and $\sigma^2$ are finite,

$$\frac{X - \mu n}{\sigma \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1).$$
In our notation, this is a “$\sqrt{n}$ approximation to $X$”,

$$X \approx \mu n + \sigma \sqrt{n} \cdot \mathcal{N}(0, 1).$$
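The $\sqrt{n}$ scale is easy to see numerically: almost every sample of $X$ lands within a few $\sigma \sqrt{n}$ of $\mu n$. The fair-coin distribution and the constants below are illustrative choices.

```python
# The sqrt(n) window: a sum of n fair coin flips lands within a few
# sigma*sqrt(n) of mu*n almost all of the time (illustrative parameters).
import random

random.seed(2)
n, trials = 2500, 2000
mu, sigma = 0.5, 0.5  # mean and standard deviation of one Bernoulli(1/2) flip

inside = 0
for _ in range(trials):
    x = sum(random.getrandbits(1) for _ in range(n))
    if abs(x - mu * n) <= 3 * sigma * n ** 0.5:
        inside += 1
frac = inside / trials
print(frac)  # close to 1 (about 0.997 for a Gaussian)
```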
The Berry-Esséen theorem strengthens the convergence to the normal distribution. If $\rho$ denotes (an upper bound on) the third absolute moment $\mathbb{E}\,|X_i - \mu|^3$ of the distribution $\mathcal{D}$, which we assume to be finite,

$$\sup_{x \in \mathbb{R}} \left| \Pr\!\left[ \frac{X - \mu n}{\sigma \sqrt{n}} \le x \right] - \Pr[\mathcal{N}(0,1) \le x] \right| \le \frac{C \rho}{\sigma^3 \sqrt{n}}$$

for an absolute constant $C$.
In our notation, this is an improvement, up to a constant, of the asymptotic distribution of $X$:

$$\frac{X - \mu n}{\sigma \sqrt{n}} = \mathcal{N}(0, 1) + O\!\left(\frac{\rho}{\sigma^3 \sqrt{n}}\right),$$

where the $O(\cdot)$ error is measured in CDF (Kolmogorov) distance.
It should be noted that the Berry-Esséen theorem is tight up to the constant $C$. The binomial distribution achieves this rate of convergence; see the end of these lecture notes for a proof.
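One can check the rate numerically (this is not the proof from the notes): the Kolmogorov distance for a standardized binomial can be computed exactly, and it roughly halves when $n$ quadruples. The particular values of $n$ below are arbitrary.

```python
# Exact Kolmogorov distance between a standardized Binomial(n, 1/2) and the
# standard normal; Berry-Esseen predicts Theta(1/sqrt(n)) decay.
from math import comb, erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def kolmogorov_dist(n):
    """sup_x |Pr[(Bin(n,1/2) - n/2) / (sqrt(n)/2) <= x] - phi(x)|."""
    mu, sigma = n / 2.0, sqrt(n) / 2.0
    dist, cdf = 0.0, 0.0
    for k in range(n + 1):
        p = comb(n, k) / 2.0 ** n
        z = (k - mu) / sigma
        dist = max(dist, abs(cdf - phi(z)))  # just before the jump at k
        cdf += p
        dist = max(dist, abs(cdf - phi(z)))  # just after the jump at k
    return dist

d100, d400 = kolmogorov_dist(100), kolmogorov_dist(400)
print(d100, d400, d100 / d400)  # the ratio is near sqrt(400/100) = 2
```

The supremum is attained at the jump points of the binomial CDF, which is why the loop checks the gap on both sides of each jump.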

## Invariance Principles

The previous analysis ends with the distribution $\mu n + \sigma \sqrt{n} \cdot \mathcal{N}(0,1)$. We can incorporate everything into one Gaussian to produce the equivalent distribution $\mathcal{N}(\mu n, \sigma^2 n)$. Interestingly, because the sum of independent Gaussians is again Gaussian, this distribution has the same PDF as

$$\sum_{i=1}^n Y_i$$
for independent Gaussian random variables $Y_i \sim \mathcal{N}(\mu, \sigma^2)$. This leads us to the equality

$$\sum_{i=1}^n X_i \approx \sum_{i=1}^n Y_i,$$

in which we’ve simply replaced each $X_i$ by a Gaussian random variable with the same mean and variance.

As noted in Remark 29 here, one can improve the error term by changing the replacement variables. In particular, it can be made $O(1/n)$ (rather than $O(1/\sqrt{n})$) if the first, second, **and third** moments of the $Y_i$ agree with those of the $X_i$.

The idea of “replacement invariance” surfaces in theoretical CS in the context of Boolean Fourier analysis. Here we generalize the summation of Boolean ($\pm 1$-valued) variables to an arbitrary Boolean function

$$f : \{-1, 1\}^n \to \mathbb{R}.$$
The invariance principle states that the random variable $f(x)$, for $x$ drawn uniformly from $\{-1, 1\}^n$, is close in distribution to the random variable

$$f(g)$$
for a vector $g \sim \mathcal{N}(0, 1)^n$ of independent standard Gaussians, with $f$ evaluated as a multilinear polynomial (assuming we normalize both sides to have variance $1$). In this case the “closeness” is determined by the maximum influence of a variable on the value of $f$, as well as the complexity of $f$ (its degree as a multilinear polynomial); see the previously linked lecture notes for an exact quantitative statement.
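As a rough numerical sketch (not the quantitative statement from the notes), one can compare a low-influence degree-$2$ polynomial on random signs and on Gaussians. The function $f$ and all parameters below are my own illustrative choices.

```python
# Invariance sketch: a low-influence, degree-2 multilinear polynomial
# f(v) = (v_1 v_2 + v_3 v_4 + ... + v_{2m-1} v_{2m}) / sqrt(m), compared on
# uniform +-1 inputs and on Gaussian inputs (parameters are illustrative).
import random

random.seed(4)
m, trials = 100, 4000

def f(v):
    """Normalized sum of m disjoint pair products; variance 1 either way."""
    return sum(v[2 * i] * v[2 * i + 1] for i in range(m)) / m ** 0.5

boolean = sorted(f([random.choice((-1, 1)) for _ in range(2 * m)])
                 for _ in range(trials))
gaussian = sorted(f([random.gauss(0.0, 1.0) for _ in range(2 * m)])
                  for _ in range(trials))

# The empirical medians (and the distributions broadly) nearly coincide.
print(boolean[trials // 2], gaussian[trials // 2])
```

Each variable here has influence on the order of $1/\sqrt{m}$, which is the small-influence regime where the invariance principle applies.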