A (Mostly) Picture Proof of Recurrence to the Origin on the Integer Lattice

This post presents the beautiful proof of the following result on random walks,

Theorem. Take a random walk on the integer lattice \mathbb{Z}^d starting at the origin. If d \leq 2, the probability you return to the origin at some point on the future is 1; if d \geq 3 the return probability is less than 1.

A random walk on \mathbb{Z}^d moves uniformly randomly along the gridlines of the integer lattice. For example, here’s a random walk on \mathbb{Z}^2:

all_walks

We want to prove that with probability 1 the walk will return to its start. Here’s the trick. Suppose instead of doing a random walk along the gridlines, we “make the coordinates independent”: at each time step we increment or decrement each coordinate indepently with probability 1/2. In this new walk we move uniformly randomly along red edges in the following picture:

better_walk

Notice that the red edges form a copy of the integer lattice! Therefore the probability that this new walk returns to the origin is exactly the same as the probability that the gridline walk returns to the origin.

The key property of this new walk is independence of the coordinates. The walk returns to the origin iff each coordinate is 0 simultaneously at some time t, and these events are independent so the probabilities simply multiply. Each coordinate looks like a one-dimensional random walk, so we have the equality

\text{Pr}[\text{walk on } \mathbb{Z}^2 \text{ returns to origin at time } t] = \text{Pr}[\text{walk on } \mathbb{Z} \text{ returns to origin at time } t]^2

We have a very nice fact, but how to apply it? Exercise: the expected number of returns to the origin is infinite iff the probability of eventually returning is 1. So let’s compute the expected number of returns. By linearity of expectation,

\mathbb{E}[\text{number of times } \mathbb{Z}^2 \text{ walk returns to origin}] = \displaystyle\sum_{t=1}^\infty \text{Pr}[\text{walk on } \mathbb{Z}^2 \text{ returns to origin at time } t]

= \displaystyle\sum_{t=1}^\infty \text{Pr}[\text{walk on } \mathbb{Z} \text{ returns to origin at time } t]^2

We’ve reduced the problem to analyzing the one-dimensional random walk, and this case we can compute exactly. We’re summing t independent +1‘s and -1‘s; the sum will be 0 if there are the same number of +1‘s and -1‘s, so

$latex \text{Pr}[\text{walk on } \mathbb{Z} \text{ returns to origin at time } t] = \frac{\binom{t}{t/2}}{2^t}$

This is the famous central binomial coefficient, and by Stirling’s approximation it’s \Theta( 1/ \sqrt{t}). Plugging this in, we arrive at a sum of reciprocals,

\mathbb{E}[\text{number of times } \mathbb{Z}^2 \text{ walk returns to origin}] = \displaystyle\sum_{t = 1}^\infty \Theta(1/t)

This is the harmonic series (up to a constant from the \Theta), which is of course infinite, and this completes the proof of the d=2 case.

For d dimensions, everything goes through. We get the equality

\mathbb{E}[\text{number of times } \mathbb{Z}^d \text{ walk returns to origin}] = \displaystyle\sum_{t = 1}^\infty \Theta(1/\sqrt{t})^d

This sum is convergent say by the integral test for any d \geq 3 and divergent for d = 1, 2, and this exactly captures the theorem.

This proof is also explained at this Stack Exchange post.

Advertisements

Proving the Coupling Inequality with LP Duality

This post details the first in a collection of everyday inequalities that arise as special cases of duality in linear programming. We’ll show that the usual coupling inequality is such an example,

d_{TV}(\mu, \nu) \leq \Pr[X \neq Y]

coupling of two distributions \mu and \nu on \Omega is a joint distribution on \Omega \times \Omega such that the two marginal distributions are \mu and \nu respectively. We write (X, Y) for the joint random variable, with X the marginal in the first coordinate, and Y the marginal in the second coordinate.  We say that \mu and \nu are “coupled”, with the coupling (X, Y) defining the interaction, or dependence, between them.

One of the basic facts about couplings compares the statistical distance between \mu and \nu to the probability the coupled variables agree. For any coupling (X, Y) of \mu and \nu, we have

d_{TV}(\mu, \nu) \leq \Pr[X \neq Y]

Furthermore, there is a coupling that achieves this bound. The above inequality is typically used to upper bound the statistical distance between \mu and \nu by constructing a coupling that’s likely to agree e.g. in the proof of convergence of an ergodic Markov chain to its limiting distribution.

In this post we’ll prove this inequality (and its tightness!) using an LP duality argument.

The Coupling LP

A coupling is specified by a set of linear constraints: for x,y \in \Omega, let p_{x,y} be the probability of seeing the pair (x,y). The constraints are

\forall x. \quad \displaystyle\sum_{y \in \Omega} p_{x,y} = \mu(x)

\forall y. \quad \displaystyle\sum_{x \in \Omega} p_{x,y} = \nu(y)

\forall x,y. \quad p_{x,y} \geq 0

\displaystyle\sum_{x,y \in \Omega} p_{x,y} = 1

This defines a polytope in \mathbb{R}^{|\Omega|^2}, called a transportation polytope. In fact, the last constraint is redundant, as summing the constraints with \mu(x) over all x‘s will also give 1. Over this polytope, the linear constraint to minimize is the probability X and Y disagree,

\min\displaystyle\sum_{\substack{x,y \in \Omega \\ x \neq y}} p_{x, y}

Ok, now that we have the LP, its dual (whose computation is explained e.g. here… or perhaps in a future blog post) is

\max \displaystyle\sum_{\omega \in \Omega} q^\mu_\omega \mu(\omega) + q^\nu_\omega \nu(\omega)

\forall \omega. \quad q^\mu_\omega + q^\nu_\omega \leq 0

\forall \omega_1 \neq \omega_2. \quad q^\mu_\omega + q^\nu_\omega \leq 1

By the duality theorems, \Pr[X \neq Y] is always at least as large as the solution to the dual, and furthermore, for some X, Y the two solutions agree. To finish our argument, we’ll show that the value of the dual LP is the statistical distance between \mu and \nu. The definition of the statistical distance we’ll use is the “optimization” version: the maximum difference in probabilities of events in \Omega,

d_{TV}(\mu, \nu) = \max_{A \subseteq \Omega} |\mu (A) - \nu (A)|

= \max_{A \subseteq \Omega} \left|\displaystyle\sum_{a \in A} \mu(a) - \nu(a)\right|

First we argue that the optimum is \{-1, 0, +1\}-valued. The dual problem’s matrix is totally unimodular (Prop. 3.2) and the right-hand sides of the constraints are integral, hence the LP is integral. Fix an optimal solution q^{\mu*}_\omega. From the constraints we see only one of the sets \{q^{\mu*}_\omega : \omega \in \Omega\} and \{q^{\nu*}_\omega : \omega \in \Omega\} can contain positive values. WLOG these are the variables for \mu. By optimality and looking at the largest (resp. smallest) of the q^{\mu*}_\omega (resp. q^{\nu*}_\omega), the \{q^{\mu*}_\omega : \omega \in \Omega \} are within 1 of one another, and the q^{\nu*}_\omega are their negations. Now we can shift the solution to be \{-1, 0, +1\}-valued without changing the value.

Now we are in a position to compare the \{-1, 0, +1\}-optimum with d_{TV}(\mu, \nu). As noted above, the set of variables for exactly one of \mu, \nu is positive, which gives the sign in the absolute value. The +1-valued variables indicate the event to take. This shows the value of the dual is exactly d_{TV}(\mu, \nu). As explained, this finishes the proof that

d_{TV}(\mu, \nu) \leq \Pr[X \neq Y]

 

Transportation Polytopes

We finish with a few extra details on transportation polytopes.

Fixing in mind \mu and \nu, the collection of possible couplings Z forms a convex polytope in \mathbb{R}^{|\Omega|^2} from a very special class of polytopes, the transportation polytopes. The name comes from the following “transportation” problem in optimization: suppose you have m factories producing a good, such as lawn mowers, which produce quantities s_1, s_2, \dots, s_m. You want to distribute the lawn mowers among n stores in quantities t_1, t_2, \dots, t_n. The question is, how many lawn mowers should factory i send to store j? A solution that achieves the desired stocking quantity is exactly a matrix (a.k.a joint probability distribution) with the given marginals.

Vertices of a transportation polytope can be enumerated and checked algorithmically. Furthermore, when the marginals \mu and \nu are uniform, the corresponding transportation polytope is exactly the set of doubly stochastic matrices. By the classical Birkhoff-Von Neumann theorem, the vertices of this polytope are exactly the permutation matrices i.e. doubly stochastic matrices are exactly convex combinations of permutation matrices (along with the zero matrix).