This post details the first in a collection of everyday inequalities that arise as special cases of duality in linear programming. We'll show that the usual coupling inequality is such an example.
A coupling of two distributions $\mu$ and $\nu$ on $[n]$ is a joint distribution $\pi$ on $[n] \times [n]$ such that the two marginal distributions are $\mu$ and $\nu$ respectively. We write $(X, Y) \sim \pi$ for the joint random variable, with the marginal $\mu$ in the first coordinate, and the marginal $\nu$ in the second coordinate. We say that $X$ and $Y$ are "coupled", with the coupling $\pi$ defining the interaction, or dependence, between them.
One of the basic facts about couplings compares the statistical distance between $\mu$ and $\nu$ to the probability the coupled variables agree. For any coupling $(X, Y)$ of $\mu$ and $\nu$, we have

$$d_{TV}(\mu, \nu) \le \Pr[X \ne Y].$$
Furthermore, there is a coupling that achieves this bound. The above inequality is typically used to upper bound the statistical distance between $\mu$ and $\nu$ by constructing a coupling that's likely to agree, e.g. in the proof of convergence of an ergodic Markov chain to its limiting distribution.
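As a quick numeric sanity check (our own sketch, not part of the argument below), the bound-achieving coupling can be built explicitly: the "maximal coupling" places mass $\min(\mu_i, \nu_i)$ on the diagonal and spreads the leftover mass of $\mu$ and $\nu$ over the off-diagonal entries.

```python
def maximal_coupling(mu, nu):
    """Build the maximal coupling of two distributions given as lists."""
    n = len(mu)
    pi = [[0.0] * n for _ in range(n)]
    diag = [min(a, b) for a, b in zip(mu, nu)]
    for i in range(n):
        pi[i][i] = diag[i]                   # agree with probability min(mu_i, nu_i)
    slack = 1.0 - sum(diag)                  # total mass left to place off-diagonal
    if slack > 1e-12:
        res_mu = [a - d for a, d in zip(mu, diag)]   # leftover row mass
        res_nu = [b - d for b, d in zip(nu, diag)]   # leftover column mass
        for i in range(n):
            for j in range(n):
                if i != j:
                    pi[i][j] = res_mu[i] * res_nu[j] / slack
    return pi

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
pi = maximal_coupling(mu, nu)

tv = 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))
p_disagree = sum(pi[i][j] for i in range(3) for j in range(3) if i != j)
print(tv, p_disagree)   # the two values agree: this coupling achieves the bound
```

The example distributions are arbitrary; any pair on the same finite set works.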
In this post we’ll prove this inequality (and its tightness!) using an LP duality argument.
The Coupling LP
A coupling is specified by a set of linear constraints: for $i, j \in [n]$, let $\pi_{ij}$ be the probability of seeing the pair $(i, j)$. The constraints are

$$\pi_{ij} \ge 0 \quad \forall i, j, \qquad \sum_{j} \pi_{ij} = \mu_i \quad \forall i, \qquad \sum_{i} \pi_{ij} = \nu_j \quad \forall j, \qquad \sum_{i, j} \pi_{ij} = 1.$$
This defines a polytope in $\mathbb{R}^{n^2}$, called a transportation polytope. In fact, the last constraint is redundant, as summing the constraints $\sum_j \pi_{ij} = \mu_i$ over all $i$'s will also give $\sum_{i,j} \pi_{ij} = 1$. Over this polytope, the linear objective to minimize is the probability $X$ and $Y$ disagree,

$$\Pr[X \ne Y] = \sum_{i \ne j} \pi_{ij}.$$
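The primal LP is small enough to solve directly. Here's a sketch using `scipy.optimize.linprog` (the use of SciPy and the example distributions are our own illustration, not part of the post's argument):

```python
import numpy as np
from scipy.optimize import linprog

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
n = len(mu)

# Variables: pi_{ij}, flattened row-major. Objective: total off-diagonal mass.
c = (1.0 - np.eye(n)).ravel()

# Equality constraints: row sums equal mu_i, column sums equal nu_j.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0     # sum_j pi_{ij} = mu_i
for j in range(n):
    A_eq[n + j, j::n] = 1.0              # sum_i pi_{ij} = nu_j
b_eq = np.concatenate([mu, nu])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
tv = 0.5 * np.abs(mu - nu).sum()
print(res.fun, tv)   # optimal P[X != Y] matches the statistical distance
```

Note that one of the $2n$ equality rows is redundant, as observed above; the solver handles this without complaint.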
Ok, now that we have the LP, its dual (whose computation is explained e.g. here… or perhaps in a future blog post) is

$$\max \sum_i \mu_i u_i + \sum_j \nu_j v_j \quad \text{subject to} \quad u_i + v_j \le 1 \ \ (i \ne j), \qquad u_i + v_i \le 0 \ \ \forall i.$$
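The dual can be sanity-checked numerically as well. In this sketch (again our own illustration, with SciPy) the dual variables are $u$ for the row constraints and $v$ for the column constraints, and we maximize $\sum_i \mu_i u_i + \sum_j \nu_j v_j$ subject to $u_i + v_j \le \mathbf{1}[i \ne j]$:

```python
import numpy as np
from scipy.optimize import linprog

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
n = len(mu)

# Variables: (u_1..u_n, v_1..v_n). One constraint u_i + v_j <= 1[i != j] per pair.
A_ub = np.zeros((n * n, 2 * n))
b_ub = np.zeros(n * n)
for i in range(n):
    for j in range(n):
        A_ub[i * n + j, i] = 1.0
        A_ub[i * n + j, n + j] = 1.0
        b_ub[i * n + j] = 0.0 if i == j else 1.0

c = -np.concatenate([mu, nu])            # linprog minimizes, so negate to maximize
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
print(-res.fun)   # the dual optimum equals d_TV(mu, nu) by strong duality
```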
By the duality theorems, $\Pr[X \ne Y]$ is always at least as large as the value of any feasible solution to the dual, and furthermore, for some coupling the two optima agree. To finish our argument, we'll show that the value of the dual LP is the statistical distance between $\mu$ and $\nu$. The definition of the statistical distance we'll use is the "optimization" version: the maximum difference in probabilities of events in $[n]$,

$$d_{TV}(\mu, \nu) = \max_{A \subseteq [n]} |\mu(A) - \nu(A)|.$$
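On a small example this "optimization" definition can be brute-forced over all events, confirming it agrees with the usual formula $\frac{1}{2} \sum_i |\mu_i - \nu_i|$ (a quick check of our own, with arbitrary example distributions):

```python
from itertools import combinations

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
n = len(mu)

# Maximize |mu(A) - nu(A)| over all 2^n events A.
best = 0.0
for r in range(n + 1):
    for A in combinations(range(n), r):
        best = max(best, abs(sum(mu[i] for i in A) - sum(nu[i] for i in A)))

print(best)   # matches 0.5 * sum_i |mu_i - nu_i|
```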
First we argue that the optimum is $\{0, \pm 1\}$-valued. The dual problem's constraint matrix is totally unimodular (Prop. 3.2) and the right-hand sides of the constraints are integral, hence the LP has an integral optimal solution. Fix an integral optimal solution $(u^*, v^*)$. From the constraints we see only one of the sets $\{u_i^*\}$ and $\{v_j^*\}$ can contain positive values. WLOG these are the variables $u_i^*$ for $i \in A$. By optimality, and looking at the largest (resp. smallest) of the $u_i^*$ (resp. $v_j^*$), the $u_i^*$ are within $1$ of one another, and the $v_i^*$ are their negations. Since replacing $(u, v)$ by $(u + c\mathbf{1}, v - c\mathbf{1})$ changes neither feasibility nor the objective, we can now shift the solution to be $\{0, \pm 1\}$-valued without changing the value, with $u^* \in \{0, 1\}^n$ and $v^* = -u^*$.
Now we are in a position to compare the $\{0, \pm 1\}$-valued optimum with $d_{TV}(\mu, \nu)$. As noted above, the set of variables for exactly one of $u^*, v^*$ is positive, which gives the sign in the absolute value. The $\{0, 1\}$-valued variables $u_i^*$ indicate the event $A$ to take, and the dual objective becomes $\sum_{i \in A} \mu_i - \sum_{i \in A} \nu_i = \mu(A) - \nu(A)$. This shows the value of the dual is exactly $d_{TV}(\mu, \nu)$. As explained, this finishes the proof that

$$\min_{\text{couplings } (X, Y)} \Pr[X \ne Y] = d_{TV}(\mu, \nu).$$
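The $\{0, \pm 1\}$-valued dual solution is concrete enough to verify by hand. In this sketch (our own check) we take $A = \{i : \mu_i > \nu_i\}$, set $u$ to the indicator of $A$ and $v$ to its negation, and confirm feasibility and that the value attained is the statistical distance:

```python
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
n = len(mu)

A = [i for i in range(n) if mu[i] > nu[i]]
u = [1.0 if i in A else 0.0 for i in range(n)]
v = [-u[i] for i in range(n)]                 # v is the negation of u

# Feasibility: u_i + v_j <= 1 for i != j, and u_i + v_i <= 0.
assert all(u[i] + v[j] <= (1.0 if i != j else 0.0)
           for i in range(n) for j in range(n))

value = sum(mu[i] * u[i] for i in range(n)) + sum(nu[j] * v[j] for j in range(n))
tv = 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))
print(value, tv)   # value = mu(A) - nu(A) = d_TV(mu, nu)
```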
We finish with a few extra details on transportation polytopes.
With $n$ and the marginals $\mu$ and $\nu$ fixed, the collection of possible couplings forms a convex polytope in $\mathbb{R}^{n^2}$ from a very special class of polytopes, the transportation polytopes. The name comes from the following "transportation" problem in optimization: suppose you have $m$ factories producing a good, such as lawn mowers, which produce quantities $s_1, \dots, s_m$. You want to distribute the lawn mowers among $n$ stores in quantities $t_1, \dots, t_n$. The question is, how many lawn mowers should factory $i$ send to store $j$? A solution that achieves the desired stocking quantities is exactly a nonnegative matrix (a.k.a. a joint probability distribution, after normalizing) with the given marginals.
Vertices of a transportation polytope can be enumerated and checked algorithmically. Furthermore, when the marginals $\mu$ and $\nu$ are uniform, the corresponding transportation polytope is (after scaling by $n$) exactly the set of doubly stochastic matrices. By the classical Birkhoff-von Neumann theorem, the vertices of this polytope are exactly the permutation matrices, i.e. the doubly stochastic matrices are exactly the convex combinations of permutation matrices.
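For small $n$ the Birkhoff-von Neumann decomposition can be computed greedily: repeatedly find a permutation supported on the positive entries of the matrix and peel off as much of it as possible. A brute-force sketch of our own, for a $3 \times 3$ example:

```python
from itertools import permutations

M = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]          # a doubly stochastic matrix
n = 3

weights = []
while max(max(row) for row in M) > 1e-9:
    # Find a permutation supported on the positive entries of M
    # (one always exists while M has positive, equal row/column sums).
    perm = next(p for p in permutations(range(n))
                if all(M[i][p[i]] > 1e-9 for i in range(n)))
    w = min(M[i][perm[i]] for i in range(n))   # largest peelable weight
    for i in range(n):
        M[i][perm[i]] -= w                     # subtract w * permutation matrix
    weights.append(w)

print(weights, sum(weights))   # weights sum to 1: M was a convex combination
```

Each iteration zeroes at least one entry, so the loop terminates after at most $n^2$ steps; the `permutations` scan is fine for tiny $n$ but a matching algorithm would be used in earnest.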