Applications of Optimal Transport


Introduction to Optimal Transport and Wasserstein Distance

Optimal Transport (OT) theory studies how to move “mass” (or a probability distribution) from one configuration to another at minimal cost. Monge (1781) first posed the problem in practical terms: how to minimize the cost of transporting soil to build fortifications. In the 1940s, Kantorovich reformulated the problem, describing mass transport between two marginal distributions through a joint distribution (a coupling). The Wasserstein distance is a “geometric distance” between distributions defined within this framework; it quantifies how different two probability distributions are. Unlike measures such as the Kullback–Leibler divergence, the Wasserstein distance is a true metric: it satisfies the metric axioms (non-negativity, identity of indiscernibles, symmetry, triangle inequality). This makes comparisons between distributions more stable and easier to interpret geometrically.
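As a small illustration of the symmetry axiom mentioned above, the sketch below compares the Kullback–Leibler divergence and $W_1$ in both argument orders. The two distributions `p` and `q` and their support are made-up numbers chosen for illustration only.

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance

# Two illustrative discrete distributions on the same support (made-up numbers).
support = np.array([0.0, 1.0, 2.0])
p = np.array([0.8, 0.1, 0.1])
q = np.array([0.4, 0.3, 0.3])

# scipy.stats.entropy(p, q) returns the KL divergence KL(p || q).
kl_pq = entropy(p, q)
kl_qp = entropy(q, p)

# W1 between the two weighted point sets, in both argument orders.
w_pq = wasserstein_distance(support, support, u_weights=p, v_weights=q)
w_qp = wasserstein_distance(support, support, u_weights=q, v_weights=p)

print(f"KL(p||q) = {kl_pq:.4f}, KL(q||p) = {kl_qp:.4f}")
print(f"W1(p,q)  = {w_pq:.4f}, W1(q,p)  = {w_qp:.4f}")
```

The two KL values differ, while the two $W_1$ values coincide, which is what the symmetry axiom requires.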

Physical Intuition of Wasserstein Distance

Suppose we want to transport and reshape a pile of soil $\mu$ into another shape $\nu$:

Principle of Least Action: We want the sum, over all transported pieces, of mass times distance moved to be as small as possible. The minimal total work corresponds to the Wasserstein distance.
Conservation of Mass: The total amount of soil is the same before and after transport. Any distribution that conserves mass (for example a probability distribution or a normalized density field) can therefore be studied this way.
Transport Plan: A detailed plan $M$ specifies how much mass moves from each location to each other location. The plan with the least total work is mathematically called the Optimal Transport Plan, or optimal joint probability distribution.
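The three ideas above can be sketched on a toy discrete problem. The positions, masses, and the two candidate plans below are hypothetical values chosen for illustration; the point is only that every valid plan $M$ conserves mass, while different plans require different amounts of work.

```python
import numpy as np

# Hypothetical 1D example: soil sits at three source sites and is needed at three target sites.
x = np.array([0.0, 1.0, 2.0])     # source positions
y = np.array([1.0, 2.0, 3.0])     # target positions
mu = np.array([0.5, 0.3, 0.2])    # mass at each source (sums to 1)
nu = np.array([0.5, 0.3, 0.2])    # mass required at each target (sums to 1)

# Distance between every source/target pair.
D = np.abs(x[:, None] - y[None, :])

# Plan A: move each pile one unit to the right (source i -> target i).
M_shift = np.diag(mu)
# Plan B: spread every source pile over all targets in proportion to nu.
M_spread = np.outer(mu, nu)

for M in (M_shift, M_spread):
    # Conservation of mass: rows of M sum to mu, columns sum to nu.
    assert np.allclose(M.sum(axis=1), mu)
    assert np.allclose(M.sum(axis=0), nu)

# Total work of a plan = sum over pairs of (mass moved) * (distance moved).
work_shift = float((M_shift * D).sum())
work_spread = float((M_spread * D).sum())
print(work_shift, work_spread)
```

Both plans are feasible, but the shift plan needs less work than the spread plan in this example. The Wasserstein distance is the work of the cheapest feasible plan.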

From this example, we can see some applications of Wasserstein distance:

Quantifying Distribution Shape Differences: The more two piles differ, the more work is needed to reshape one into the other, so the Wasserstein distance measures how different the two distributions are.
Path Planning and Joint Probability Distribution: The optimal transport plan that attains the Wasserstein distance tells us how to move the mass (the optimal joint probability distribution) so that the transport cost is minimized.
Dynamics Applications: In fluid mechanics, the Wasserstein distance is closely tied to gradient flow theory and is used to describe optimal mapping problems in fluid motion.

Calculation of Wasserstein Distance

Calculation:
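- For two distributions $\mu$ and $\nu$ on $\mathbb{R}^d$, the Wasserstein distance is defined as:
  $W(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x,y)\, d\pi(x,y)$
  where $c(x,y)$ is the cost of moving a unit of mass from point $x$ (under $\mu$) to point $y$ (under $\nu$), here the distance between them, and $\Pi(\mu,\nu)$ is the set of joint probability distributions $\pi$ whose marginals are $\mu$ and $\nu$. Taking $c(x,y) = |x-y|^p$ and the $p$-th root of the infimum gives the $p$-Wasserstein distance $W_p$.
- For 1D distributions, the Wasserstein distance can be computed directly from the quantile functions:
  $W_p(\mu,\nu) = \left(\int_0^1 |F_\mu^{-1}(u) - F_\nu^{-1}(u)|^p \, du\right)^{\frac{1}{p}}$
  where $F_\mu^{-1}$ and $F_\nu^{-1}$ are the quantile functions of $\mu$ and $\nu$, respectively. When $p=1$, it is also known as the “Earth Mover’s Distance”.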
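The two formulas above can be checked against each other on a small example. The sketch below is not a production solver: it writes the discrete coupling problem as a linear program solved with `scipy.optimize.linprog`, and evaluates the quantile formula by sorting, which is valid here because both samples are assumed to have the same size and uniform weights. The function names `kantorovich_lp` and `wasserstein_1d` and the sample parameters are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich_lp(mu, nu, C):
    """Minimize sum_ij C[i, j] * pi[i, j] over couplings pi with marginals mu and nu."""
    n, m = C.shape
    # pi is flattened row-major, so entry (i, j) sits at index i * m + j.
    A_rows = np.kron(np.eye(n), np.ones((1, m)))   # sum_j pi[i, j] = mu[i]
    A_cols = np.kron(np.ones((1, n)), np.eye(m))   # sum_i pi[i, j] = nu[j]
    res = linprog(
        C.ravel(),
        A_eq=np.vstack([A_rows, A_cols]),
        b_eq=np.concatenate([mu, nu]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun, res.x.reshape(n, m)

def wasserstein_1d(x, y, p=1):
    """W_p between two equal-size 1D samples via their sorted values (empirical quantiles)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
n = 30
x = rng.normal(0.0, 1.0, size=n)
y = rng.normal(1.0, 2.0, size=n)
w = np.full(n, 1.0 / n)            # uniform weights on both samples

p = 2
C = np.abs(x[:, None] - y[None, :]) ** p   # cost c(x, y) = |x - y|^p
cost, plan = kantorovich_lp(w, w, C)

wp_lp = cost ** (1.0 / p)                  # p-th root of the optimal cost
wp_quantile = wasserstein_1d(x, y, p=p)
print(wp_lp, wp_quantile)
```

The two printed values should agree up to solver tolerance. The linear program has $n \times m$ unknowns, so it becomes expensive quickly, whereas the sorted-sample formula only needs a sort.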
Python Libraries (Practical):
- SciPy: scipy.stats.wasserstein_distance directly computes the $W_1$ distance between 1D empirical distributions.
- Python Optimal Transport (POT): ot.emd2 computes the EMD from the sample weights and a user-supplied cost matrix (pairwise distances between samples for $W_1$). POT provides an efficient exact OT solver, but exact solving becomes expensive as the sample size grows, so it is not well suited to “large sample size” OT problems.
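A minimal usage sketch of the two libraries follows. The sample sizes and distribution parameters are arbitrary, and POT is treated as an optional dependency (the POT part is skipped if `ot` is not installed). With uniform weights and a Euclidean distance matrix, both calls should return the same $W_1$ value up to numerical tolerance.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=200)   # samples from the first distribution
xt = rng.normal(1.0, 1.5, size=200)   # samples from the second distribution

# SciPy: W1 between two 1D empirical distributions.
w1_scipy = wasserstein_distance(xs, xt)
print("SciPy W1:", w1_scipy)

try:
    import ot  # Python Optimal Transport (optional)
except ImportError:
    ot = None

if ot is not None:
    # Uniform weights on each sample set.
    a = np.full(len(xs), 1.0 / len(xs))
    b = np.full(len(xt), 1.0 / len(xt))
    # Cost matrix of pairwise Euclidean distances; ot.dist expects 2D arrays.
    M = ot.dist(xs.reshape(-1, 1), xt.reshape(-1, 1), metric="euclidean")
    # ot.emd2 returns the optimal transport cost for this cost matrix.
    w1_pot = ot.emd2(a, b, M)
    print("POT W1:", w1_pot)
```

Note that the cost matrix `M` has one entry per pair of samples, so its memory footprint grows with the product of the two sample sizes.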

Some Applications of Wasserstein Distance in Climate Science

References

Figalli, A., & Glaudo, F. (2021). An invitation to optimal transport, Wasserstein distances, and gradient flows.