Article · Wikipedia archive · Last revised Jun 11, 2026

Polar factorization theorem

In optimal transport, a branch of mathematics, the polar factorization of vector fields is a basic result due to Brenier (1987),[1] with antecedents by Knott and Smith (1984)[2] and Rachev (1985),[3] that generalizes several earlier results, including the polar decomposition of real matrices and the rearrangement of real-valued functions.

The theorem

Notation. Denote by $\xi_{\#}\mu$ the image measure of $\mu$ through the map $\xi$.

Definition: Measure preserving map. Let $(X,\mu)$ and $(Y,\nu)$ be probability spaces and $\sigma : X \rightarrow Y$ a measurable map. Then $\sigma$ is said to be measure preserving iff $\sigma_{\#}\mu = \nu$, where $\sigma_{\#}\mu$ denotes the pushforward measure. Spelled out: for every $\nu$-measurable subset $\Omega$ of $Y$, $\sigma^{-1}(\Omega)$ is $\mu$-measurable, and $\mu(\sigma^{-1}(\Omega)) = \nu(\Omega)$. The latter is equivalent to:

$$\int_{X} (f\circ\sigma)(x)\,\mu(dx) = \int_{X} (\sigma^{*}f)(x)\,\mu(dx) = \int_{Y} f(y)\,(\sigma_{\#}\mu)(dy) = \int_{Y} f(y)\,\nu(dy)$$

where $f$ is $\nu$-integrable and $f\circ\sigma$ is $\mu$-integrable.
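As a numerical illustration of the integral identity above, the following sketch checks it for the doubling map on the unit interval, a standard example of a map that preserves the Lebesgue measure, and contrasts it with the squaring map, which does not. The maps, the test function `f`, and the grid size are all assumptions chosen for this example; they do not come from the cited sources.

```python
import numpy as np

def sigma(x):
    """Doubling map x -> 2x mod 1 on [0, 1); it preserves the Lebesgue measure."""
    return (2.0 * x) % 1.0

def f(y):
    """Arbitrary smooth test function (an assumption for this example)."""
    return np.cos(2.0 * np.pi * y) + y ** 2

n = 100_000
x = (np.arange(n) + 0.5) / n   # midpoint grid: quadrature nodes for Lebesgue measure on [0, 1]

lhs = f(sigma(x)).mean()       # approximates the integral of (f o sigma) against mu
rhs = f(x).mean()              # approximates the integral of f against nu = mu
bad = f(x ** 2).mean()         # same test with x -> x^2, which is NOT measure preserving

print(abs(lhs - rhs))          # tiny: the identity holds for the doubling map
print(abs(bad - rhs))          # clearly nonzero: the identity fails for x -> x^2
```

Agreement for a single test function is only a sanity check; measure preservation requires the identity for every integrable `f`.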

Theorem. Consider a map $\xi : \Omega \rightarrow R^{d}$, where $\Omega$ is a convex subset of $R^{d}$, and a measure $\mu$ on $\Omega$ that is absolutely continuous (with respect to the Lebesgue measure). Assume that $\xi_{\#}\mu$ is also absolutely continuous. Then there exist a convex function $\varphi : \Omega \rightarrow R$ and a map $\sigma : \Omega \rightarrow \Omega$ preserving $\mu$ such that

$$\xi = \left(\nabla\varphi\right)\circ\sigma$$

In addition, $\nabla\varphi$ and $\sigma$ are uniquely defined almost everywhere.[1][4]

Applications and connections

Dimension 1

In dimension 1, when $\mu$ is the Lebesgue measure on the unit interval, the result specializes to Ryff's theorem.[5] That is, when $d=1$ and $\mu$ is the uniform distribution over $[0,1]$, the polar decomposition boils down to

$$\xi(t) = F_{X}^{-1}\left(\sigma(t)\right)$$

where $F_{X}$ is the cumulative distribution function of the random variable $\xi(U)$, and $U$ has a uniform distribution over $[0,1]$. Here $F_{X}$ is assumed to be continuous, and $\sigma(t) = F_{X}\left(\xi(t)\right)$ preserves the Lebesgue measure on $[0,1]$.
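A discrete analogue of this one-dimensional factorization can be sketched with sorting: on a uniform grid, the sorted values of the map play the role of the quantile function, and the ranks of the values define a permutation of the grid, which preserves the uniform measure on it. The example map `xi` and the grid size are assumptions made for this illustration, and the discrete setting only approximates the continuous statement.

```python
import numpy as np

n = 1000
t = (np.arange(n) + 0.5) / n            # midpoint grid standing in for U ~ Uniform[0, 1]
xi = np.sin(2.0 * np.pi * t) + t        # example non-monotone map (an assumption)

order = np.argsort(xi, kind="stable")   # order[k] = index of the k-th smallest value of xi
ranks = np.empty(n, dtype=int)
ranks[order] = np.arange(n)             # ranks[i] = position of xi[i] in the sorted order

quantile = xi[order]                    # discrete analogue of F_X^{-1}: non-decreasing values
sigma = (ranks + 0.5) / n               # discrete analogue of sigma(t) = F_X(xi(t))

# Evaluating the quantile at sigma(t_i) amounts to looking up index ranks[i].
recon = quantile[ranks]

print(np.array_equal(recon, xi))        # the two factors recompose xi
```

Here `sigma` merely permutes the grid points, which is the discrete counterpart of preserving the Lebesgue measure, while `quantile` is the monotone rearrangement of `xi`.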

Polar decomposition of matrices

When $\xi$ is a linear map and $\mu$ is the standard Gaussian distribution, the result coincides with the polar decomposition of matrices. Assuming $\xi(x) = Mx$, where $M$ is an invertible $d\times d$ matrix, and taking $\mu$ to be the $\mathcal{N}\left(0, I_{d}\right)$ probability measure, the polar decomposition boils down to

$$M = SO$$

where $S$ is a symmetric positive definite matrix and $O$ is an orthogonal matrix. The connection with the polar factorization is $\varphi(x) = x^{\top}Sx/2$, which is convex with $\nabla\varphi(x) = Sx$, and $\sigma(x) = Ox$, which preserves the $\mathcal{N}\left(0, I_{d}\right)$ measure.
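The matrix case can be checked numerically. The sketch below builds the two factors from a singular value decomposition, which is one common way to compute a polar decomposition; the example matrix `M` and the test point `x` are assumptions chosen for illustration.

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [0.0, 1.0]])             # example invertible matrix (an assumption)

U, s, Vt = np.linalg.svd(M)            # M = U @ diag(s) @ Vt with singular values s > 0
S = U @ np.diag(s) @ U.T               # symmetric positive definite factor
O = U @ Vt                             # orthogonal factor

# phi(x) = x^T S x / 2 has gradient S x, so (grad phi)(sigma(x)) = S @ (O @ x) = M @ x.
x = np.array([0.3, -1.2])

print(np.allclose(S @ O, M))           # M = S O
print(np.allclose(O.T @ O, np.eye(2))) # O is orthogonal
print(np.allclose(S @ (O @ x), M @ x)) # factorization evaluated at a point
```

Orthogonality of `O` is what makes $\sigma(x) = Ox$ preserve the standard Gaussian measure, since that measure is invariant under rotations and reflections.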

Helmholtz decomposition

The result also allows one to recover the Helmholtz decomposition. Let $x \mapsto V(x)$ be a smooth vector field; it can then be written in a unique way as

$$V = w + \nabla p$$

where $p$ is a smooth real function defined on $\Omega$, unique up to an additive constant, and $w$ is a smooth divergence-free vector field, parallel to the boundary of $\Omega$.

The connection can be seen by assuming that $\mu$ is the Lebesgue measure on a compact set $\Omega \subset R^{n}$ and by writing $\xi$ as a perturbation of the identity map

$$\xi_{\epsilon}(x) = x + \epsilon V(x)$$

where $\epsilon$ is small. The polar decomposition of $\xi_{\epsilon}$ is given by $\xi_{\epsilon} = (\nabla\varphi_{\epsilon})\circ\sigma_{\epsilon}$. Then, for any test function $f : R^{n} \rightarrow R$, the following holds:

$$\int_{\Omega} f(x + \epsilon V(x))\,dx = \int_{\Omega} f\left((\nabla\varphi_{\epsilon})\circ\sigma_{\epsilon}(x)\right)dx = \int_{\Omega} f\left(\nabla\varphi_{\epsilon}(x)\right)dx$$

where the fact that $\sigma_{\epsilon}$ preserves the Lebesgue measure was used in the second equality.

In fact, as $\varphi_{0}(x) = \tfrac{1}{2}\Vert x\Vert^{2}$, one can expand $\varphi_{\epsilon}(x) = \tfrac{1}{2}\Vert x\Vert^{2} + \epsilon p(x) + O(\epsilon^{2})$, and therefore $\nabla\varphi_{\epsilon}(x) = x + \epsilon\nabla p(x) + O(\epsilon^{2})$. Comparing the terms of order $\epsilon$ on both sides of the previous identity gives $\int_{\Omega}\left(V(x) - \nabla p(x)\right)\cdot\nabla f(x)\,dx = 0$ for any smooth function $f$, which implies that $w(x) = V(x) - \nabla p(x)$ is divergence-free.[1][6]
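The first-order comparison can be spelled out as follows. This is a formal sketch: it assumes that $f$ is smooth enough for a Taylor expansion, that the $O(\epsilon^{2})$ remainders are uniform on $\Omega$, and that $\Omega$ has a boundary regular enough to integrate by parts, with outward unit normal denoted $n$ (a symbol introduced here). Expanding both ends of the integral identity to first order in $\epsilon$ gives

$$\int_{\Omega} f(x + \epsilon V(x))\,dx = \int_{\Omega} f(x)\,dx + \epsilon\int_{\Omega}\nabla f(x)\cdot V(x)\,dx + O(\epsilon^{2}),$$

$$\int_{\Omega} f\left(\nabla\varphi_{\epsilon}(x)\right)dx = \int_{\Omega} f(x)\,dx + \epsilon\int_{\Omega}\nabla f(x)\cdot\nabla p(x)\,dx + O(\epsilon^{2}).$$

Subtracting, dividing by $\epsilon$ and letting $\epsilon \rightarrow 0$ leaves

$$\int_{\Omega}\left(V(x) - \nabla p(x)\right)\cdot\nabla f(x)\,dx = 0.$$

Writing $w = V - \nabla p$ and integrating by parts,

$$0 = \int_{\Omega} w\cdot\nabla f\,dx = -\int_{\Omega} f\,\operatorname{div} w\,dx + \int_{\partial\Omega} f\,(w\cdot n)\,dS.$$

Since $f$ is arbitrary, this yields $\operatorname{div} w = 0$ in $\Omega$ and $w\cdot n = 0$ on $\partial\Omega$, which is the sense in which $w$ is parallel to the boundary.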

See also

References

  1. Brenier, Yann (1991). "Polar factorization and monotone rearrangement of vector‐valued functions" (PDF). Communications on Pure and Applied Mathematics. 44 (4): 375–417. doi:10.1002/cpa.3160440402. Retrieved 16 April 2021.
  2. Knott, M.; Smith, C. S. (1984). "On the optimal mapping of distributions". Journal of Optimization Theory and Applications. 43: 39–49. doi:10.1007/BF00934745. S2CID 120208956. Retrieved 16 April 2021.
  3. Rachev, Svetlozar T. (1985). "The Monge–Kantorovich mass transference problem and its stochastic applications" (PDF). Theory of Probability & Its Applications. 29 (4): 647–676. doi:10.1137/1129093. Retrieved 16 April 2021.
  4. Santambrogio, Filippo (2015). Optimal transport for applied mathematicians. New York: Birkhäuser. CiteSeerX 10.1.1.726.35.
  5. Ryff, John V. (1965). "Orbits of L1-Functions Under Doubly Stochastic Transformation". Transactions of the American Mathematical Society. 117: 92–100. doi:10.2307/1994198. JSTOR 1994198. Retrieved 16 April 2021.
  6. Villani, Cédric (2003). Topics in optimal transportation. American Mathematical Society.