Exponential tilting

Exponential tilting (ET), exponential twisting, or exponential change of measure (ECM) is a distribution-shifting technique used in many parts of mathematics. The family of all exponential tiltings of a random variable $X$ is known as the natural exponential family of $X$.

Exponential tilting is used in Monte Carlo estimation for rare-event simulation, in particular in rejection sampling and importance sampling. In mathematical finance,¹ exponential tilting is also known as Esscher tilting (or the Esscher transform); it is often combined with indirect Edgeworth approximation and is used in such contexts as insurance futures pricing.²

The earliest formalization of exponential tilting is often attributed to Frederik Esscher,³ with its use in importance sampling attributed to David Siegmund.⁴

Definition

Let $X$ be a real-valued random variable on a probability space $(\Omega ,{\mathcal {F}},P)$ . Suppose that the moment-generating function

M_{X}(\theta )=\operatorname {E} [e^{\theta X}]

is finite for a given real parameter $\theta$ . Let

\kappa (\theta )=\log M_{X}(\theta )

be the cumulant-generating function (CGF). The exponentially tilted measure $P_{\theta }$ , restricted to events determined by $X$ , is defined by

P_{\theta }(X\in B)=\int _{B}e^{\theta x-\kappa (\theta )}\,P_{X}(dx),

where $P_{X}$ is the law of $X$ . Equivalently,

{\frac {dP_{\theta }}{dP}}=e^{\theta X-\kappa (\theta )}

on the sigma algebra generated by $X$ .

If $X$ has density $f$ , then the tilted density is

f_{\theta }(x)=e^{\theta x-\kappa (\theta )}f(x).

Thus $f_{\theta }(x)$ is proportional to $e^{\theta x}f(x)$ , with the normalizing constant supplied by $M_{X}(\theta )$ .
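The same construction applies to a discrete law, with the probability mass function in place of the density. The following is a minimal sketch of that case; the helper name `tilt_pmf`, the fair-die example and the value $\theta =0.5$ are illustrative assumptions, not part of the definition.

```python
import math

def tilt_pmf(pmf, theta):
    """Return (tilted pmf, kappa(theta)) for a finite pmf given as {value: probability}."""
    # M(theta) = E[exp(theta * X)] supplies the normalizing constant.
    mgf = sum(p * math.exp(theta * x) for x, p in pmf.items())
    kappa = math.log(mgf)
    tilted = {x: p * math.exp(theta * x - kappa) for x, p in pmf.items()}
    return tilted, kappa

die = {k: 1 / 6 for k in range(1, 7)}      # fair die, chosen only for illustration
tilted_die, kappa = tilt_pmf(die, 0.5)

total = sum(tilted_die.values())                          # should be 1 up to rounding
mean_before = sum(x * p for x, p in die.items())
mean_after = sum(x * p for x, p in tilted_die.items())    # a positive theta shifts mass upward
```

A positive $\theta$ moves probability toward larger values and a negative $\theta$ toward smaller ones, while the weights still sum to one.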

For a random vector $X\in \mathbb {R} ^{d}$ and a vector parameter $\theta \in \mathbb {R} ^{d}$ , the analogous definition is

P_{\theta }(X\in B)=\int _{B}e^{\theta ^{T}x-\kappa (\theta )}\,P_{X}(dx),

where

\kappa (\theta )=\log \operatorname {E} [e^{\theta ^{T}X}].

Example

The exponentially tilted measure in many cases has the same parametric form as that of $X$ . One-dimensional examples include the normal distribution, the exponential distribution, the binomial distribution and the Poisson distribution.

For example, in the case of the normal distribution $N(\mu ,\sigma ^{2})$, the tilted density $f_{\theta }(x)$ is the $N(\mu +\theta \sigma ^{2},\sigma ^{2})$ density. The table below provides more examples of tilted densities.

Original distribution⁵⁶	θ-tilted distribution
$\mathrm {Gamma} (\alpha ,\beta )$	$\mathrm {Gamma} (\alpha ,\beta -\theta )$
$\mathrm {Binomial} (n,p)$	$\mathrm {Binomial} \left(n,{\frac {pe^{\theta }}{1-p+pe^{\theta }}}\right)$
$\mathrm {Poisson} (\lambda )$	$\mathrm {Poisson} (\lambda e^{\theta })$
$\mathrm {Exponential} (\lambda )$	$\mathrm {Exponential} (\lambda -\theta )$
${\mathcal {N}}(\mu ,\sigma ^{2})$	${\mathcal {N}}(\mu +\theta \sigma ^{2},\sigma ^{2})$
${\mathcal {N}}(\mu ,\Sigma )$	${\mathcal {N}}(\mu +\Sigma \theta ,\Sigma )$
$\chi ^{2}(\kappa )$	$\mathrm {Gamma} \left({\frac {\kappa }{2}},{\frac {1-2\theta }{2}}\right)$ (rate parameterization, as in the first row)
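Any row of the table can be checked numerically by tilting the original probability mass function directly and comparing it with the stated closed form. The sketch below does this for the binomial row; the values of `n`, `p` and `theta` are arbitrary illustrative choices.

```python
import math

def binom_pmf(n, p):
    """Probability mass function of Binomial(n, p) as a list indexed by k."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

n, p, theta = 10, 0.3, 0.7          # illustrative values

base = binom_pmf(n, p)
mgf = sum(q * math.exp(theta * k) for k, q in enumerate(base))
tilted = [q * math.exp(theta * k) / mgf for k, q in enumerate(base)]

# Closed form from the table: Binomial(n, p e^theta / (1 - p + p e^theta)).
p_tilted = p * math.exp(theta) / (1 - p + p * math.exp(theta))
expected = binom_pmf(n, p_tilted)

max_gap = max(abs(a - b) for a, b in zip(tilted, expected))
```

Here `max_gap` is at the level of floating-point rounding, since both lists describe the same distribution.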

Not every tilted law remains in the same familiar parametric family.⁷ For example, if $X$ has the Lomax (or Pareto type II) density

f(x)=\alpha (1+x)^{-\alpha -1},\qquad x>0,

then the tilted density is proportional to $e^{\theta x}(1+x)^{-\alpha -1}$ . It is normalizable for $\theta <0$ , but it is not generally another Lomax distribution.

In statistical mechanics, the energy of a system in equilibrium with a heat bath has the Boltzmann distribution: $\mathbb {P} (E\in dE)\propto e^{-\beta E}\,dE$ , where $\beta$ is the inverse temperature. Exponential tilting then corresponds to changing the temperature: $\mathbb {P} _{\theta }(E\in dE)\propto e^{-(\beta -\theta )E}\,dE$ .

Similarly, the energy and particle number of a system in equilibrium with a heat and particle bath has the grand canonical distribution: $\mathbb {P} ((N,E)\in (dN,dE))\propto e^{\beta \mu N-\beta E}\,dN\,dE$ , where $\mu$ is the chemical potential. Exponential tilting then corresponds to changing both the temperature and the chemical potential.

Advantages

In many cases, the tilted distribution belongs to the same parametric family as the original. This is particularly true when the original density belongs to the exponential family of distributions, which simplifies random variable generation during Monte Carlo simulations. Exponential tilting may still be useful if this is not the case, though normalization must be possible and additional sampling algorithms may be needed.

In addition, there exists a simple relationship between the original and tilted CGF,

\kappa _{\theta }(\eta )=\log(\mathbb {E} _{\theta }[e^{\eta X}])=\kappa (\theta +\eta )-\kappa (\theta ).

We can see this by observing that

F_{\theta }(x)=\int \limits _{-\infty }^{x}\exp\{\theta y-\kappa (\theta )\}f(y)\,dy.

Thus,

{\begin{aligned}\kappa _{\theta }(\eta )&=\log \int e^{\eta x}\,dF_{\theta }(x)\\&=\log \int e^{\eta x}e^{\theta x-\kappa (\theta )}\,dF(x)\\&=\log \mathbb {E} [e^{(\eta +\theta )X-\kappa (\theta )}]\\&=\log(e^{\kappa (\eta +\theta )-\kappa (\theta )})\\&=\kappa (\eta +\theta )-\kappa (\theta ).\end{aligned}}

This relationship allows for easy calculation of the CGF of the tilted distribution and thus of the distribution's moments. Moreover, it results in a simple form of the likelihood ratio. Specifically,

\ell ={\frac {d\mathbb {P} }{d\mathbb {P} _{\theta }}}={\frac {f(x)}{f_{\theta }(x)}}=e^{-\theta x+\kappa (\theta )}.

Properties

If $\kappa (\eta )=\log \mathrm {E} [\exp(\eta X)]$ is the CGF of $X$ , then the CGF of the $\theta$ -tilted $X$ is

\kappa _{\theta }(\eta )=\kappa (\theta +\eta )-\kappa (\theta ).

This means that the $i$-th cumulant of the tilted $X$ is $\kappa ^{(i)}(\theta )$. In particular, the expectation of the tilted distribution is

\mathrm {E} _{\theta }[X]={\tfrac {d}{d\eta }}\kappa _{\theta }(\eta )|_{\eta =0}=\kappa '(\theta ).

The variance of the tilted distribution is

\operatorname {Var} _{\theta }[X]={\tfrac {d^{2}}{d\eta ^{2}}}\kappa _{\theta }(\eta )|_{\eta =0}=\kappa ''(\theta ).
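As a numerical illustration of the mean and variance formulas, the sketch below differentiates the CGF of an exponential distribution by finite differences and compares the result with the moments of the tilted law, which the table above gives as $\mathrm {Exponential} (\lambda -\theta )$. The values of `lam`, `theta` and the step `h` are assumptions made for the example.

```python
import math

lam, theta = 2.0, 0.5        # illustrative values; the CGF is finite only for theta < lam

def kappa(t):
    """CGF of Exponential(lam): log(lam / (lam - t)) for t < lam."""
    return math.log(lam / (lam - t))

h = 1e-4                     # finite-difference step (an arbitrary small value)
kappa1 = (kappa(theta + h) - kappa(theta - h)) / (2 * h)                    # approximates kappa'(theta)
kappa2 = (kappa(theta + h) - 2 * kappa(theta) + kappa(theta - h)) / h**2    # approximates kappa''(theta)

rate = lam - theta           # the tilted law is Exponential(lam - theta)
tilted_mean = 1 / rate
tilted_var = 1 / rate**2
```

Up to discretization error, `kappa1` matches `tilted_mean` and `kappa2` matches `tilted_var`.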

Repeated tilting is additive. That is, tilting first by $\theta _{1}$ and then $\theta _{2}$ is the same as tilting once by $\theta _{1}+\theta _{2}$ .
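Additivity follows from $\kappa _{\theta _{1}}(\theta _{2})=\kappa (\theta _{1}+\theta _{2})-\kappa (\theta _{1})$, and it can also be checked directly on a small discrete law. In the sketch below the probability mass function and the two parameters are arbitrary illustrative choices, and `tilt` is a hypothetical helper.

```python
import math

def tilt(pmf, theta):
    """Exponentially tilt a finite pmf given as {value: probability}."""
    mgf = sum(p * math.exp(theta * x) for x, p in pmf.items())
    return {x: p * math.exp(theta * x) / mgf for x, p in pmf.items()}

pmf = {0: 0.5, 1: 0.3, 4: 0.2}       # arbitrary illustrative pmf
theta1, theta2 = 0.4, -0.9

twice = tilt(tilt(pmf, theta1), theta2)     # tilt by theta1, then by theta2
once = tilt(pmf, theta1 + theta2)           # single tilt by the sum

max_gap = max(abs(twice[x] - once[x]) for x in pmf)
```

The two results agree up to floating-point rounding.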

If $X$ is the sum of independent, but not necessarily identically distributed, random variables $X_{1},X_{2},\dots$, then the $\theta$-tilted distribution of $X$ is the distribution of the sum of $X_{1},X_{2},\dots$, each $\theta$-tilted individually.

If $\mu =\mathrm {E} [X]$, then $\kappa (\theta )-\theta \mu$ is the Kullback–Leibler divergence

D_{\text{KL}}(P\parallel P_{\theta })=\mathrm {E} \left[\log {\tfrac {P}{P_{\theta }}}\right]

between the tilted distribution $P_{\theta }$ and the original distribution $P$ of $X$.

Similarly, since $\mathrm {E} _{\theta }[X]=\kappa '(\theta )$, we have the Kullback–Leibler divergence as

D_{\text{KL}}(P_{\theta }\parallel P)=\mathrm {E} _{\theta }\left[\log {\tfrac {P_{\theta }}{P}}\right]=\theta \kappa '(\theta )-\kappa (\theta ).
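Both divergence formulas can be checked against a direct summation. The sketch below does so for a Poisson law, whose tilted version is $\mathrm {Poisson} (\lambda e^{\theta })$; the parameter values and the truncation of the sums at 100 terms are assumptions chosen for the illustration.

```python
import math

lam, theta = 3.0, 0.4                    # illustrative values
lam_t = lam * math.exp(theta)            # the tilted law is Poisson(lam * e^theta)

def log_pmf(rate, k):
    """Log of the Poisson(rate) probability of k."""
    return -rate + k * math.log(rate) - math.lgamma(k + 1)

kappa = lam * (math.exp(theta) - 1)      # CGF of Poisson(lam) at theta
kappa_prime = lam * math.exp(theta)      # kappa'(theta), the tilted mean
mu = lam                                 # E[X] under the original law

ks = range(100)                          # truncated support; the remaining tail mass is negligible here
kl_p_ptheta = sum(math.exp(log_pmf(lam, k)) * (log_pmf(lam, k) - log_pmf(lam_t, k)) for k in ks)
kl_ptheta_p = sum(math.exp(log_pmf(lam_t, k)) * (log_pmf(lam_t, k) - log_pmf(lam, k)) for k in ks)

formula_p_ptheta = kappa - theta * mu
formula_ptheta_p = theta * kappa_prime - kappa
```

The summed divergences agree with the two closed-form expressions to within rounding and truncation error.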

Applications

Rare-event simulation

The exponential tilting of $X$ , assuming it exists, supplies a family of distributions that can be used as proposal distributions for acceptance-rejection sampling or importance distributions for importance sampling. One common application is sampling from a distribution conditional on a sub-region of the domain, i.e. $X\mid X\in A$ . With an appropriate choice of $\theta$ , sampling from $\mathbb {P} _{\theta }$ can meaningfully reduce the required amount of sampling or the variance of an estimator.

Saddlepoint approximation

The saddlepoint approximation method is a density approximation methodology often used for the distribution of sums and averages of independent, identically distributed random variables. It employs Edgeworth series, but generally performs better at extreme values. From the definition of the natural exponential family, it follows that

f_{\theta }({\bar {x}})=f({\bar {x}})\exp\{n(\theta {\bar {x}}-\kappa (\theta ))\}.

Applying the Edgeworth expansion for $f_{\theta }({\bar {x}})$ , we have

f_{\theta }({\bar {x}})=\psi (z)(\mathrm {Var} [{\bar {X}}])^{-1/2}\left\{1+{\frac {\rho _{3}(\theta )h_{3}(z)}{6}}+{\frac {\rho _{4}(\theta )h_{4}(z)}{24}}+\cdots \right\},

where $\psi (z)$ is the standard normal density evaluated at $z$, with

{\begin{aligned}&z={\frac {{\bar {x}}-\kappa _{\bar {x}}'(\theta )}{\sqrt {\kappa _{\bar {x}}''(\theta )}}},\\[8pt]&\rho _{n}(\theta )={\frac {\kappa ^{(n)}(\theta )}{\kappa ''(\theta )^{n/2}}},\end{aligned}}

and $h_{n}$ are the Hermite polynomials.

When considering values of ${\bar {x}}$ progressively farther from the center of the distribution, $|z|\rightarrow \infty$ and the $h_{n}(z)$ terms become unbounded. However, for each value of ${\bar {x}}$ , we can choose $\theta$ such that

\kappa '(\theta )={\bar {x}}.

This value of $\theta$ is referred to as the saddle-point. With this choice $z=0$, so the above expansion is always evaluated at the expectation of the tilted distribution. Keeping only the leading term of the expansion and solving the tilting identity for $f({\bar {x}})$ leads to the final representation of the approximation,⁸⁹

f({\bar {x}})\approx \left({\frac {n}{2\pi \kappa ''(\theta )}}\right)^{1/2}\exp\{n(\kappa (\theta )-\theta {\bar {x}})\}.
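To see how the final formula behaves, it can be compared with a case where the exact density is known. The sketch below assumes i.i.d. $\mathrm {Exponential} (1)$ variables, for which $\kappa (\theta )=-\log(1-\theta )$ and the mean of $n$ observations has a gamma distribution with shape $n$ and rate $n$. The function names, $n=10$ and the evaluation points are illustrative choices.

```python
import math

def saddlepoint_density_exp_mean(xbar, n):
    """Saddlepoint approximation to the density of the mean of n i.i.d. Exponential(1) variables."""
    theta = 1 - 1 / xbar                   # solves kappa'(theta) = 1 / (1 - theta) = xbar
    kappa = -math.log(1 - theta)           # CGF of Exponential(1), finite for theta < 1
    kappa2 = 1 / (1 - theta) ** 2          # kappa''(theta)
    return math.sqrt(n / (2 * math.pi * kappa2)) * math.exp(n * (kappa - theta * xbar))

def exact_density_exp_mean(xbar, n):
    """Exact density of the mean, which is Gamma(shape n, rate n)."""
    log_f = n * math.log(n) + (n - 1) * math.log(xbar) - n * xbar - math.lgamma(n)
    return math.exp(log_f)

n = 10
points = (0.3, 1.0, 3.0)                   # includes points far from the mean of 1
ratios = [saddlepoint_density_exp_mean(x, n) / exact_density_exp_mean(x, n) for x in points]
```

In this particular example the ratio of approximate to exact density is the same at every point, including in the tails, and differs from one only through the error of Stirling's approximation to $\Gamma (n)$.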

Rejection sampling

Using the tilted distribution $\mathbb {P} _{\theta }$ as the proposal, the rejection sampling algorithm prescribes sampling from $f_{\theta }(x)$ and accepting with probability

{\frac {1}{c}}\exp(-\theta x+\kappa (\theta )),

where

c=\sup \limits _{x\in X}{\frac {d\mathbb {P} }{d\mathbb {P} _{\theta }}}(x).

That is, a uniformly distributed random variable $p\sim {\mbox{Unif}}(0,1)$ is generated, and the sample from $f_{\theta }(x)$ is accepted if

p\leq {\frac {1}{c}}\exp(-\theta x+\kappa (\theta )).
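A minimal sketch of this acceptance rule follows, assuming an $\mathrm {Exponential} (\lambda )$ target and a tilt $\theta >0$, so that the proposal is $\mathrm {Exponential} (\lambda -\theta )$ and the supremum defining $c$ is attained at $x=0$. The parameter values, seed and function name are illustrative assumptions.

```python
import math
import random

lam, theta = 2.0, 1.0        # illustrative: target Exponential(lam), proposal Exponential(lam - theta)
kappa = math.log(lam / (lam - theta))    # CGF of Exponential(lam) at theta
c = math.exp(kappa)          # sup over x >= 0 of exp(-theta*x + kappa), attained at x = 0 since theta > 0

def sample_target(rng):
    """Draw one sample from the original law by rejection from the tilted proposal."""
    while True:
        x = rng.expovariate(lam - theta)                  # proposal draw from f_theta
        accept_prob = math.exp(-theta * x + kappa) / c    # (1/c) * dP/dP_theta evaluated at x
        if rng.random() <= accept_prob:
            return x

rng = random.Random(0)       # fixed seed so the run is reproducible
draws = [sample_target(rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)    # should be close to 1 / lam
```

The expected acceptance rate is $1/c$, so a tilt that makes $c$ large wastes most proposals.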

Importance sampling

Applying the exponentially tilted distribution as the importance distribution yields the equation

\mathbb {E} (h(X))=\mathbb {E} _{\theta }[\ell (X)h(X)],

where

\ell (X)={\frac {d\mathbb {P} }{d\mathbb {P} _{\theta }}}

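is the likelihood ratio. So, one samples $X$ from $f_{\theta }$, the importance distribution, and multiplies $h(X)$ by the likelihood ratio to obtain an unbiased estimate of the expectation under the original distribution $\mathbb {P}$. Moreover, the variance of this estimator under $\mathbb {P} _{\theta }$ is given by

\operatorname {Var} _{\theta }(\ell (X)h(X))=\mathbb {E} _{\theta }[(\ell (X)h(X))^{2}]-(\mathbb {E} [h(X)])^{2}.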

Example

Assume independent and identically distributed $\{X_{i}\}$ such that $\kappa (\theta )<\infty$ . In order to estimate $\mathbb {P} (X_{1}+\cdots +X_{n}>c)$ , we can employ importance sampling by taking

h(X)=\mathbb {I} \left(\sum _{i=1}^{n}X_{i}>c\right).

The constant $c$ can be rewritten as $na$ for some other constant $a$ . Then,

\mathbb {P} (\sum _{i=1}^{n}X_{i}>na)=\mathbb {E} _{\theta _{a}}\left[\exp\{-\theta _{a}\sum _{i=1}^{n}X_{i}+n\kappa (\theta _{a})\}\mathbb {I} (\sum _{i=1}^{n}X_{i}>na)\right],

where $\theta _{a}$ denotes the $\theta$ defined by the saddle-point equation

\kappa '(\theta _{a})=a.
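The estimator above can be sketched for standard normal $X_{i}$, an assumption made here because then $\kappa (\theta )=\theta ^{2}/2$, the saddle-point equation gives $\theta _{a}=a$, and the exact probability is available for comparison. The values of `n`, `a`, the seed and the sample size are illustrative.

```python
import math
import random

n, a = 10, 1.5                   # illustrative: estimate P(X_1 + ... + X_n > n*a) for X_i ~ N(0, 1)
theta_a = a                      # kappa(theta) = theta^2 / 2, so kappa'(theta_a) = a gives theta_a = a
kappa_a = theta_a**2 / 2

def one_weighted_sample(rng):
    """Indicator of the rare event times the likelihood ratio, sampled under the tilted law."""
    s = sum(rng.gauss(theta_a, 1.0) for _ in range(n))   # under the tilt each X_i ~ N(theta_a, 1)
    if s <= n * a:
        return 0.0
    return math.exp(-theta_a * s + n * kappa_a)

rng = random.Random(1)           # fixed seed so the run is reproducible
num_samples = 20000
estimate = sum(one_weighted_sample(rng) for _ in range(num_samples)) / num_samples

exact = 0.5 * math.erfc(a * math.sqrt(n / 2))            # P(N(0, n) > n*a)
```

The target probability is of order $10^{-6}$, so a crude Monte Carlo run of the same size would most likely record no hits at all, whereas about half of the tilted samples land in the event.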

Stochastic processes and Girsanov's theorem

For stochastic processes, the analogue of exponential tilting is a change of measure on path space. In the Brownian motion case, let $(W_{t})_{0\leq t\leq T}$ be a standard Brownian motion. For constant $\theta$ ,

Z_{T}=\exp \left\{\theta W_{T}-{\frac {1}{2}}\theta ^{2}T\right\}

is an exponential martingale with expectation one. Defining a new measure $Q$ by

{\frac {dQ}{dP}}=Z_{T}

changes the stochastic drift of Brownian motion: under $Q$ , the process

{\widetilde {W}}_{t}=W_{t}-\theta t

is a standard Brownian motion. Thus $W_{t}$ has drift $\theta t$ under the new measure.
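The drift change can be illustrated at the terminal time alone, since $Z_{T}$ depends on the path only through $W_{T}$. The sketch below reweights samples of $W_{T}$ drawn under $P$ by $Z_{T}$ and checks that the weights average to one and that the weighted mean of $W_{T}$ is close to $\theta T$. The parameter values, seed and sample size are illustrative assumptions.

```python
import math
import random

theta, T = 0.5, 1.0              # illustrative values
rng = random.Random(2)           # fixed seed so the run is reproducible
num_samples = 50000

sum_weights = 0.0
sum_weighted_w = 0.0
for _ in range(num_samples):
    w_T = rng.gauss(0.0, math.sqrt(T))                    # W_T ~ N(0, T) under P
    z_T = math.exp(theta * w_T - 0.5 * theta**2 * T)      # Radon-Nikodym derivative dQ/dP
    sum_weights += z_T
    sum_weighted_w += z_T * w_T

mean_weight = sum_weights / num_samples          # estimates E_P[Z_T] = 1
mean_w_under_q = sum_weighted_w / num_samples    # estimates E_Q[W_T] = theta * T
```

This is the normal-distribution row of the tilting table applied to $W_{T}\sim N(0,T)$, whose tilted law is $N(\theta T,T)$.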

More generally, Girsanov's theorem states that if $\lambda _{t}$ is an adapted process satisfying suitable integrability conditions, then the stochastic exponential

Z_{T}=\exp \left\{\int _{0}^{T}\lambda _{t}\,dW_{t}-{\frac {1}{2}}\int _{0}^{T}\lambda _{t}^{2}\,dt\right\}

can be used as a Radon–Nikodym derivative.¹⁰¹¹ Under the measure $Q$ defined by $dQ/dP=Z_{T}$ ,

{\widetilde {W}}_{t}=W_{t}-\int _{0}^{t}\lambda _{s}\,ds

is a standard Brownian motion. If

dX_{t}=b_{t}\,dt+\sigma _{t}\,dW_{t}

under $P$ , then under $Q$ the same process may be written as

dX_{t}=(b_{t}+\sigma _{t}\lambda _{t})\,dt+\sigma _{t}\,d{\widetilde {W}}_{t}.

This is analogous to exponential tilting of a single random variable, but it acts on the distribution of an entire stochastic path rather than only on a fixed-time marginal.

Choice of tilting parameter

Siegmund's algorithm

Assume i.i.d. $X_{i}$ with a light-tailed distribution and $\mathbb {E} [X]<0$. In order to estimate $\psi (c)=\mathbb {P} (\tau (c)<\infty )$ where $\tau (c)=\inf\{t:\sum \limits _{i=1}^{t}X_{i}>c\}$, when $c$ is large and hence $\psi (c)$ small, the algorithm uses exponential tilting to derive the importance distribution. The algorithm has applications in sequential tests¹² and G/G/1 queue waiting times, and $\psi$ is the probability of ultimate ruin in ruin theory. In this context, it is logical to ensure that $\mathbb {P} _{\theta }(\tau (c)<\infty )=1$. The criterion $\theta >\theta _{0}$, where $\theta _{0}$ is such that $\kappa '(\theta _{0})=0$, achieves this, because the tilted mean $\kappa '(\theta )$ is then positive. Siegmund's algorithm uses $\theta =\theta ^{*}$, if it exists, where $\theta ^{*}$ is the nonzero root of $\kappa (\theta ^{*})=0$. It has been shown that $\theta ^{*}$ is the only tilting parameter producing bounded relative error ( ${\underset {x\rightarrow \infty }{\lim \sup }}{\frac {\operatorname {Var} \mathbb {I} _{A(x)}}{\mathbb {P} A(x)^{2}}}<\infty$ ).¹³
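The following is a minimal sketch in the spirit of this algorithm, not a reference implementation. It assumes increments $X_{i}\sim N(-\mu ,1)$ with $\mu >0$, so that the walk drifts downward and crossing a high level $c$ is rare. Then $\kappa (\theta )=-\mu \theta +\theta ^{2}/2$ has the nonzero root $\theta ^{*}=2\mu$, the tilted increments are $N(\mu ,1)$, and the likelihood ratio at the crossing time reduces to $e^{-\theta ^{*}S_{\tau }}$ because $\kappa (\theta ^{*})=0$. All numerical values are illustrative.

```python
import math
import random

mu, c = 0.5, 8.0                 # illustrative: increments X_i ~ N(-mu, 1), level c
theta_star = 2 * mu              # nonzero root of kappa(theta) = -mu*theta + theta^2 / 2

def one_replication(rng):
    """Simulate the walk under the theta_star-tilted law until it first exceeds c."""
    s = 0.0
    while s <= c:
        s += rng.gauss(-mu + theta_star, 1.0)    # tilted increments are N(mu, 1): upward drift
    # Likelihood ratio exp(-theta_star * S_tau + tau * kappa(theta_star)), with kappa(theta_star) = 0.
    return math.exp(-theta_star * s)

rng = random.Random(3)           # fixed seed so the run is reproducible
num_samples = 5000
estimate = sum(one_replication(rng) for _ in range(num_samples)) / num_samples

lundberg_bound = math.exp(-theta_star * c)       # every replication weight is below this value
```

Because the tilted walk has positive drift, every replication crosses the level, and each weight lies below $e^{-\theta ^{*}c}$ since $S_{\tau }>c$ at the crossing time.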

Black-box algorithms

A black box is a system of which only the input and output can be observed, without knowledge of its internal structure. A black-box algorithm is one that uses only minimal information on that structure. When random numbers are generated, the output distribution may not belong to a common parametric class such as the normal or exponential distributions, so an automated way of performing ECM is desirable. Let $X_{1},X_{2},\ldots$ be i.i.d. random variables with distribution $G$; for simplicity we assume $X\geq 0$. Define ${\mathfrak {F}}_{n}=\sigma (X_{1},\ldots ,X_{n},U_{1},\ldots ,U_{n})$, where $U_{1},U_{2},\ldots$ are independent uniform random variables on $(0,1)$. A randomized stopping time for $X_{1},X_{2},\ldots$ is then a stopping time with respect to the filtration $\{{\mathfrak {F}}_{n}\}$. Let further ${\mathfrak {G}}$ be a class of distributions $G$ on $[0,\infty )$ with $k_{G}=\log \int _{0}^{\infty }e^{\theta x}G(dx)<\infty$ and define $G_{\theta }$ by ${\frac {dG_{\theta }}{dG}}(x)=e^{\theta x-k_{G}}$. We define a black-box algorithm for ECM for the given $\theta$ and the given class ${\mathfrak {G}}$ of distributions as a pair of a randomized stopping time $\tau$ and an ${\mathfrak {F}}_{\tau }$-measurable random variable $Z$ such that $Z$ is distributed according to $G_{\theta }$ for any $G\in {\mathfrak {G}}$. Formally, we write this as $\mathbb {P} _{G}(Z\leq x)=G_{\theta }(x)$ for all $x$. In other words, the rules of the game are that the algorithm may use simulated values from $G$ and additional uniforms to produce a random variable from $G_{\theta }$.¹⁴
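One simple procedure that fits these rules, for $\theta \leq 0$ only, is acceptance-rejection: draw $X$ from $G$ and accept it with probability $e^{\theta X}$, which lies in $(0,1]$ because $X\geq 0$. The accepted value has law proportional to $e^{\theta x}G(dx)$, that is, $G_{\theta }$, and the number of draws used is a randomized stopping time. The sketch below is only an illustration of this special case; the function name, the stand-in sampler and the parameter values are assumptions.

```python
import math
import random

def black_box_tilt(draw_from_g, theta, rng):
    """Return one draw from G_theta using only draws from G and uniforms (theta <= 0, X >= 0)."""
    if theta > 0:
        raise ValueError("this acceptance-rejection sketch only covers theta <= 0")
    while True:
        x = draw_from_g(rng)                       # simulated value from the unknown G
        if rng.random() <= math.exp(theta * x):    # accept with probability e^{theta x}
            return x

def draw_exp1(rng):
    """Stand-in for the black box; here G = Exponential(1), unknown to black_box_tilt."""
    return rng.expovariate(1.0)

rng = random.Random(4)           # fixed seed so the run is reproducible
draws = [black_box_tilt(draw_exp1, -1.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)    # G_theta is Exponential(2) here, with mean 0.5
```

The routine never evaluates the density or the CGF of $G$; it only calls the sampler, which is the sense in which it treats $G$ as a black box.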

References

[1] Gerber, H.U. & Shiu, E.S.W. (1994). "Option pricing by Esscher transforms". Transactions of the Society of Actuaries. 46: 99–191.

[2] Cruz, Marcelo (2015). Fundamental Aspects of Operational Risk and Insurance Analytics. Wiley. pp. 784–796. ISBN 978-1-118-11839-9.

[3] Butler, Ronald (2007). Saddlepoint Approximations with Applications. Cambridge University Press. p. 156. ISBN 9780521872508.

[4] Siegmund, D. (1976). "Importance Sampling in the Monte Carlo Study of Sequential Tests". The Annals of Statistics. 4 (4): 673–684. doi:10.1214/aos/1176343541.

[5] Asmussen, Soren & Glynn, Peter (2007). Stochastic Simulation. Springer. p. 130. ISBN 978-0-387-30679-7.

[6] Fuh, Cheng-Der; Teng, Huei-Wen; Wang, Ren-Her (2013). "Efficient Importance Sampling for Rare Event Simulation with Applications". arXiv:1302.0583.

[7] Asmussen, Soren & Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 164–167. ISBN 978-0-387-30679-7.

[8] Butler, Ronald (2007). Saddlepoint Approximations with Applications. Cambridge University Press. pp. 156–157. ISBN 9780521872508.

[9] Seeber, G.U.H. (1992). Advances in GLIM and Statistical Modelling. Springer. pp. 195–200. ISBN 978-0-387-97873-4.

[10] Asmussen, Soren & Glynn, Peter (2007). Stochastic Simulation. Springer. p. 407. ISBN 978-0-387-30679-7.

[11] Steele, J. Michael (2001). Stochastic Calculus and Financial Applications. Springer. pp. 213–229. ISBN 978-1-4419-2862-7.

[12] Siegmund, David (1985). Sequential Analysis. Springer-Verlag. ISBN 978-0387961347.

[13] Asmussen, Soren & Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 164–167. ISBN 978-0-387-30679-7.

[14] Asmussen, Soren & Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 416–420. ISBN 978-0-387-30679-7.
