Article · Wikipedia archive · Last revised Jun 14, 2026

Continuous Bernoulli distribution

Figure: Probability density function of the continuous Bernoulli distribution
Parameters: $\lambda = 1/(1+e^{-\theta}) \in (0,1)$; $\theta \in \mathbb{R}$, natural parameter
Support: $x \in [0,1]$
PDF: $C(\lambda)\,\lambda^{x}(1-\lambda)^{1-x}$, where
$$C(\lambda)=\begin{cases}2 & \text{if } \lambda=\tfrac{1}{2}\\[6pt] \dfrac{2\tanh^{-1}(1-2\lambda)}{1-2\lambda} & \text{otherwise}\end{cases}$$
or, in terms of the natural parameter,
$$f(x\mid\theta)=\begin{cases}1 & \theta=0\\ \exp\left(x\theta-\log\{(e^{\theta}-1)/\theta\}\right) & \theta\neq 0\end{cases}$$
CDF:
$$F(x\mid\lambda)=\begin{cases}x & \lambda=\tfrac{1}{2}\\[6pt] \dfrac{\lambda^{x}(1-\lambda)^{1-x}+\lambda-1}{2\lambda-1} & \text{otherwise}\end{cases}$$
$$F(x\mid\theta)=\begin{cases}x & \theta=0\\ (e^{\theta x}-1)/(e^{\theta}-1) & \theta\neq 0\end{cases}$$
Mean:
$$\operatorname{E}[X]=\begin{cases}\tfrac{1}{2} & \lambda=\tfrac{1}{2}\\[6pt] \dfrac{\lambda}{2\lambda-1}+\dfrac{1}{2\tanh^{-1}(1-2\lambda)} & \text{otherwise}\end{cases}$$
$$\operatorname{E}[X]=\begin{cases}1/2 & \theta=0\\ e^{\theta}/(e^{\theta}-1)-\theta^{-1} & \theta\neq 0\end{cases}$$
Variance:
$$\operatorname{Var}(X)=\begin{cases}\tfrac{1}{12} & \lambda=\tfrac{1}{2}\\[6pt] -\dfrac{\lambda(1-\lambda)}{(1-2\lambda)^{2}}+\dfrac{1}{\left(2\tanh^{-1}(1-2\lambda)\right)^{2}} & \text{otherwise}\end{cases}$$
$$\operatorname{Var}(X)=\begin{cases}1/12 & \theta=0\\ (2-e^{\theta}-e^{-\theta})^{-1}+\theta^{-2} & \theta\neq 0\end{cases}$$

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution[1][2][3] is a family of continuous probability distributions parameterized by a single shape parameter $\lambda \in (0,1)$, defined on the unit interval $x \in [0,1]$, by:

$$p(x \mid \lambda) \propto \lambda^{x}(1-\lambda)^{1-x}.$$

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,[4][5] for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, $[0,1]$-valued data.[6][7][8][9] This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, $\{0,1\}$-valued data.
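The relationship between the binary cross entropy and the continuous Bernoulli log-likelihood can be illustrated with a minimal sketch using only the Python standard library. The function names (`log_norm_const`, `neg_bce`, `cb_log_pdf`, `integrate`) and the tolerance `eps` are illustrative choices, not part of any library API. The sketch adds $\log C(\lambda)$ to the negative binary cross entropy and checks numerically that only the corrected density integrates to one over $[0,1]$.

```python
import math


def log_norm_const(lam: float, eps: float = 1e-6) -> float:
    """log C(lam); near lam = 1/2 the limiting value log 2 is used."""
    if abs(lam - 0.5) < eps:
        return math.log(2.0)
    z = 1.0 - 2.0 * lam
    return math.log(2.0 * math.atanh(z) / z)


def neg_bce(x: float, lam: float) -> float:
    """Negative binary cross entropy: x log(lam) + (1 - x) log(1 - lam)."""
    return x * math.log(lam) + (1.0 - x) * math.log(1.0 - lam)


def cb_log_pdf(x: float, lam: float) -> float:
    """Continuous Bernoulli log-density: negative BCE plus log C(lam)."""
    return neg_bce(x, lam) + log_norm_const(lam)


def integrate(f, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    return sum(f((i + 0.5) / n) for i in range(n)) / n


lam = 0.2
mass_without_c = integrate(lambda x: math.exp(neg_bce(x, lam)))
mass_with_c = integrate(lambda x: math.exp(cb_log_pdf(x, lam)))
# The first value falls short of 1; the second is 1 up to quadrature error.
print(mass_without_c, mass_with_c)
```

Because $\log C(\lambda)$ depends on $\lambda$, dropping it changes which $\lambda$ maximizes the objective, not just the value of the objective.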

The continuous Bernoulli also defines an exponential family of distributions. Writing $\theta = \log\left(\lambda/(1-\lambda)\right)$ for the natural parameter, the density can be rewritten in canonical form: $p(x \mid \theta) \propto \exp(\theta x)$.[10]
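Because the canonical density is proportional to $\exp(\theta x)$, its CDF $(e^{\theta x}-1)/(e^{\theta}-1)$ can be inverted in closed form, which suggests inverse-transform sampling. The following is a sketch under that assumption, not a reference implementation: the helper names (`natural_parameter`, `cb_cdf`, `cb_inverse_cdf`, `cb_sample`) are hypothetical, and the threshold used to treat $\theta$ as zero is an arbitrary choice.

```python
import math
import random


def natural_parameter(lam: float) -> float:
    """theta = log(lam / (1 - lam)) for lam in (0, 1)."""
    return math.log(lam / (1.0 - lam))


def cb_cdf(x: float, theta: float) -> float:
    """CDF in the natural parameterization; uniform when theta is 0."""
    if abs(theta) < 1e-8:
        return x
    return math.expm1(theta * x) / math.expm1(theta)


def cb_inverse_cdf(u: float, theta: float) -> float:
    """Solve F(x | theta) = u for x, with u in [0, 1]."""
    if abs(theta) < 1e-8:
        return u
    return math.log1p(u * math.expm1(theta)) / theta


def cb_sample(theta: float, n: int, rng: random.Random) -> list:
    """Draw n samples by pushing uniform draws through the inverse CDF."""
    return [cb_inverse_cdf(rng.random(), theta) for _ in range(n)]


samples = cb_sample(natural_parameter(0.9), 5, random.Random(0))
print(samples)
```

`math.expm1` and `math.log1p` are used to limit cancellation for small $|\theta|$; very large $|\theta|$ would still overflow and would need separate handling.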

Statistical inference

Given an independent sample of $n$ points $x_{1},\dots,x_{n}$ with $x_{i}\in[0,1]$ for all $i$ from a continuous Bernoulli distribution, the log-likelihood of the natural parameter $\theta$ is

$$\mathcal{L}(\theta)=\theta\sum_{i=1}^{n}x_{i}-n\log\{(e^{\theta}-1)/\theta\}$$

and the maximum likelihood estimator of the natural parameter $\theta$ is the solution of $\mathcal{L}'(\theta)=0$, that is, $\hat{\theta}$ satisfies

$$\frac{e^{\hat{\theta}}}{e^{\hat{\theta}}-1}-\frac{1}{\hat{\theta}}=\frac{1}{n}\sum_{i=1}^{n}x_{i}$$

where the left-hand side $e^{\hat{\theta}}/(e^{\hat{\theta}}-1)-\hat{\theta}^{-1}$ is the expected value of a continuous Bernoulli distribution with parameter $\hat{\theta}$. Although $\hat{\theta}$ does not admit a closed-form expression, it can be computed easily by numerical inversion.
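One way to carry out the numerical inversion is bisection, since the mean is strictly increasing in $\theta$ (its derivative is the variance). The sketch below is one possible approach rather than a prescribed method: `cb_mean` and `cb_mle_theta` are hypothetical names, the search bracket `max_abs_theta` is an assumed bound, and a first-order Taylor expansion is used near $\theta = 0$ to avoid cancellation.

```python
import math


def cb_mean(theta: float) -> float:
    """E[X] = e^theta / (e^theta - 1) - 1 / theta, with E[X] = 1/2 at theta = 0."""
    if abs(theta) < 1e-4:
        return 0.5 + theta / 12.0  # first-order Taylor expansion around 0
    # e^theta / (e^theta - 1) rewritten as 1 / (1 - e^(-theta))
    return 1.0 / (-math.expm1(-theta)) - 1.0 / theta


def cb_mle_theta(xs, tol: float = 1e-10, max_abs_theta: float = 500.0) -> float:
    """Solve cb_mean(theta) = sample mean by bisection on a fixed bracket."""
    xbar = sum(xs) / len(xs)
    lo, hi = -max_abs_theta, max_abs_theta
    if not cb_mean(lo) < xbar < cb_mean(hi):
        raise ValueError("sample mean is outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cb_mean(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [0.9, 0.8, 0.95, 0.7]
theta_hat = cb_mle_theta(data)
lam_hat = 1.0 / (1.0 + math.exp(-theta_hat))  # back to the lambda parameterization
print(theta_hat, lam_hat)
```

A sample mean above 1/2 yields a positive $\hat{\theta}$ and one below 1/2 a negative $\hat{\theta}$. Sample means extremely close to 0 or 1 fall outside the assumed bracket and raise an error instead of returning a value.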


Further properties

The entropy of a continuous Bernoulli distribution is

$$\operatorname{H}[X]=\begin{cases}0 & \text{if } \lambda=\tfrac{1}{2}\\[6pt] \dfrac{\lambda\log(\lambda)-(1-\lambda)\log(1-\lambda)}{1-2\lambda}-\log\left(\dfrac{2\tanh^{-1}(1-2\lambda)}{e\,(1-2\lambda)}\right) & \text{otherwise}\end{cases}$$
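The closed-form entropy can be evaluated directly. The following sketch uses a hypothetical function name `cb_entropy` and an assumed tolerance `eps` for switching to the $\lambda = 1/2$ case. It returns the differential entropy in nats.

```python
import math


def cb_entropy(lam: float, eps: float = 1e-6) -> float:
    """Differential entropy (in nats) of the continuous Bernoulli."""
    if abs(lam - 0.5) < eps:
        return 0.0  # uniform distribution on [0, 1]
    z = 1.0 - 2.0 * lam
    c = 2.0 * math.atanh(z) / z  # normalizing constant C(lam)
    first = (lam * math.log(lam) - (1.0 - lam) * math.log(1.0 - lam)) / z
    return first - math.log(c / math.e)


for lam in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(lam, cb_entropy(lam))
```

The values are symmetric about $\lambda = 1/2$ and never positive, consistent with the uniform distribution having the largest differential entropy among distributions supported on $[0,1]$.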

Bernoulli distribution

The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $\{0,1\}$ by the probability mass function:

$$p(x)=p^{x}(1-p)^{1-x},$$

where $p$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $[0,1]$ results in the continuous Bernoulli probability density function, up to a normalizing constant.
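As a worked check of the normalizing constant, the Bernoulli functional form can be integrated over $[0,1]$ directly. The derivation below assumes $\lambda \neq 1/2$ and uses the identity $\tanh^{-1}(1-2\lambda)=\tfrac{1}{2}\log\left((1-\lambda)/\lambda\right)$:

$$\begin{aligned}
\int_{0}^{1}\lambda^{x}(1-\lambda)^{1-x}\,dx
&=(1-\lambda)\int_{0}^{1}\left(\frac{\lambda}{1-\lambda}\right)^{x}dx
=(1-\lambda)\,\frac{\frac{\lambda}{1-\lambda}-1}{\log\frac{\lambda}{1-\lambda}}\\
&=\frac{2\lambda-1}{\log\frac{\lambda}{1-\lambda}}
=\frac{1-2\lambda}{2\tanh^{-1}(1-2\lambda)}.
\end{aligned}$$

The reciprocal of this integral is $C(\lambda)=2\tanh^{-1}(1-2\lambda)/(1-2\lambda)$. For $\lambda=1/2$ the integrand is identically $1/2$, so the integral equals $1/2$ and $C(1/2)=2$.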

Uniform distribution

The uniform distribution on the unit interval $[0,1]$ is a special case of the continuous Bernoulli, obtained when $\lambda=1/2$ or, equivalently, $\theta=0$.

Exponential distribution

An exponential distribution with rate $\Lambda$ restricted to the unit interval $[0,1]$ corresponds to a continuous Bernoulli distribution with natural parameter $\theta=-\Lambda<0$.
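This correspondence can be checked numerically. The sketch below compares the density of an exponential distribution truncated to $[0,1]$, taken here to be $\Lambda e^{-\Lambda x}/(1-e^{-\Lambda})$, with the continuous Bernoulli density at $\theta=-\Lambda$. The function names and the example rate are illustrative assumptions.

```python
import math


def truncated_exponential_pdf(x: float, rate: float) -> float:
    """Exponential(rate) density renormalized to the interval [0, 1]."""
    return rate * math.exp(-rate * x) / (-math.expm1(-rate))


def cb_pdf_natural(x: float, theta: float) -> float:
    """Continuous Bernoulli density exp(x * theta - log((e^theta - 1) / theta))."""
    if theta == 0.0:
        return 1.0
    return theta * math.exp(theta * x) / math.expm1(theta)


rate = 3.0
lam = 1.0 / (1.0 + math.exp(rate))  # lambda for theta = -rate; always below 1/2
for x in (0.0, 0.25, 0.5, 1.0):
    print(x, truncated_exponential_pdf(x, rate), cb_pdf_natural(x, -rate))
```

The two columns agree at every point, and the implied shape parameter $\lambda=1/(1+e^{\Lambda})$ is always less than $1/2$.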

Continuous categorical distribution

The multivariate generalization of the continuous Bernoulli is called the continuous categorical distribution.[11]

References

  1. Loaiza-Ganem, G., & Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems (pp. 13266-13276).
  2. PyTorch Distributions. https://pytorch.org/docs/stable/distributions.html#continuousbernoulli
  3. TensorFlow Probability. https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/ContinuousBernoulli (archived 2020-11-25 at the Wayback Machine)
  4. Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
  5. Kingma, D. P., & Welling, M. (2014, April). Stochastic gradient VB and the variational auto-encoder. In Second International Conference on Learning Representations, ICLR (Vol. 19).
  6. Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016, June). Autoencoding beyond pixels using a learned similarity metric. In International conference on machine learning (pp. 1558-1566).
  7. Jiang, Z., Zheng, Y., Tan, H., Tang, B., & Zhou, H. (2017, August). Variational deep embedding: an unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1965-1972).
  8. PyTorch VAE tutorial. https://github.com/pytorch/examples/tree/master/vae
  9. Keras VAE tutorial. https://blog.keras.io/building-autoencoders-in-keras.html
  10. Lee, C. J., Dahl, B. K., Ovaskainen, O., & Dunson, D. B. (2026). Scalable and robust regression models for continuous proportional data. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2026.2626081
  11. Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family. In 36th International Conference on Machine Learning, ICML 2020. International Machine Learning Society (IMLS).