Article · Wikipedia archive · Last revised Jun 12, 2026

Pseudolikelihood

In statistical theory, a pseudolikelihood is an approximation to the joint probability distribution of a collection of random variables. In practice, it provides an approximation to the likelihood function of a set of observed data, which may make estimation computationally simpler or may yield explicit estimates of model parameters.

The pseudolikelihood approach was introduced by Julian Besag[1] in the context of analysing data having spatial dependence.

Definition

Given a set of random variables $X = X_1, X_2, \ldots, X_n$, the pseudolikelihood of $X = x = (x_1, x_2, \ldots, x_n)$ is

$$L(\theta) := \prod_{i} \mathrm{Pr}_{\theta}(X_{i} = x_{i} \mid X_{j} = x_{j} \text{ for } j \neq i) = \prod_{i} \mathrm{Pr}_{\theta}(X_{i} = x_{i} \mid X_{-i} = x_{-i})$$

in the discrete case and

$$L(\theta) := \prod_{i} p_{\theta}(x_{i} \mid x_{j} \text{ for } j \neq i) = \prod_{i} p_{\theta}(x_{i} \mid x_{-i}) = \prod_{i} p_{\theta}(x_{i} \mid x_{1}, \ldots, \hat{x}_{i}, \ldots, x_{n})$$

in the continuous case. Here $X$ is a vector of variables, $x$ is a vector of values, $p_{\theta}(\cdot \mid \cdot)$ is the conditional density and $\theta = (\theta_{1}, \ldots, \theta_{p})$ is the vector of parameters to be estimated. The expression $X = x$ above means that each variable $X_{i}$ in the vector $X$ takes the corresponding value $x_{i}$ in the vector $x$, and $x_{-i} = (x_{1}, \ldots, \hat{x}_{i}, \ldots, x_{n})$ means that the coordinate $x_{i}$ has been omitted. The expression $\mathrm{Pr}_{\theta}(X = x)$ is the probability that the vector of variables $X$ takes values equal to the vector $x$. This probability depends on the unknown parameter $\theta$. Because situations can often be described using state variables ranging over a set of possible values, $\mathrm{Pr}_{\theta}(X = x)$ can represent the probability of a particular state among all possible states allowed by the state variables.
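The discrete definition can be illustrated with a small sketch. It assumes the joint distribution is available as an explicit table, which is only realistic for toy problems; the function name `pseudolikelihood` and both example tables are hypothetical and chosen purely for illustration.

```python
def pseudolikelihood(joint, x):
    """Product over i of Pr(X_i = x_i | X_{-i} = x_{-i}) for a discrete joint table.

    `joint` maps each full assignment (a tuple) to its probability.
    Assumes every conditioning event has positive probability.
    """
    # Values each coordinate can take, read off the keys of the table.
    domains = [sorted({key[i] for key in joint}) for i in range(len(x))]
    result = 1.0
    for i in range(len(x)):
        # Pr(X_{-i} = x_{-i}): sum over the values of coordinate i, rest fixed.
        denom = sum(joint.get(x[:i] + (v,) + x[i + 1:], 0.0) for v in domains[i])
        result *= joint[x] / denom
    return result

# Hypothetical table with dependence between two binary variables.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Hypothetical table built from independent marginals.
pa, pb = [0.3, 0.7], [0.6, 0.4]
independent = {(a, b): pa[a] * pb[b] for a in (0, 1) for b in (0, 1)}

# With dependence the pseudolikelihood differs from the joint probability.
print(pseudolikelihood(dependent, (0, 0)), dependent[(0, 0)])
# Under independence each conditional equals the marginal, so the two agree.
print(pseudolikelihood(independent, (0, 1)), independent[(0, 1)])
```

In the dependent table both conditionals at (0, 0) equal 0.4 / 0.5, so the pseudolikelihood is about 0.64 while the joint probability is 0.4, which shows that the quantity is an approximation rather than the likelihood itself.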

The pseudo-log-likelihood is a similar measure derived from the above expression, namely (in the discrete case)

$$l(\theta) := \log L(\theta) = \sum_{i} \log \mathrm{Pr}_{\theta}(X_{i} = x_{i} \mid X_{j} = x_{j} \text{ for } j \neq i).$$
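As a worked illustration, consider an Ising-type model. This model is an assumption introduced here for the example and is not part of the definition above: each variable takes the value -1 or +1, the joint probability is proportional to the exponential of a single parameter times the sum of products over neighbouring pairs (written i ~ j), and s_i denotes the sum of the neighbours of site i. Each conditional then has a closed form, and the pseudo-log-likelihood no longer involves the normalizing constant Z:

```latex
\begin{align}
p_\theta(x) &= \frac{1}{Z(\theta)} \exp\Big(\theta \sum_{i \sim j} x_i x_j\Big),
  \qquad x_i \in \{-1, +1\}, \\
\mathrm{Pr}_\theta(X_i = x_i \mid X_{-i} = x_{-i})
  &= \frac{\exp(\theta x_i s_i)}{\exp(\theta s_i) + \exp(-\theta s_i)},
  \qquad s_i = \sum_{j \sim i} x_j, \\
l(\theta) &= \sum_i \Big[\theta x_i s_i - \log\big(2\cosh(\theta s_i)\big)\Big].
\end{align}
```

The second line follows because every factor of the joint probability that does not involve the i-th variable, including Z, cancels between numerator and denominator of the conditional.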

One use of the pseudolikelihood measure is as an approximation for inference about a Markov or Bayesian network, as the pseudolikelihood of an assignment to $X_{i}$ may often be computed more efficiently than the likelihood, particularly when the latter may require marginalization over a large number of variables.
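The computational saving can be sketched for a small pairwise Markov network. The example assumes an Ising-type model with spins in {-1, +1} and unnormalised log-weight equal to theta times the sum of products over edges; the model, the function names and the four-site chain are all hypothetical. Each conditional needs only two unnormalised weights, whereas the exact likelihood needs a normalising sum over all 2^n states.

```python
import math
from itertools import product

def log_weight(x, edges, theta):
    """Unnormalised log-probability: theta * sum over edges of x_i * x_j (assumed model)."""
    return theta * sum(x[i] * x[j] for i, j in edges)

def log_pseudolikelihood(x, edges, theta):
    """Sum of log conditionals; the normalising constant cancels and is never computed."""
    a = log_weight(x, edges, theta)
    total = 0.0
    for i in range(len(x)):
        # The only competing state for the conditional of site i: flip spin i.
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        b = log_weight(flipped, edges, theta)
        # log(e^a / (e^a + e^b)) = -log(1 + e^(b - a))
        total -= math.log1p(math.exp(b - a))
    return total

def log_likelihood(x, edges, theta):
    """Exact log-likelihood; the normalising constant sums over all 2^n states."""
    log_z = math.log(sum(math.exp(log_weight(s, edges, theta))
                         for s in product((-1, 1), repeat=len(x))))
    return log_weight(x, edges, theta) - log_z

# Hypothetical example: a chain of 4 spins.
edges = [(0, 1), (1, 2), (2, 3)]
x = (1, 1, -1, 1)
theta = 0.5
print(log_pseudolikelihood(x, edges, theta), log_likelihood(x, edges, theta))
```

For clarity the sketch recomputes the full weight for each flipped state; in a larger network only the edges touching the flipped site change, so each conditional depends on the neighbours of that site alone.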

Properties

Use of the pseudolikelihood in place of the true likelihood function in a maximum likelihood analysis can lead to good estimates, but a straightforward application of the usual likelihood techniques to derive information about estimation uncertainty, or for significance testing, would in general be incorrect.[2]
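A minimal sketch of maximum pseudolikelihood estimation follows, under assumptions that are not part of the article: an Ising-type model with spins in {-1, +1} and joint probability proportional to exp(theta * sum over edges of x_i * x_j), for which the pseudo-log-likelihood is the sum over sites of theta * x_i * s_i - log(2 cosh(theta * s_i)), with s_i the sum of the neighbours of site i. The function names, the bracketing interval and the observed configuration are hypothetical. The sketch returns a point estimate only; in line with the caveat above, it makes no attempt to derive standard errors from the curvature.

```python
import math

def neighbour_sums(x, edges):
    """s_i = sum of the spins adjacent to site i."""
    s = [0] * len(x)
    for i, j in edges:
        s[i] += x[j]
        s[j] += x[i]
    return s

def score(theta, x, s):
    """Derivative of the pseudo-log-likelihood: sum_i s_i * (x_i - tanh(theta * s_i))."""
    return sum(si * (xi - math.tanh(theta * si)) for xi, si in zip(x, s))

def max_pseudolikelihood(x, edges, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection on the score, which is non-increasing in theta."""
    s = neighbour_sums(x, edges)
    if not (score(lo, x, s) > 0 > score(hi, x, s)):
        raise ValueError("no sign change of the score on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, x, s) > 0:
            lo = mid  # still climbing: the maximiser lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data: one observed configuration on a chain of 8 sites.
x = (1, 1, -1, 1, 1, 1, -1, -1)
edges = [(i, i + 1) for i in range(len(x) - 1)]
theta_hat = max_pseudolikelihood(x, edges)
print(f"maximum pseudolikelihood estimate: {theta_hat:.4f}")
```

Bisection is used here because the pseudo-log-likelihood of this assumed model is concave in theta, so its derivative crosses zero at most once; a finite estimate exists only when the configuration contains sites that agree and sites that disagree with their neighbours, which is why the function checks for a sign change first.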

References

  1. Besag, J. (1975), "Statistical Analysis of Non-Lattice Data", The Statistician, 24 (3): 179–195, doi:10.2307/2987782, JSTOR 2987782
  2. Dodge, Y. (2003), The Oxford Dictionary of Statistical Terms, Oxford University Press, ISBN 0-19-920613-9