Article · Wikipedia archive · Last revised Jun 5, 2026

G-test

In statistics, G-tests are likelihood-ratio or maximum likelihood statistical significance tests that are increasingly being used in situations where chi-squared tests were previously recommended.[1]

Formulation

The general formula for the G-test statistic is

$$G = 2\sum_{i} O_{i} \cdot \ln\left(\frac{O_{i}}{E_{i}}\right),$$

where $O_{i} \geq 0$ is the observed count in a cell, $E_{i} > 0$ is the expected count under the null hypothesis, $\ln$ denotes the natural logarithm, and the sum is taken over all non-empty cells. The resulting $G$ is asymptotically chi-squared distributed as the total number of observations tends to infinity (convergence in distribution[2]).

Furthermore, the total observed count must be equal to the total expected count:

$$\sum_{i} O_{i} = \sum_{i} E_{i} = N,$$

where $N$ is the total number of observations.
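The formula can be illustrated with a minimal Python sketch that uses only the standard library. The function name `g_statistic` and the counts are hypothetical, chosen for illustration; the closed-form p-value applies only because this example has exactly 2 degrees of freedom.

```python
import math

def g_statistic(observed, expected):
    """Return G = 2 * sum(O_i * ln(O_i / E_i)), summing over cells with O_i > 0."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be strictly positive")
    if not math.isclose(sum(observed), sum(expected)):
        raise ValueError("observed and expected totals must agree")
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

# Hypothetical data: 100 observations tested against a 1:2:1 ratio.
observed = [30, 45, 25]
expected = [25.0, 50.0, 25.0]

g = g_statistic(observed, expected)

# Three cells and no estimated parameters give 2 degrees of freedom.
# For 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
p_value = math.exp(-g / 2.0)
```

For other degrees of freedom the survival function has no such simple form, and a statistics library would normally be used to obtain the p-value.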

Both the G-test statistic $G$ and the chi-squared test statistic $\chi^{2}$ are special cases of a general family of power divergence statistics by Cressie and Read.[2] For $\lambda \notin \{0, -1\}$ set

$$\operatorname{CR}_{\lambda} = \frac{2}{\lambda(\lambda+1)} \sum_{i} O_{i} \left( \left( \frac{O_{i}}{E_{i}} \right)^{\lambda} - 1 \right).$$

Then

$$G = \lim_{\lambda \to 0} \operatorname{CR}_{\lambda}, \qquad \chi^{2} = \operatorname{CR}_{1}.$$
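The two special cases can be checked numerically. The sketch below is illustrative only: the function names and counts are assumptions, and the limit $\lambda \to 0$ is approximated by a small positive value of $\lambda$ rather than evaluated exactly.

```python
import math

def power_divergence(observed, expected, lam):
    """Cressie-Read statistic CR_lambda, defined here for lambda not in {0, -1}."""
    if lam in (0, -1):
        raise ValueError("lambda = 0 and lambda = -1 exist only as limits")
    # Note: a zero observed count combined with a negative lambda would fail here.
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 / (lam * (lam + 1.0)) * total

def g_statistic(observed, expected):
    """G = 2 * sum(O_i * ln(O_i / E_i)) over cells with O_i > 0."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

def pearson_chi2(observed, expected):
    """Pearson's statistic: sum((O_i - E_i)^2 / E_i)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts with equal observed and expected totals.
observed = [30, 45, 25]
expected = [25.0, 50.0, 25.0]

cr_near_zero = power_divergence(observed, expected, 1e-6)  # approximates G
cr_one = power_divergence(observed, expected, 1.0)         # equals Pearson's statistic
```

The identity $\operatorname{CR}_{1} = \chi^{2}$ relies on the observed and expected totals being equal, which is why the example data are chosen to sum to the same value.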

Derivation

We can derive the value of the G-test from the log-likelihood ratio test where the underlying model is a multinomial model.

Suppose we have a sample $O = (O_{1}, \ldots, O_{m})$ where each $O_{i}$ is the number of times that an object of type $i$ was observed. Furthermore, let $N = \sum_{i=1}^{m} O_{i}$ be the total number of observations. If we assume that the underlying model is multinomial, then the test statistic is defined by

$$\ln\left(\frac{L(\tilde{p} \mid O)}{L(\hat{p} \mid O)}\right) = \ln\left(\frac{\prod_{i=1}^{m} \tilde{p}_{i}^{O_{i}}}{\prod_{i=1}^{m} \hat{p}_{i}^{O_{i}}}\right),$$

where $\tilde{p} = (\tilde{p}_{1}, \ldots, \tilde{p}_{m})$ is the null hypothesis and $\hat{p} = (\hat{p}_{1}, \ldots, \hat{p}_{m})$ is the maximum likelihood estimate (MLE) of the parameters given the data. Recall that for the multinomial model, the MLE $\hat{p}_{i}$ given the data is

$$\hat{p}_{i} = \frac{O_{i}}{N}.$$

Furthermore, we may represent each null hypothesis parameter $\tilde{p}_{i}$ as

$$\tilde{p}_{i} = \frac{E_{i}}{N},$$

where $E_{i}$ is the expected count of objects of type $i$ under the null hypothesis. Thus, by substituting the representations of $\tilde{p}_{i}$ and $\hat{p}_{i}$ in the log-likelihood ratio, the equation simplifies to

$$\ln\left(\frac{L(\tilde{p} \mid O)}{L(\hat{p} \mid O)}\right) = \ln\left(\prod_{i=1}^{m} \left(\frac{E_{i}}{O_{i}}\right)^{O_{i}}\right) = \sum_{i=1}^{m} O_{i} \ln\left(\frac{E_{i}}{O_{i}}\right).$$

Finally, multiply by a factor of $-2$ (used to make the G-test statistic asymptotically equivalent to Pearson's chi-squared test statistic) to obtain the form

$$G = -2\sum_{i=1}^{m} O_{i} \ln\left(\frac{E_{i}}{O_{i}}\right) = 2\sum_{i=1}^{m} O_{i} \ln\left(\frac{O_{i}}{E_{i}}\right).$$

Heuristically, one can imagine $O_{i}$ as continuous and approaching zero, in which case $O_{i} \ln O_{i} \to 0$, and terms with zero observations can simply be dropped. However, the expected count must be strictly greater than zero in every cell ($E_{i} > 0$ for all $i$) for the method to apply.
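The limit used in this heuristic is a standard calculus fact and can be verified with L'Hôpital's rule. The continuous variable $x$ below stands in for $O_{i}$ and is introduced only for this check:

```latex
\lim_{x \to 0^{+}} x \ln x
  = \lim_{x \to 0^{+}} \frac{\ln x}{1/x}
  = \lim_{x \to 0^{+}} \frac{1/x}{-1/x^{2}}
  = \lim_{x \to 0^{+}} (-x)
  = 0 .
```

Since $E_{i} > 0$ is fixed, the full term $x \ln(x / E_{i}) = x \ln x - x \ln E_{i}$ also tends to zero as $x \to 0^{+}$, which is consistent with dropping cells that have zero observations.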

Distribution and use

Given the null hypothesis that the observed frequencies result from random sampling from a distribution with the given expected frequencies, the distribution of the test statistic $G$ is approximately a chi-squared distribution, with the same number of degrees of freedom as in the corresponding chi-squared test.

For very small samples, the multinomial test for goodness of fit, Fisher's exact test for contingency tables, or even Bayesian hypothesis selection are preferable to the G-test.[3] McDonald recommends always using an exact test (exact test of goodness-of-fit, Fisher's exact test) if the total sample size is less than 1 000.

There is nothing magical about a sample size of 1 000, it's just a nice round number that is well within the range where an exact test, chi-square test, and G–test will give almost identical p values. Spreadsheets, web-page calculators, and SAS shouldn't have any problem doing an exact test on a sample size of 1 000.
— John H. McDonald (2014)[3]

G-tests have been recommended at least since the 1981 edition of Biometry, a statistics textbook by Robert R. Sokal and F. James Rohlf.[4]

Relation to other metrics

Relation to the chi-squared test

The commonly used chi-squared tests for goodness of fit to a distribution and for independence in contingency tables are in fact approximations of the log-likelihood ratio on which the G-tests are based.[5]

The general formula for Pearson's chi-squared test statistic is

$$\chi^{2} = \sum_{i} \frac{\left(O_{i} - E_{i}\right)^{2}}{E_{i}}.$$

The approximation of the G-test statistic by the chi-squared test statistic is obtained by a second-order Taylor expansion of the natural logarithm around 1 (see the derivation below). We have $G \approx \chi^{2}$ when the observed counts $O_{i}$ are close to the expected counts $E_{i}$. When this difference is large, however, the approximation by the chi-squared test statistic begins to break down. Here, the effects of outliers in the data will be more pronounced, and this explains why chi-squared tests fail in situations with little data.

For samples of a reasonable size, the G-test and the chi-squared test will lead to the same conclusions. However, the approximation to the theoretical chi-squared distribution for the G-test is better than for Pearson's chi-squared test.[6] In cases where $O_{i} > 2 \cdot E_{i}$ for some cell, the G-test is always better than the chi-squared test.

For testing goodness-of-fit the G-test is infinitely more efficient than the chi-squared test in the sense of Bahadur, but the two tests are equally efficient in the sense of Pitman or in the sense of Hodges and Lehmann.[7][8]

Derivation (chi-squared)

Consider

$$G = 2\sum_{i} O_{i} \ln\left(\frac{O_{i}}{E_{i}}\right),$$

and let $O_{i} = E_{i} + \delta_{i}$ with $\sum_{i} \delta_{i} = 0$, so that the total number of counts remains the same. Assume that $\delta_{i} = O_{i} - E_{i}$ is small in comparison to $E_{i}$ for all $i$. To be more precise, notice that $E_{i} = \Theta(n)$ using big Θ notation. If $O_{i} = E_{i} + \mathcal{O}(n^{1/2})$ using big O notation for large $n$, which should be true under the null hypothesis because of the central limit theorem, then $\delta_{i} = \mathcal{O}(n^{1/2})$ and

$$\frac{\delta_{i}^{3}}{E_{i}^{2}} = \mathcal{O}\left(\frac{n^{3/2}}{n^{2}}\right) = \mathcal{O}(n^{-1/2})$$

follows, which will be used later.

Upon substitution we find

$$G = 2\sum_{i} (E_{i} + \delta_{i}) \ln\left(1 + \frac{\delta_{i}}{E_{i}}\right).$$

Using the Taylor expansion $\ln(1+x) = x - \tfrac{1}{2}x^{2} + \mathcal{O}(x^{3})$ yields

$$G = 2\sum_{i} (E_{i} + \delta_{i}) \left(\frac{\delta_{i}}{E_{i}} - \frac{1}{2}\frac{\delta_{i}^{2}}{E_{i}^{2}} + \mathcal{O}\left(\frac{\delta_{i}^{3}}{E_{i}^{3}}\right)\right),$$

and distributing terms we find

$$G = 2\sum_{i} \left(\delta_{i} + \frac{1}{2}\frac{\delta_{i}^{2}}{E_{i}} + \mathcal{O}\left(\frac{\delta_{i}^{3}}{E_{i}^{2}}\right)\right).$$

Now, using $\sum_{i} \delta_{i} = 0$, $\delta_{i} = O_{i} - E_{i}$, and $\mathcal{O}(\delta_{i}^{3}/E_{i}^{2}) = \mathcal{O}(n^{-1/2})$ for large $n$, we can write the result

$$G \approx \sum_{i} \frac{\left(O_{i} - E_{i}\right)^{2}}{E_{i}}.$$
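The quality of this approximation can be illustrated numerically. The following sketch uses hypothetical counts, one set close to the expected values and one far from them, to show how the gap between $G$ and Pearson's statistic grows with the deviations $\delta_{i}$. It illustrates the expansion and is not a proof.

```python
import math

def g_statistic(observed, expected):
    """G = 2 * sum(O_i * ln(O_i / E_i)) over cells with O_i > 0."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

def pearson_chi2(observed, expected):
    """Pearson's statistic: sum((O_i - E_i)^2 / E_i)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical expected counts for 1000 observations in a 1:2:1 ratio.
expected = [250.0, 500.0, 250.0]

# Small deviations: delta_i is tiny relative to E_i, so G and chi-squared nearly agree.
close = [255, 495, 250]
g_close = g_statistic(close, expected)
chi2_close = pearson_chi2(close, expected)

# Large deviations: the neglected cubic terms matter and the statistics drift apart.
far = [400, 350, 250]
g_far = g_statistic(far, expected)
chi2_far = pearson_chi2(far, expected)

gap_close = abs(g_close - chi2_close)
gap_far = abs(g_far - chi2_far)
```

In this example the gap is well below 0.001 for the first data set and several units for the second, in line with the cubic remainder term in the expansion.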

Relation to Kullback–Leibler divergence

The G-test statistic is proportional to the Kullback–Leibler divergence of the theoretical distribution $\tilde{p} = (\tilde{p}_{1}, \ldots, \tilde{p}_{m})$ of the null hypothesis from the empirical distribution $\hat{p} = (\hat{p}_{1}, \ldots, \hat{p}_{m})$ of the observed data:

$$\begin{aligned} G &= 2\sum_{i} O_{i} \cdot \ln\left(\frac{O_{i}}{E_{i}}\right) = 2N \sum_{i} \hat{p}_{i} \cdot \ln\left(\frac{\hat{p}_{i}}{\tilde{p}_{i}}\right) \\ &= 2N \, D_{\mathrm{KL}}(\hat{p} \,\|\, \tilde{p}), \end{aligned}$$

where $N$ is the total number of observations and $\tilde{p}_{i} = \tfrac{E_{i}}{N}$ and $\hat{p}_{i} = \tfrac{O_{i}}{N}$ are the theoretical and empirical probabilities of objects of type $i$, respectively.
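The identity can be checked with a few lines of Python. The helper name `kl_divergence` and the counts are assumptions made for this illustration. Natural logarithms are used throughout, so the divergence is measured in nats.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats, skipping terms where p_i = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical observed and expected counts with equal totals.
observed = [30, 45, 25]
expected = [25.0, 50.0, 25.0]
n = sum(observed)

p_hat = [o / n for o in observed]      # empirical probabilities O_i / N
p_tilde = [e / n for e in expected]    # null-hypothesis probabilities E_i / N

g_from_kl = 2.0 * n * kl_divergence(p_hat, p_tilde)
g_direct = 2.0 * sum(
    o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
)
```

Both routes give the same value up to floating-point rounding, because the factor $N$ cancels inside each logarithm.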

Relation to mutual information

For the analysis of contingency tables, the value of the G-test statistic can also be expressed in terms of mutual information.

In this case objects with two-dimensional types $(i, j)$ are considered. Let $O_{ij}$ be the count of objects of type $(i, j)$, i.e., $O_{ij}$ is the entry in the contingency table in row $i$ and column $j$. Set

$$N = \sum_{ij} O_{ij}, \qquad \hat{p}_{ij} = \frac{O_{ij}}{N}, \qquad \hat{p}_{i\bullet} = \frac{\sum_{j} O_{ij}}{N}, \qquad \hat{p}_{\bullet j} = \frac{\sum_{i} O_{ij}}{N}.$$

Then the estimated expected count of objects of type $(i, j)$ assuming independence is given by

$$E_{ij} = N \hat{p}_{i\bullet} \hat{p}_{\bullet j}.$$

Finally, the G-test statistic in this case is given by

$$G = 2\sum_{ij} O_{ij} \ln\left(\frac{O_{ij}}{E_{ij}}\right).$$

Let $X, Y$ be random variables with joint distribution given by the empirical distribution $\hat{p}_{ij}$ of the contingency table, i.e.,

$$P(X = i, Y = j) = \hat{p}_{ij}, \qquad P(X = i) = \hat{p}_{i\bullet}, \qquad P(Y = j) = \hat{p}_{\bullet j}.$$

Then the G-test statistic can be expressed in several alternative forms:

$$\begin{aligned} G &= 2N \cdot \sum_{ij} \hat{p}_{ij} \left( \ln(\hat{p}_{ij}) - \ln(\hat{p}_{i\bullet}) - \ln(\hat{p}_{\bullet j}) \right) \\ &= 2N \cdot \Bigl( H(X) + H(Y) - H(X, Y) \Bigr) \\ &= 2N \cdot \operatorname{MI}(X, Y), \end{aligned}$$

where the entropies $H(X)$ and $H(Y)$ are given by

$$H(X) = -\sum_{i} \hat{p}_{i\bullet} \ln(\hat{p}_{i\bullet}), \qquad H(Y) = -\sum_{j} \hat{p}_{\bullet j} \ln(\hat{p}_{\bullet j}),$$

the joint entropy $H(X, Y)$ is given by

$$H(X, Y) = -\sum_{ij} \hat{p}_{ij} \ln(\hat{p}_{ij}),$$

and the mutual information of $X$ and $Y$ is

$$\operatorname{MI}(X, Y) = H(X) + H(Y) - H(X, Y).$$
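The equivalence between the contingency-table form of $G$ and $2N \cdot \operatorname{MI}(X, Y)$ can be sketched as follows. The function names and the 2×2 table are hypothetical. The closed-form p-value is valid only because a 2×2 table has a single degree of freedom.

```python
import math

def entropy(probabilities):
    """Shannon entropy in nats, skipping zero probabilities."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def g_independence(table):
    """G = 2 * sum(O_ij * ln(O_ij / E_ij)) with E_ij = row_i * col_j / N."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            if o > 0:
                e = row_totals[i] * col_totals[j] / n
                total += o * math.log(o / e)
    return 2.0 * total

def g_from_mutual_information(table):
    """G = 2 * N * (H(X) + H(Y) - H(X, Y)) using the empirical distribution."""
    n = sum(sum(row) for row in table)
    p_joint = [o / n for row in table for o in row]
    p_rows = [sum(row) / n for row in table]
    p_cols = [sum(col) / n for col in zip(*table)]
    mutual_information = entropy(p_rows) + entropy(p_cols) - entropy(p_joint)
    return 2.0 * n * mutual_information

# Hypothetical 2x2 contingency table with 100 observations.
table = [[20, 30], [40, 10]]

g_table = g_independence(table)
g_mi = g_from_mutual_information(table)

# A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom, for which the
# chi-squared survival function is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(g_table / 2.0))
```

The two functions agree up to rounding error, so either form can be used, depending on whether counts or estimated probabilities are more convenient.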


It can also be shown that the inverse document frequency weighting commonly used for text retrieval is an approximation of G applicable when the row sum for the query is much smaller than the row sum for the remainder of the corpus. Similarly, the result of Bayesian inference applied to a choice of single multinomial distribution for all rows of the contingency table taken together versus the more general alternative of a separate multinomial per row produces results very similar to the G-test statistic.

Application

Statistical software

  • In R, fast implementations can be found in the AMR and Rfast packages. For the AMR package, the command is g.test, which works exactly like chisq.test from base R. R also has the likelihood.test function in the Deducer package. Note: Fisher's G-test in the GeneCycle package of the R programming language (fisher.g.test) does not implement the G-test as described in this article, but rather Fisher's exact test of Gaussian white noise in a time series.[11]
  • Another R implementation to compute the G-test statistic and corresponding p-values is provided by the R package entropy. The commands are Gstat for the standard G statistic and the associated p-value, and Gstatindep for the G statistic applied to comparing joint and product distributions to test independence.
  • In SAS, one can conduct a G-test by applying the /chisq option after proc freq.[12]
  • In Stata, one can conduct a G-test by applying the lr option after the tabulate command.
  • In Java, use org.apache.commons.math3.stat.inference.GTest.[13]
  • In Python, use scipy.stats.power_divergence with lambda_=0.[14]
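As an illustration of the Python entry, the sketch below calls `scipy.stats.power_divergence` with `lambda_=0` for a goodness-of-fit test and, as one possible route for a contingency table, `scipy.stats.chi2_contingency` with `lambda_="log-likelihood"`. It assumes NumPy and SciPy are installed; the counts are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, power_divergence

# Goodness of fit: hypothetical counts tested against a 1:2:1 ratio.
observed = np.array([30, 45, 25])
expected = np.array([25.0, 50.0, 25.0])

# lambda_=0 selects the log-likelihood ratio (G) member of the
# Cressie-Read power divergence family.
g_stat, p_value = power_divergence(observed, f_exp=expected, lambda_=0)

# Independence in a hypothetical 2x2 table. correction=False disables the
# Yates continuity correction so the statistic is the plain G value.
table = np.array([[20, 30], [40, 10]])
result = chi2_contingency(table, correction=False, lambda_="log-likelihood")
g_indep, p_indep, dof = result[0], result[1], result[2]
```

The result of `chi2_contingency` is indexed rather than accessed by attribute name here, because the attribute names of the result object differ between SciPy versions.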
References

  1. McDonald, J.H. (2014). "G–test of goodness-of-fit". Handbook of Biological Statistics (Third ed.). Baltimore, Maryland: Sparky House Publishing. pp. 53–58.
  2. Cressie, Noel; Read, Timothy R. C. (1984). "Multinomial goodness-of-fit tests". Journal of the Royal Statistical Society. Series B (Methodological). 46 (3): 440–464. doi:10.1111/j.2517-6161.1984.tb01318.x. JSTOR 2345686. Retrieved 14 January 2026.
  3. McDonald, John H. (2014). "Small numbers in chi-square and G–tests". Handbook of Biological Statistics (3rd ed.). Baltimore, MD: Sparky House Publishing. pp. 86–89.
  4. Sokal, R. R.; Rohlf, F. J. (1981). Biometry: The Principles and Practice of Statistics in Biological Research (Second ed.). New York: Freeman. ISBN 978-0-7167-2411-7.
  5. Hoey, J. (2012). "The Two-Way Likelihood Ratio (G) Test and Comparison to Two-Way Chi-Squared Test". arXiv:1206.4881 [stat.ME].
  6. Harremoës, P.; Tusnády, G. (2012). "Information divergence is more chi squared distributed than the chi squared statistic". Proceedings ISIT 2012. pp. 538–543. arXiv:1202.1125. Bibcode:2012arXiv1202.1125H.
  7. Quine, M. P.; Robinson, J. (1985). "Efficiencies of chi-square and likelihood ratio goodness-of-fit tests". Annals of Statistics. 13 (2): 727–742. doi:10.1214/aos/1176349550.
  8. Harremoës, P.; Vajda, I. (2008). "On the Bahadur-efficient testing of uniformity by means of the entropy". IEEE Transactions on Information Theory. 54 (1): 321–331. Bibcode:2008ITIT...54..321H. CiteSeerX 10.1.1.226.8051. doi:10.1109/tit.2007.911155. S2CID 2258586.
  9. Dunning, Ted (1993). "Accurate Methods for the Statistics of Surprise and Coincidence". Computational Linguistics. 19 (1): 61–74.
  10. Rivas, Elena (30 October 2020). "RNA structure prediction using positive and negative evolutionary information". PLOS Computational Biology. 16 (10) e1008387. Bibcode:2020PLSCB..16E8387R. doi:10.1371/journal.pcbi.1008387. PMC 7657543. PMID 33125376.
  11. Fisher, R. A. (1929). "Tests of significance in harmonic analysis". Proceedings of the Royal Society of London A. 125 (796): 54–59. Bibcode:1929RSPSA.125...54F. doi:10.1098/rspa.1929.0151. hdl:2440/15201.
  12. G-test of independence, G-test for goodness-of-fit in Handbook of Biological Statistics, University of Delaware. (pp. 46–51, 64–69 in: McDonald, J. H. (2009) Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.)
  13. "org.apache.commons.math3.stat.inference.GTest". Archived from the original on 2018-07-26. Retrieved 2018-07-11.
  14. "Scipy.stats.power_divergence — SciPy v1.7.1 Manual".