
3.7. Model discrimination, model selection, generalized likelihood ratio test

Several alternative models are often proposed to explain the same data, and objective criteria are needed to choose among models. The alternative models may be nested or non-nested. Nested models are constructed such that a simpler model can be obtained from a more complex model by eliminating one or more parameters from the more complex model. Thus choosing among models reduces to determining the appropriateness of the additional parameters. Non-nested models are not related in this way, and model selection must be based on some other criteria. I will not deal with non-nested models in my thesis.

While adding features to a model is often desirable, the increased complexity comes with a cost. In general, the more parameters contained in a model, the less reliable are parameter estimates. Criteria to select among models must weigh the trade-off between increased information and decreased reliability. I present three methods, all of which deal with the likelihood function, and because of this, model discrimination is related to parameter estimation. I begin with a discussion of nested models and then show how the three methods choose among models.

Beginning with the simplest case, a null model $M_0$, specified by the parameter vector $\boldsymbol{\theta}_0 = (\theta_1, \ldots, \theta_k)$, is compared to an alternative model $M_A$, which shares the $k$ parameters of the null model but also contains an additional parameter, $\theta_{k+1}$. Comparing the null to the alternative hypothesis determines whether adding the extra parameter to the null model is appropriate. In other words, we are testing the following hypotheses:

$$H_0: \theta_{k+1} = 0 \quad \text{versus} \quad H_A: \theta_{k+1} \neq 0.$$

This is a two-sided test because the null hypothesis is rejected if $\theta_{k+1}$ is determined to be significantly greater or less than 0 (or another pre-determined value). The comparison extends to models that differ by more than one parameter, with the alternative model having a parameter space $\Theta_A$ of dimension $k + r$, $r \geq 1$, within which the null parameter space $\Theta_0$ is nested.

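The likelihood function depends on the parameter values and the data. As with parameter estimation, the parameter vectors $\hat{\boldsymbol{\theta}}_0$ and $\hat{\boldsymbol{\theta}}_A$ are chosen to maximize the likelihood function over the null parameter space $\Theta_0$ and the alternative parameter space $\Theta_A$, respectively. In other words,

$$L(\hat{\boldsymbol{\theta}}_0 \mid x) = \sup_{\boldsymbol{\theta} \in \Theta_0} L(\boldsymbol{\theta} \mid x), \qquad L(\hat{\boldsymbol{\theta}}_A \mid x) = \sup_{\boldsymbol{\theta} \in \Theta_A} L(\boldsymbol{\theta} \mid x). \qquad (3.19)$$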

The three model comparison methods compare these two likelihoods.

Generalized likelihood ratio test

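The generalized likelihood ratio test (GLRT) (Mood et al., 1974; Bickel and Doksum, 1977; Hogg and Tannis, 1983), as its name implies, is based on the ratio of the likelihoods. Define a random variable $\Lambda$ with realizations $\lambda(x)$, based on the data $x$:

$$\lambda(x) = \frac{\sup_{\boldsymbol{\theta} \in \Theta_0} L(\boldsymbol{\theta} \mid x)}{\sup_{\boldsymbol{\theta} \in \Theta_A} L(\boldsymbol{\theta} \mid x)}, \qquad (3.20)$$

where $L$ is the likelihood function as in equation (3.8), $\Theta_0$ is the parameter space of the null model, and $\Theta_A$ is the parameter space of the alternative model. Note that $0 \leq \lambda \leq 1$. Because the null hypothesis (based on $\Theta_0$) is nested within the broader hypothesis (based on $\Theta_A$), the supremum in the numerator cannot exceed the supremum in the denominator, so $\lambda \leq 1$. Also, $\sup L \geq 0$, so $\lambda \geq 0$. In general, $\lambda \ll 1$ is grounds for rejecting the null hypothesis.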

The likelihood ratio is useful because of the following result (Bickel and Doksum, 1977). First, assume that $x = (x_1, x_2, x_3, \ldots, x_n)$ is a sample from the probability density function or discrete density function $f(x \mid \boldsymbol{\theta})$ with a $(k+1)$-dimensional parameter vector $\boldsymbol{\theta}$ that takes on values unrestricted in $\mathbb{R}^{k+1}$. Also assume that:

  1. The map $\boldsymbol{\theta} \mapsto f(x \mid \boldsymbol{\theta})$ is smooth in $\boldsymbol{\theta}$ for each $x$;
  2. The maximum likelihood estimate $\hat{\boldsymbol{\theta}}$ is consistent (i.e., the estimate becomes arbitrarily close to the true value as $n$ gets large).

Then, with $\lambda$ formulated as above, if $\theta_{k+1} = 0$ (the null hypothesis is true), the asymptotic distribution of $-2\log\Lambda$ is approximately $\chi^2$ with 1 degree of freedom (Mood et al., 1974; Bickel and Doksum, 1977). Thus a test of size $\alpha$ is

Reject $H_0$ if $-2\log\lambda(x) > \chi^2_{1,\,1-\alpha}$,

where $\chi^2_{1,\,1-\alpha}$ is the $(1-\alpha)$th quantile of the chi-square distribution with 1 degree of freedom. This test can be extended to the case where the difference between the dimensions of the null and alternative models is greater than 1. If the test is formulated as above, and the same assumptions are met, then $-2\log\Lambda$ is approximately $\chi^2$ with $r$ degrees of freedom, where $r$ is the difference in dimension between the two models.
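The one-parameter test can be illustrated with a minimal Python sketch. The example is not taken from the thesis: it assumes Gaussian errors, a null model with a constant mean, and an alternative model that adds a single slope parameter. The data and all function names are made up for illustration. For one degree of freedom the chi-square tail probability has a closed form, P(X > s) = erfc(sqrt(s/2)), so only the standard library is needed.

```python
import math


def rss_constant(y):
    """Residual sum of squares for the null model y = a + error."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def rss_line(t, y):
    """Residual sum of squares for the alternative model y = a + b*t + error."""
    n = len(y)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((a - t_bar) ** 2 for a in t)
    sxy = sum((a - t_bar) * (b - y_bar) for a, b in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(t, y))


def max_loglik_gaussian(rss, n):
    """Maximized Gaussian log-likelihood, with the variance set to its MLE rss/n."""
    sigma2_hat = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)


def glrt_one_df(loglik_null, loglik_alt):
    """Return (-2 log lambda, asymptotic p-value) for one additional parameter."""
    stat = max(-2.0 * (loglik_null - loglik_alt), 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up data with an obvious trend, for illustration only.
t = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1, 9.0, 9.9]

n = len(y)
loglik_null = max_loglik_gaussian(rss_constant(y), n)
loglik_alt = max_loglik_gaussian(rss_line(t, y), n)
stat, p_value = glrt_one_df(loglik_null, loglik_alt)

alpha = 0.05
reject_null = p_value < alpha
print(f"-2 log lambda = {stat:.2f}, p = {p_value:.3g}, reject H0: {reject_null}")
```

Comparing the p-value with the chosen size is equivalent to comparing the statistic with the corresponding chi-square quantile. For more than one additional parameter, the tail probability would need a chi-square distribution with r degrees of freedom, which this sketch does not implement.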

Akaike's information criterion

The other two methods operate under the premise of parsimony: simpler models are favored over more complex ones. The first is called Akaike's information criterion (AIC) (Akaike, 1973). For each alternative model proposed to describe the data,

$$\mathrm{AIC}_i = \log L(\hat{\boldsymbol{\theta}}_i \mid x) - (k + r_i), \qquad (3.21)$$

where $\hat{\boldsymbol{\theta}}_i$ is the maximum likelihood estimate for the $i$th model and $k + r_i$ is the number of unspecified parameters in the $i$th model. In a sequence of nested models, the model with the largest $\mathrm{AIC}_i$ value is chosen. Compared to the GLRT method, the AIC method assigns proportionately more penalty for models of increasing complexity.
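A short Python sketch of the selection rule follows. It assumes the larger-is-better form of the criterion (maximized log-likelihood minus the number of free parameters), which is consistent with choosing the largest value as described above. The model names and log-likelihood values are hypothetical.

```python
def aic(max_loglik, n_params):
    """Larger-is-better AIC: maximized log-likelihood minus parameter count (assumed form)."""
    return max_loglik - n_params


# Hypothetical nested sequence: (label, maximized log-likelihood, free parameters).
candidates = [
    ("null", -120.4, 3),
    ("one extra parameter", -118.9, 4),
    ("two extra parameters", -118.7, 5),
]

scores = {name: aic(loglik, n_params) for name, loglik, n_params in candidates}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("chosen model:", best)
```

With these made-up numbers the first extra parameter raises the log-likelihood by more than one unit and is retained, while the second raises it by less than one unit and is not.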

Bayesian information criterion

Both the GLRT and the AIC method share a drawback: as the sample size increases, there is an increasing tendency to accept the more complex model (Raftery, 1986). The Bayesian information criterion (BIC) (Schwarz, 1978) takes sample size into account. Although the BIC method was developed from a Bayesian standpoint, the result is insensitive to the prior distribution when the sample size is adequate. Thus a prior distribution need not be specified (Schwarz, 1978; Raftery, 1986), which simplifies the method. For each model, the BIC is calculated as

$$\mathrm{BIC}_i = \log L(\hat{\boldsymbol{\theta}}_i \mid x) - \frac{1}{2}(k + r_i)\log n, \qquad (3.22)$$

where $n$ is the sample size. As with the previous method, the model with the largest BIC is chosen. If just two alternative models are being compared, the BIC from the simpler model can be subtracted from the BIC from the more complex model. A positive value indicates that the more complex model should be favored, while a negative value favors the simpler model.
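The two-model comparison can be sketched in Python as follows. The sketch assumes the form of the criterion given by Schwarz (1978), the maximized log-likelihood minus half the number of parameters times log n. The log-likelihood values and sample sizes are hypothetical and chosen only to show how the penalty grows with n.

```python
import math


def bic(max_loglik, n_params, n_obs):
    """Larger-is-better BIC: log-likelihood minus (n_params / 2) * log(n_obs) (assumed form)."""
    return max_loglik - 0.5 * n_params * math.log(n_obs)


def bic_difference(loglik_simple, p_simple, loglik_complex, p_complex, n_obs):
    """BIC of the more complex model minus BIC of the simpler model."""
    return bic(loglik_complex, p_complex, n_obs) - bic(loglik_simple, p_simple, n_obs)


# Hypothetical fit: one extra parameter improves the log-likelihood by 1.5 units.
loglik_simple, p_simple = -120.4, 3
loglik_complex, p_complex = -118.9, 4

for n_obs in (10, 100):
    diff = bic_difference(loglik_simple, p_simple, loglik_complex, p_complex, n_obs)
    favored = "more complex" if diff > 0 else "simpler"
    print(f"n = {n_obs}: BIC difference = {diff:.3f}, favors the {favored} model")
```

The same improvement in log-likelihood favors the more complex model at the smaller sample size and the simpler model at the larger one, because the penalty for each extra parameter is half of log n.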


Spatial and Temporal Models of Migrating Juvenile Salmon with Applications.
Columbia Basin Research,
School of Aquatic & Fishery Sciences,
University of Washington