
3.6. Goodness-of-fit

Goodness-of-fit tests are used to determine how well a proposed model fits a particular data set. The procedure is to first compute a test statistic based on the deviation of the data from the model (with the parameter estimates) and then compare it to a theoretical or empirical distribution derived under the assumption that the model is true. A rough probability of observing a data set that deviates at least this much from the model, given that the model is true, can then be determined (the p-value). If this probability is too low, the model is rejected. I should note that I use goodness-of-fit tests to get a rough idea of a model's performance; there is no threshold value below which a model is deemed not to work. In most cases I apply models to a series of data sets, and consistently low p-values are evidence that the model is not appropriate. The main purpose of the tests is to assess whether a model is useful in describing observations and hence useful for predictive purposes.

Two types of goodness-of-fit tests have been commonly employed: chi-square type tests and tests based on the empirical distribution function (EDF), although other classes of tests have been used (D'Agostino and Stephens, 1986). Chi-square tests are used when data are grouped into discrete classes, and observed frequencies are compared to expected frequencies based on a model. Although Pearson's $X^2$ test is the most familiar, other tests fall into this category, such as the G test, Tukey's test and the Rao-Robson test (Moore, 1986). In all cases the test statistic is formulated such that it asymptotically follows a chi-square distribution, and because of this, these tests are usually convenient to use. Tests based on the EDF are used most often with continuous data. An empirical distribution function is constructed by ranking the data, and this is compared to the model's cumulative distribution function (CDF). The test statistic is based on the deviation of the EDF from the CDF, and its distribution is obtained by Monte Carlo simulations. The most familiar test of this type is the Kolmogorov test (Conover, 1980).

chi-squared goodness-of-fit test

The most commonly used chi-squared test is Pearson's $X^2$ test (Pearson, 1900), which compares expected frequencies to observed frequencies in discrete cells. If the model is fully specified (i.e., no parameters are estimated from the data), then the cell probabilities can be obtained by integrating the model's probability density function, $f(x)$, over the cell width, $w_i$:

$$p_i = \int_{w_i} f(x)\,dx. \tag{3.10}$$

The expected frequency in cell i is then computed as

$$E(n_i) = N p_i, \tag{3.11}$$

where N is the total sample size. Pearson showed that the test statistic

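$$X^2 = \sum_{i=1}^{k} \frac{\left(n_i - E(n_i)\right)^2}{E(n_i)} \tag{3.12}$$

asymptotically follows a $\chi^2$ distribution with $k - 1$ degrees of freedom, where $n_i$ is the observed frequency in cell $i$ and $k$ is the number of cells.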
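As a minimal sketch of the calculation in equations (3.10) to (3.12), the following computes the cell probabilities as differences of the model CDF, the expected frequencies, and the $X^2$ statistic for a fully specified model. The counts, the cell boundaries, the standard normal model, and the helper names `chi2_sf` and `pearson_x2` are illustrative assumptions, not part of the original analysis. The p-value assumes $k - 1$ degrees of freedom for $k$ cells.

```python
import math
from statistics import NormalDist

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= h / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def pearson_x2(observed, edges, cdf):
    """Return (X2, degrees of freedom, p-value) for a fully specified model."""
    N = sum(observed)
    # Cell probability = integral of the density over the cell = CDF difference.
    probs = [cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])]
    expected = [N * p for p in probs]
    x2 = sum((n - e) ** 2 / e for n, e in zip(observed, expected))
    df = len(observed) - 1  # no parameters estimated from the data
    return x2, df, chi2_sf(x2, df)

# Hypothetical counts in four cells under a standard normal model.
edges = [float("-inf"), -1.0, 0.0, 1.0, float("inf")]
observed = [18, 30, 34, 18]
x2, df, p = pearson_x2(observed, edges, NormalDist().cdf)
print(f"X2 = {x2:.3f}, df = {df}, p = {p:.3f}")
```

A large p-value here would indicate no evidence against the model for this one data set; as noted above, the pattern across several data sets is more informative than any single value.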

discrete class data

These tests are particularly useful when the data are in the form of the frequency of individuals falling into discrete classes. An issue with these tests is how to lump the classes. If the $E(n_i)$ values are too small, the tests are not valid (Cochran, 1952; Roscoe and Byars, 1971). In all cases, I lump the data such that $E(n_i) > 1.0$ for all $i$.
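One possible lumping rule can be sketched as follows: adjacent classes are merged from left to right until the expected frequency of each lumped class exceeds 1.0, and any leftover tail is folded into the last class. The text does not say which classes were merged in practice, so the merge order, the function name `lump_cells`, and the example frequencies are assumptions made for illustration.

```python
def lump_cells(observed, expected, min_expected=1.0):
    """Merge adjacent cells until every expected frequency exceeds min_expected."""
    obs_out, exp_out = [], []
    obs_acc, exp_acc = 0, 0.0
    for n, e in zip(observed, expected):
        obs_acc += n
        exp_acc += e
        if exp_acc > min_expected:
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
            obs_acc, exp_acc = 0, 0.0
    # Fold any remaining tail cells into the last lumped cell.
    if obs_acc > 0 or exp_acc > 0:
        if exp_out:
            obs_out[-1] += obs_acc
            exp_out[-1] += exp_acc
        else:
            obs_out.append(obs_acc)
            exp_out.append(exp_acc)
    return obs_out, exp_out

# Hypothetical observed and expected frequencies in eight classes.
observed = [0, 1, 5, 9, 6, 2, 1, 0]
expected = [0.3, 1.8, 5.1, 8.2, 5.6, 2.1, 0.7, 0.2]
lumped_obs, lumped_exp = lump_cells(observed, expected)
print(lumped_obs)
print([round(e, 2) for e in lumped_exp])
```

Lumping preserves the total observed and expected frequencies, so the lumped cells can be passed directly to the $X^2$ calculation with the number of cells reduced accordingly.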

using chi-squared tests with continuous data

Using chi-squared tests in situations where the data are continuous involves a trade-off: the tests are flexible and easy to use, but because the data must be placed into discrete classes, information is lost and the tests are not as powerful as some alternatives (Moore, 1986). One advantage of using these tests with continuous data, though, is that it is possible to have equiprobable cells, improving the efficiency of the test (Mann and Wald, 1942; Cohen and Sackrowitz, 1975). Mann and Wald (1942) recommended the following equation for choosing the number of cells, $k$, at significance level $\alpha$:

$$k = 4\left[\frac{2N^2}{c(\alpha)^2}\right]^{1/5}, \tag{3.13}$$

where $c(\alpha)$ is the $(1-\alpha)$th quantile of the standard normal distribution. Others (e.g., Schorr, 1974) have argued that fewer cells than this are optimal, and in light of this, Moore (1986) recommends using a value for $k$ that is between that given by equation (3.13) and half that. I will use equation (3.13) with $\alpha = 0.05$; since equation (3.13) decreases with decreasing $\alpha$, this practice will cover the range of $\alpha = 0.05$ and lower values.
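The cell-count rule and the construction of equiprobable cells can be sketched as below. The sketch assumes the commonly quoted form of the Mann and Wald rule, $k = 4[2N^2/c(\alpha)^2]^{1/5}$ (some presentations use $N - 1$ in place of $N$, which matters little for large $N$). The function names, the rounding to an integer, and the standard normal model are illustrative assumptions.

```python
from statistics import NormalDist

def mann_wald_cells(N, alpha=0.05):
    """Number of cells from the assumed Mann-Wald form (not rounded)."""
    c = NormalDist().inv_cdf(1.0 - alpha)  # (1 - alpha)th standard normal quantile
    return 4.0 * (2.0 * N ** 2 / c ** 2) ** 0.2

def equiprobable_edges(k, inv_cdf):
    """Interior cell boundaries giving each of k cells probability 1/k."""
    return [inv_cdf(j / k) for j in range(1, k)]

N = 100
k_upper = mann_wald_cells(N, alpha=0.05)
k = round(k_upper)  # rounding convention is an assumption
edges = equiprobable_edges(k, NormalDist().inv_cdf)
print(f"k from the rule: {k_upper:.2f}; half that: {k_upper / 2:.2f}")
print(f"{len(edges)} interior boundaries for {k} equiprobable cells")
```

With equiprobable cells every expected frequency equals $N/k$, so the lumping question raised for discrete classes does not arise as long as $N/k$ is not too small.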

using chi-squared tests when parameters are estimated in the model

At first glance it appears that chi-squared tests can readily accommodate models that have parameters estimated from the data. The standard approach is to subtract one degree of freedom for each parameter estimated. As Fisher (1924) showed, however, the type of estimation procedure used affects the outcome of the goodness-of-fit test. The appropriate parameter estimation method to use is the minimum chi-squared criterion. This involves minimizing the $X^2$ statistic with respect to the parameters and is achieved by solving the following equation:

$$\frac{\partial X^2}{\partial \theta_p} = 0, \quad p = 1, 2, \ldots, r, \tag{3.14}$$

where $\theta_p$ is the $p$th parameter and $r$ is the number of parameters estimated. This method has several drawbacks. First, this equation is difficult to solve: analytical solutions are rarely available, and the response surface is not smooth. Second, the minimum chi-squared estimation procedure is rarely used in practice, and ideally the parameter estimates used in the goodness-of-fit tests are those obtained from the parameter estimation part of the data analysis. Fortunately, using parameter estimates from other methods (such as maximum likelihood) results in tests that are conservative with respect to accepting the model, i.e., they reject the model too often. Thus there are three choices: 1) use the minimum chi-squared criterion and accept its drawbacks, 2) use another estimation method and accept a conservative test, or 3) use a test that includes a correction factor, such as the Rao-Robson test (Rao and Robson, 1974).
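To make the minimum chi-squared criterion concrete, here is a small sketch for a one-parameter model. Rather than solving equation (3.14) analytically, it evaluates $X^2$ over a grid of candidate parameter values and keeps the minimizer. The exponential model, its rate parameter, the cell boundaries, the counts, and the grid are all assumptions chosen for illustration; a grid search is a stand-in for whatever numerical method would be used in practice.

```python
import math

def x2_for_rate(rate, observed, edges):
    """Pearson X2 for an exponential model with the given rate."""
    N = sum(observed)

    def cdf(x):
        return 1.0 - math.exp(-rate * x)

    x2 = 0.0
    for n, lo, hi in zip(observed, edges[:-1], edges[1:]):
        e = N * (cdf(hi) - cdf(lo))  # expected frequency in this cell
        x2 += (n - e) ** 2 / e
    return x2

# Hypothetical grouped data: counts in four cells on [0, infinity).
edges = [0.0, 0.5, 1.0, 1.5, float("inf")]
observed = [40, 24, 15, 21]

# Grid search over candidate rates (step 0.01 is an arbitrary choice).
rates = [0.01 * j for j in range(1, 301)]
best_rate = min(rates, key=lambda r: x2_for_rate(r, observed, edges))
min_x2 = x2_for_rate(best_rate, observed, edges)
df = len(observed) - 1 - 1  # one degree of freedom subtracted for the rate
print(f"minimum chi-squared estimate: rate = {best_rate:.2f}, X2 = {min_x2:.3f}, df = {df}")
```

With more than one parameter the grid grows quickly, which is one practical face of the first drawback listed above.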

tests using the empirical distribution function

When data are continuous and the model is fully specified, tests involving the EDF are easy to use and generally more powerful than chi-squared tests. In cases where parameters are estimated from the data, these tests become more difficult to implement, and the theory is not as well developed (Stephens, 1986).

As stated previously, these tests are based on the deviation of the EDF from the CDF of the model distribution. The empirical distribution function is constructed by ranking the observations and then computing

$$F_n(x) = \frac{n}{N} = \frac{1}{N}\,(\#\text{ of observations} \le x), \quad n = 0, 1, \ldots, N. \tag{3.15}$$

This results in a step function that increases by 1/N in height at each observation. This is compared to the model CDF, F(X). The most commonly used statistic is D, first introduced by Kolmogorov (1933):

$$D = \sup_x \left| F_n(x) - F(x) \right|, \tag{3.16}$$

which is the largest vertical distance between $F_n(X)$ and $F(X)$. Other statistics have been proposed that involve the squared difference between $F_n(X)$ and $F(X)$ integrated over the entire range of x.

In some cases it is more convenient to work with data after they have been transformed such that

$$z_i = F(x_i), \quad i = 1, 2, \ldots, N. \tag{3.17}$$

If the model is true, Z will be uniformly distributed on [0,1]. If $z_{(i)}$ is the ith-ranked transformed data point, then

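$$D = \max_{1 \le i \le N} \left\{ \max\left[ \frac{i}{N} - z_{(i)},\; z_{(i)} - \frac{i-1}{N} \right] \right\}. \tag{3.18}$$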
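The computation of D from the transformed data can be sketched as follows. The sample is passed through the model CDF, the transformed values are ranked, and the largest deviation above and below the EDF steps is taken. The function name `ks_statistic` and the example sample are assumptions for illustration; the example uses a uniform model on [0,1], for which the CDF is the identity.

```python
def ks_statistic(sample, cdf):
    """Kolmogorov D computed from the ranked transformed values z = F(x)."""
    z = sorted(cdf(x) for x in sample)
    N = len(z)
    # enumerate() counts from 0, so rank i corresponds to index i - 1.
    d_plus = max((i + 1) / N - zi for i, zi in enumerate(z))   # EDF above CDF
    d_minus = max(zi - i / N for i, zi in enumerate(z))        # CDF above EDF
    return max(d_plus, d_minus)

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))

# Hypothetical sample of three observations.
sample = [0.7, 0.1, 0.4]
D = ks_statistic(sample, uniform_cdf)
print(f"D = {D:.4f}")
```

Both one-sided deviations are needed because the EDF is a step function: the largest distance can occur either just before or just at an observation.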

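The basic goodness-of-fit test is as follows. We would like to test the hypothesis that a random sample, $x_1, x_2, \ldots, x_N$, came from a fully specified distribution, F(X). In other words, the null hypothesis is

$$H_0: x_1, x_2, \ldots, x_N \text{ are independent draws from } F(X).$$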

The procedures are followed as outlined above, and the resulting test statistic is compared to its tabulated distribution. A value falling in the upper extreme of the distribution is evidence against the null hypothesis.

EDF tests with estimated parameters

When parameters are estimated from the data, EDF tests become less general. If the parameters are location (e.g., the mean of a normal distribution) and/or scale (e.g., the standard deviation of a normal distribution) parameters, the distribution of the EDF statistic depends on the family of distributions in question but not on the particular parameter values. This is the case with the normal and exponential distributions, among others, and the distributions of the test statistics for many of these families have been tabulated. In cases where a shape parameter is estimated (e.g., Gamma and Inverse Gaussian distributions), the distribution of the test statistic depends not only on the family of distributions but also on the true parameter values, making the use of these tests quite cumbersome. One way to overcome this is to create the distribution of test statistics with Monte Carlo simulations as they are needed.
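One way such a simulation might be organized is a parametric bootstrap: fit the model, compute D, then repeatedly simulate data from the fitted model, refit, and recompute D to build a reference distribution. The sketch below is an assumption about how this could be done, not a description of the procedure actually used. For brevity it is demonstrated with an exponential model (a scale family, for which tables also exist); the same structure applies to a family with a shape parameter if the `fit`, `make_cdf`, and `simulate` functions are swapped. All function names are hypothetical.

```python
import math
import random

def ks_statistic(sample, cdf):
    """Kolmogorov D from the ranked transformed values z = F(x)."""
    z = sorted(cdf(x) for x in sample)
    N = len(z)
    d_plus = max((i + 1) / N - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / N for i, zi in enumerate(z))
    return max(d_plus, d_minus)

def mc_p_value(sample, fit, make_cdf, simulate, n_sim=500, seed=1):
    """Monte Carlo p-value for D when parameters are estimated from the data."""
    rng = random.Random(seed)
    params = fit(sample)
    d_obs = ks_statistic(sample, make_cdf(params))
    exceed = 0
    for _ in range(n_sim):
        fake = simulate(params, len(sample), rng)
        # Refit on each simulated data set so estimation error is mimicked.
        d_sim = ks_statistic(fake, make_cdf(fit(fake)))
        if d_sim >= d_obs:
            exceed += 1
    return d_obs, (exceed + 1) / (n_sim + 1)

# Exponential model: maximum likelihood rate, CDF, and simulator.
def fit_exp(xs):
    return len(xs) / sum(xs)

def make_exp_cdf(rate):
    return lambda x: 1.0 - math.exp(-rate * x)

def sim_exp(rate, n, rng):
    return [rng.expovariate(rate) for _ in range(n)]

# Hypothetical data drawn from an exponential distribution.
data_rng = random.Random(42)
sample = [data_rng.expovariate(0.5) for _ in range(40)]
d_obs, p = mc_p_value(sample, fit_exp, make_exp_cdf, sim_exp)
print(f"D = {d_obs:.3f}, Monte Carlo p = {p:.3f}")
```

The number of simulations controls the resolution of the p-value: with `n_sim` replicates the smallest attainable value is 1/(n_sim + 1).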


Spatial and Temporal Models of Migrating Juvenile Salmon with Applications.
Columbia Basin Research,
School of Aquatic & Fishery Sciences,
University of Washington