Summary
A general description of individual animal behavior and history consists of three components, initial spatial location, mortality, and spatial translation. Flexibility and generation of competing theories arise from alternative formulations for any component. Individual animal models can be modified for spatially (and temporally) aggregated animals by appropriate integration. State-space models provide a natural framework for estimating and predicting animal population abundance given partial or inexact information, such as that provided by mark-recapture or harvest data. As an example, a multivariate, linear normal state-space model that explicitly incorporates each of the three individual animal components is formulated for Pacific coho salmon (Oncorhynchus kisutch) migration and harvest. Using recoveries of tagged coho salmon caught in ocean fisheries and associated measures of fishing effort, the Kalman filter and maximum likelihood are used to estimate parameters of the processes. Real-time management of a harvested animal population could be improved using recent harvest and effort data, for example, the estimated parameters, and the Kalman prediction algorithm. Competing theories of survival and migration can be statistically tested as well.
KEY WORDS: Kalman algorithms, Capture and Recapture, Spatial Statistics, Time Series
Describing and identifying the mechanisms associated with animal movement and survival are important objectives of fish and wildlife scientists and managers. Understanding the factors that may influence survival and spatial-temporal distribution of a population can be helpful for rebuilding threatened populations. In the case of exploited species, knowledge of animal abundance by area and time is helpful for harvest control.
Previous work on the modeling of animal survival and movement has been at both the individual animal and the spatially-temporally aggregated levels. Anderson-Sprecher and Ledolter (1991) modeled the movement of individual mule deer (Odocoileus hemionus) as a two-dimensional random walk and allowed for errors in the telemetry data by using a state-space model. Hilborn (1990) estimated survival and migration rates from one area to another over multiple time periods for skipjack tuna (Euthynnus pelamis), which had been marked and released in different areas. Schwarz, Schweigert, and Arnason (1993) also formulated an approach to estimating survival and migration rates between areas using data similar to Hilborn's (1990) but for Pacific herring (Clupea harengus pallasi).
Here I present a framework for individual animal movement and survival with three distinct components, initial location, survival, and movement. Explanatory covariates can be incorporated in any of three components and competing theories for survival and movement can be tested statistically. The individual animal framework can be modified to accommodate spatial-temporal aggregation of animals. To account for uncertainty in information about individual or groups of animals, state-space models are used to link observed data to the underlying true situation. As a demonstration, the methodology is applied to model the movement and survival of Pacific coho salmon (Oncorhynchus kisutch) marked at a hatchery and recovered in ocean fisheries.
To minimize notation and simplify the discussion, the existence of density functions with respect to counting or Lebesgue measure is assumed throughout and are generically denoted by f. Time will be treated as a discrete value and the distance between points in time are evenly spaced, t = 0, 1, 2, . . ..
Let pt represent the location of a given animal at time t, where pt may be a vector. Let st be an indicator variable for being alive at time t and mt be the spatial translation made from time t - 1 to t. For simplicity, movement is assumed to occur exactly at time t, while mortality can be experienced at any time between [t - 1, t). If an animal dies during [t - 1, t), its location remains pt-1 at time t. The location at time t can be written as
If spatial location is described with respect to the plane, and Cartesian coordinates are used, (eq 1) can be written as
(xt, yt) = (xt-1, yt-1) +st (
xt,
yt)
where
represents translation in one dimension. Expressing the change in location, mt, in polar coordinates may be a more natural approach. An animal 'chooses' a direction to move,
t, and then a distance to move in that direction, rt. Again working within the plane,
(xt, yt) = (xt-1, yt-1) +st (rt cos
t, rt sin
t)
The modeling of individual animal movement and mortality can thus be partitioned into submodels for initial location, survival, and movement, which in turn may be functions of covariates and parameters,
. Initial location, p0, is either a known constant or a random variable with density function
(p0). Survival at time t may be modeled conditional on st-1 with density function
(st | st-1), and may be a function of covariates such as age, animal size, location, and harvesting effort. For movement either a single density function could be used,
(mt), or a combination of density functions, one for direction,
(
t), and distance,
(rt), the latter perhaps being conditional on the direction. Both
and r could be time and space dependent.
The likelihood for a given animal observed for T time periods is:
Previous locations known
Suppose that instead of information about an individual animal's exact location, one has data about animal abundance in K disjoint areas at time t. Area is either a line segment, a region, or a volume in
,
, or
, respectively. Let na,t represent the abundance in area a at time t. Suppose also that the number of animals in area a at t that were in another area b at t - 1 is known. For instance, nb,t-1 were marked with a distinguishing mark, and these marks can be counted in area a at time t. The likelihood function for the aggregates can be formulated in terms of each of the individual animal model components. In the case of known releases of marked fish within each area, the starting vector of abundance would be a constant. Otherwise, the probability of individual animal's presence in area a at t= 0:
and the initial spatial distribution is a multinomial random variable with sample size n0, say, and probability vector
. Assuming independence and constant survival probabilities within an area-time cell, the number of animals surviving to the next time point is a binomial random variable. Letting na,t + be the survivors of those in area a at time t and p be
(sa,t= 1),
For a given location within cell a, the probability of moving to a new cell b is the integral of the density of the translation random variable, over the direction and distance necessary to reach b from that location in a. In the absence of information as to the location of individuals within cell a, assume that the location within the cell is a uniform random variable appropriately scaled to the cell length (or area or volume). For simplicity suppose the spatial framework is a line with the left and right endpoints of a and b being [aL, aR] and [bL, bR], respectively, where aR <= bL. The probability of moving from a to b is:
where the density function subscripts emphasize the potential for space and time dependence. Assuming independence of movement between animals, the number moving from cell a to any of the K cells is multinomial with total sample size na,t + and probability vector,
. Combining each of the components, the joint probability distribution for the vector
conditional on na,t is a product of binomial and multinomial distributions.
Previous locations unknown
Sometimes abundance by time and area is known, but not the previous location of the animals. To show that some information may be extracted from such a situation, consider an extreme artificial example where there are only two areas, a and b, and two time periods. During the first time period there are 100 animals in area a but 0 animals in b. In the second time period there are 0 animals in a and 80 in b. Clearly the survival rate was 80% and the movement probability from a to b was 100%.
The likelihood based on this kind of aggregated data can be formulated in a manner similar to before. The initial location and survival components are identical to (eq 2) and (eq 3), respectively. The likelihood for the movement component, however, becomes a complicated convolution of multinomials constrained by the necessity that the total in each cell at the next time equal observed values. Suppose there are two areas a and b and na,t-1 and nb,t-1 equals 1 and 3, and na,t and nb,t equals 2 and 2. The probability of (na,t , nb,t | na,t-1 , nb,t-1) is the sum of all the probabilities of events that could yield 2 in a and 2 in b. Here there are only two possibilities, if one stays in a, then one came from b; if one moves to b, then two came from b. The number of possibilities is prohibitively large, however, in most situations.
A simplifying approximation is a multivariate normal distribution for the conditional distribution of the vector of counts at t, nt, given nt-1. The mean vector and covariance matrix correspond to linear combinations of multinomial expectations and covariances. The mean vector,
,
where St is a diagonal matrix where each element on the diagonal is the survival probability for the corresponding area, e.g., St[a, a] for area a, and is a function of
(st). Mt is a matrix with column a corresponding to the movement probabilities from area a to any area.
The covariance matrix
, has components
Error free information about individual animal location or actual abundance in a time-area cell is seldom available. For example, only indices of abundance by time and area may be available, such as aerial counts of deer or fish caught in a fishery. State-space models (SSM) are a natural framework for relating actual animal location to estimated locations or area-time counts to count indices. SSMs consist of two time series (scalar or vector), one based on an unobservable state process, xt, and the other based on a known observation process, yt, that is a function of xt. Assuming a first order Markovian state process, a general description of the two processes may be written in terms of the relevant conditional density functions:
State Process : f (xt | xt-1) (eq 5)
Observation Process : g (yt | xt) (eq 6)
A special case of (eq 5) and (eq 6) is the normal dynamic linear model:
where At and Bt are nonrandom scalars or matrices of appropriate dimension, and possibly functions of unknown parameters. vt and wt are mean zero normal variates of appropriate dimension, generally assumed independent of each other. The Kalman algorithms (Kalman 1960) are well known procedures for calculating the likelihood and conditional expectations of the state variable, xt, given observations, y1, y2, . . ., ys, where s <, =, or > t. West and Harrison (1989) is a modern reference on normal dynamic linear models and the Kalman algorithms (also see Sullivan (1992) for an abbreviated description).
Brillinger et al. (1980), Sullivan (1992), and Speed (1993) have used SSMs for modeling population abundance over time (see also Schnute (1994) for an exposition on sequential fisheries models). None of these formulations, however, have included a spatial component or dealt with migration.
Models used by the Pacific Salmon Commission for the management of coho salmon harvest in saltwater fisheries do not contain an explicit migration mechanism (Hunter 1985). To estimate the harvest in area a at time t, estimates of historical harvest rates for a and t, with some scaling for changes in harvest effort levels, are applied to estimates of abundance pooled over all areas. All fish not harvested at time t are therefore, according to the models, available to all fisheries at t + 1. As an extreme example, harvest reductions off the coast of California at time t translate into additional fish available to Alaskan fisheries at t + 1. As a contrast to these models, a state-space model for a single stock of coho salmon was developed with an explicit migration mechanism.
Data and temporal-spatial scale
The survival and migration of three different sets of marked hatchery coho, marked and released in the spring of 1984, 1985, and 1986, respectively, were modeled separately. Available data included recoveries of fish marked and released from a hatchery on the west coast of Washington state (Washington Department of Fisheries and Wildlife Humptulips hatchery). After the release in the spring, the salmon spent over a year in the ocean and were caught the following summer in various ocean fisheries. Catches were sampled at known rates and the number of marked recoveries estimated. For management purposes the ocean fisheries are partitioned into several regions and recovery information is aggregated within these regions. The estimated recoveries for a particular group released from this hatchery in 1984 and caught in 1985 are summarized by recovery area and four week periods (beginning sometime in June) and shown in Table 1 along with estimates of the harvest effort, as measured by number of troll boats bringing their catch into port in the United States and by days of fishing by troll boats in Canada. The areas are shown from south (NC=northern California) to north (NBC=northern British Columbia) with the hatchery located adjacent to the catch area labelled GH (GH=Grays Harbor). The Canadian catch areas begin with SVI (southwest Vancouver Island). Some recoveries are made by other types of fisheries but the effort measures are not reflective of these other fisheries (and will introduce additional noise in the modeling).
The temporal scale used in the state-space model was weekly, T=16. The spatial framework was a linearization of the thirteen catch areas between northern California and northern British Columbia. The location of the river leading to the hatchery was set equal to 0, with locations to the south having negative spatial coordinates and those to the north having positive values. This oversimplifies the spatial distribution and omits those fish that move into inside water bodies such as Puget Sound, but for coho salmon from this hatchery, the vast majority of recoveries are in the coastal fisheries.
SSM formulation
The state vector, nt, is the unknown abundance in each catch area and the observation vector, ct, is the corresponding estimated number of tag recoveries. An auxiliary vector used in both the state and observation equations to model mortality is fishing effort. In matrix notation, the state and observation equations:
Ht is a diagonal harvest matrix containing mortality due to fishing. Mt and St again represent movement and survival, respectively, and the covariance matrices for vt and wt follow the formulation of
given in Section 3.
The abundance to the initial point in time t = 0 was binomial (R,
) where R is the number of fish tagged and released from the hatchery and
is the survival rate up to time 0. The density for initial location, f(p0), was a truncated gamma distribution scaled to fit the length of the catch areas. The shape parameter was fixed at 2.0, leaving the location parameter to be estimated,
p0 ~ Gamma (
, 2.0).
To calculate the likelihood with the Kalman filter, the initial abundance vector, n0, for a given set of parameter values was fixed at the expected values, namely, R
times the probabilities for each of the thirteen locations.
The diagonal survival matrix, St, was a function of natural and fishing mortality, with different scaling used for the different U.S. and Canadian effort measures. The probability of survival,
where
and
are parameters corresponding to natural and fishing mortality rates, respectively. Different parameter values for
were allowed for US and Canadian fisheries. The corresponding diagonal of the harvest matrix:
Scaling of effort and parameters is not shown in the above expressions and is the reason for approximation rather than equality.
Movement (mt) was modeled as a random walk with unequal probability of direction and variable step sizes. As time increased and distance from hatchery increased, the probability of moving back to the hatchery increased. Letting
t be an indicator variable for movement toward the hatchery:
Step size was modeled as a gamma random variable,

where the location and scale parameters depended upon direction
t. Conditional on moving toward the hatchery, the probability of a larger step increased as time and distance increased,


while conditional on moving away from the hatchery, the probability of a larger step decreased with time and distance,


A unique wrinkle to the modeling of salmon is that most survivors will eventually try to return to the natal area. Some return to the hatchery, and the remainder either spawn naturally in the wild or are caught by in-river fisheries. Recoveries in the natal areas must be treated differently than ocean recoveries; if they arrive at the hatchery, they are with high probability recovered. To 'finish' the modeling period, natal area recoveries beyond a 'late' point in time, week 14, were aggregated into the last week of the modeling period, week 16, natal area effort during that period was set equal to zero, and the harvest rate in the natal area was fixed at 99.9%. The underlying assumption of this approach is that all the remaining fish that would return to the natal area had returned by week 16.
Results
The Kalman filter algorithm was used to calculate the likelihood for the vector
based on (eq 9) and (eq 10) for salmon releases from each of the three different years. The nonlinear optimization program NPSOL (Gill, et al. 1986) was used to estimate the parameters (Table 2).
Estimates of the step size parameter
reached the lower bound of 0.10 for 1986 and 1987. The initial survival rate parameter
and the natural mortality rate parameter
were essentially unidentifiable, e.g., increases in the initial survival rate simply translated into increases in the natural mortality. The results in Table 2 were based on
arbitrarily set equal to zero.
Filtered estimates of abundance by area were plotted by week and began, of course, as a spatially smooth distribution that became increasingly irregular as catches were removed at differential rates by area. At the end of the 16 week period most of the survivors were concentrated in the natal area, but there were still some fish remaining at sea.
The initial survival estimates
can be compared to simple cohort analysis estimates assuming no natural mortality, the sum of catches and hatchery returns divided by release size (Table 3). The SSM estimates exceeded the simple estimates every year. Partially this is due to the SSM leaving some fish still at sea at the end of modeling period, extending the modeling period to 18 weeks allows more fish to return to the natal area and
decreases. Another explanation is the incidental fishing mortality of fish fatally hooked or netted but not landed which simple cohort analysis does not account for.
The initial location parameter estimate,
, is larger for 1985 than 1986 and 1987, indicating a more northerly distribution for 1985. This is consistent with the catch recovery information. For 1985 only 4% of the non-natal area recoveries were in areas south of the natal area, while the results for 1986 and 1987 were 15% and 19%, respectively. This comparison is confounded somewhat by increases in relative effort in 1986 and 1987, but the magnitude of effort increase was not as large as recovery increase.
The fishing mortality rates for both the U.S. and Canadian fisheries as a function of effort were calculated for a range of effort levels and found to be within reason (30 to 40% at most within a given unit of area).
The movement rates toward the natal area can be estimated from the expected step size. After making corrections for the initial scaling of catch area locations, the expected step sizes per week toward the natal area were 140 miles for 1985 and around 100 miles for 1986 and 1987.
More complex submodels were tried as well, but sparse recovery matrices led to more instability in the estimates; additional details and data are available in Newman (1993).
Animal survival and migration was approached at the individual animal level with a decomposition into the three components of initial location, survival, and movement. Various explanations for individual animal movement and mortality can be postulated for each submodel. Using observations about actual behavior, e.g., from telemetry data, statistical tests can be used to compare different theories.
State-space models and the Kalman algorithms are a useful means of estimating parameters and testing hypotheses when exact information is unavailable. Given historical parameter estimates, managers of exploited populations could use SSMs to predict the impact of various harvesting plans. Depending upon the temporal resolution of the harvest season, inseason management changes could be made using currently available harvest information and the Kalman filter.
The application to salmon given here is arguably an improvement on currently used harvest management models in that unreasonable catch transfers between extreme regions are unlikely. On the other hand, the SSM formulation for the available data is highly model driven. A more data driven application would be to use the type of mark and recovery data used by Hilborn (1990) and Schwarz et al. (1993), for example, which was based on spatially-temporally disjoint release and recovery sites. Incorporating available ocean environmental data, such as sea surface temperature, could prove useful as well in explaining, and perhaps predicting, parameters relating to initial spatial location.
This work was supported with funds from the U.S. Fish and Wildlife Service. I thank Richard Comstock of U.S. Fish and Wildlife Service, Dick O'Connor of Washington Department of Fisheries and Wildlife, Jim Longwill of Pacific States Marine Fisheries Commission, and Susan Bates of Canadian Department of Fisheries and Oceans for their assistance in providing the data.
Anderson-Sprecher, R., and Ledolter, J. (1991.) State-space analysis of wildlife telemetry data. Journal of the American Statistical Association 86:596-602.
Brillinger, D., Guckenheimer, J., Guttorp, P., and Oster, G. (1980.) Empirical modelling of population time series data: the case of age and density dependent vitality rates. Lectures on mathematics in the life sciences 13:65-90.
Gill, P.E., Murray, W., Saunders, M.A., and Wright M.H. (1986.) User's guide for NPSOL (Version 4.0): a Fortran package for nonlinear programming. Technical Report SOL 86-2, Systems Optimization Laboratory, Department of Operations Research, Stanford, Stanford, California.
Hilborn, R. (1990.) Determination of fish movement patterns from tag recoveries using maximum likelihood estimators. Canadian Journal of Fisheries and Aquatic Science 47:635-643.
Hunter, M.A. (1985.) The 1976-1978 brood coho model. Technical Report 222, Washington Department of Fisheries Progress Report, Olympia, Washington.
Kalman, R.E. (1960.) A new approach to linear filtering and prediction problems. Transactions ASME Journal of Basic Engineering 82:35-45.
Newman, K.B. (1993.) State-space modeling of salmon migration and Monte Carlo alternatives to the Kalman filter. Ph.D. dissertation, University of Washington, Seattle, Washington.
Schnute, J. (1994.) A general framework for developing sequential fisheries models. Canadian Journal of Fisheries and Aquatic Sciences 51:1676-1688.
Schwarz, C.J., Schweigert, J.F., and Arnason, A.N. (1993.) Estimating migration rates using tag-recovery data. Biometrics 49:177-193.
Speed, T. (1993.) Modelling and managing a salmon population. In Statistics for the Environment, V. Barnett and K. F. Turkman (eds), 267{292. New York: John Wiley and Sons Ltd.
Sullivan, P. (1992.) A Kalman filter approach to catch-at-length analysis. Biometrics 48:237-257.
| Table 1: Estimated recoveries and troll fishing effort in 1985 for a coho release group (Tag code 632861) summarized by thirteen catch areas and four week periods. | ||||||||
|---|---|---|---|---|---|---|---|---|
| Area/Time | Estimated Recoveries | Troll effort | ||||||
| 1-4 | 5-8 | 9-12 | 13-16 | 1-4 | 5-8 | 9-12 | 13-16 | |
| NC | 0 | 0 | 0 | 0 | 1 | 3 | 1 | 0 |
| Br | 0 | 0 | 0 | 0 | 14 | 30 | 47 | 58 |
| CB | 0 | 0 | 0 | 0 | 3164 | 3134 | 1223 | 439 |
| Np | 1 | 0 | 0 | 0 | 910 | 653 | 133 | 79 |
| Tl | 0 | 2 | 2 | 0 | 481 | 436 | 140 | 84 |
| As | 0 | 3 | 0 | 0 | 0 | 365 | 1 | 0 |
| GH | 8 | 3 | 0 | 3581 | 1014 | 36 | 0 | 0 |
| Qu | 0 | 0 | 0 | 0 | 576 | 156 | 32 | 0 |
| CF | 5 | 3 | 7 | 0 | 202 | 1043 | 237 | 2 |
| SVI | 16 | 30 | 19 | 7 | 9129 | 12,918 | 10,078 | 2871 |
| NVI | 21 | 51 | 11 | 2 | 6730 | 7779 | 4230 | 1036 |
| SBC | 0 | 11 | 1 | 0 | 1789 | 2465 | 2457 | 58 |
| NBC | 0 | 0 | 0 | 0 | 8932 | 10,900 | 9411 | 4011 |
| 1 All recoveries in natal area are included. |
| 1 Parameter estimate at lower bound. |