Descriptive studies in epidemiology give baseline information on the spatial, temporal, and demographic patterns of disease burden. This information can help epidemiologists and public health professionals design studies to learn more about the causes of such patterns, design intervention studies to assess disease control measures, or target the control measures that are most efficient at a large scale. Yet epidemiological data are plagued by bias. Bias is defined as a difference between a measured quantity and the true quantity that an investigator is trying to measure. When unaccounted for, bias can lead to incorrect conclusions and consequently have adverse impacts on public health. Bias is omnipresent, and no good epidemiological study is complete without proper consideration of which biases can and cannot be accounted for. As such, epidemiologists are trained to spot the potential for bias and avoid it when designing a study, analysing data, and interpreting results.
Opportunistic surveillance data are intrinsically biased because individuals who are more likely to be recorded may not be representative of the population. This causes two problems. Firstly, the case counts for a disease will be an underestimate of the true number of cases. Secondly, any inferences made regarding the disease in the sampled population may not reflect what is going on in the entire population. For instance, hospital-based counts of people sick with malaria do not include individuals who never went to the hospital, so the case count will be an underestimate. Furthermore, if healthier individuals are more likely to go to the hospital when sick, then the estimated case fatality rate may underestimate the true case fatality rate. While getting a representative sample of people is difficult, getting representative samples of wildlife can be even more challenging because wild animals are elusive and difficult to handle.
Role of anthrax: natural regulating force?
Consider the situation in Etosha National Park, Namibia, where the plains zebra (Equus quagga) and other herbivores experience outbreaks of anthrax. Anthrax is a disease caused by the bacterium Bacillus anthracis, which has a peculiar strategy for an infectious agent. Rather than being transmitted directly between host animals or via vectors, B. anthracis lies in the soil waiting to be consumed by a susceptible mammalian host and enter its bloodstream, whereupon it awakens, rapidly reproduces, and produces toxins that kill its host in days. The host’s blood, full of B. anthracis, then spills back into the environment, where the bacteria return to their dormant spore form and wait for months to decades for the next unsuspecting host to come along. Anthrax is usually an epidemic disease: it affects livestock and wildlife in big outbreaks several years apart.
In Etosha, however, anthrax is endemic; it occurs every year, throughout the year but with seasonal peaks in incidence. Zebra are the primary host species affected by anthrax in Etosha, though elephant, springbok, and other antelope also experience the disease. While in some years over a hundred zebra carcasses are found, anthrax is generally not considered to be a problem in Etosha but rather a natural regulating force in the ecosystem, similar to lions that regulate the populations of their prey. As zebra populations seem reasonably stable in the park, this may well be the case. However, the extent to which anthrax plays this role remains poorly understood. When one hundred carcasses are reported, how do we know whether this represents 150, 250, or even 500 zebra dying annually of the ~13,000 zebra living in Etosha? We can certainly be sure that we did not see all the carcasses. Estimating how many carcasses there really were will help us understand the extent to which anthrax regulates zebra and other host populations. Furthermore, it will assist us in understanding how anthrax affects other elements of the ecosystem: the predators and scavengers that consume zebra and scavenge their carcasses, the grasses zebra feed on, and the other herbivores that compete with zebra for resources.
To answer these questions we must come up with a better way to estimate anthrax incidence from available information. By gathering data about our own surveillance methods, we can use statistical techniques to account for the various processes that determine how many carcasses we actually saw. Statistical models known as hierarchical models allow data from multiple spatial and temporal scales to be integrated in an intuitive fashion to estimate unknown quantities—in our case the incidence of anthrax.
Factors influencing carcass detection probability
So how are carcasses observed in Etosha? As in most wildlife disease systems, surveillance is opportunistic: park staff and researchers see carcasses while driving around the park doing other work. The distance of a carcass from the road is therefore important, because carcasses further away are less likely to be detected. Vultures are particularly important sighting cues, extending the distance at which carcasses can be sighted (Figure 1).
Figure 1. Vultures landing at carcasses increase the detectability of carcasses at great distances.
The common ecological technique called distance sampling allows scientists to estimate the total number of animals in an area by counting animals along line transects (straight paths placed randomly in the study area). To estimate the total number of animals from the number observed, one must examine the drop-off in the number observed with increasing distance from the transect (Figure 2).
Figure 2. The ecological technique known as distance sampling involves walking line transects and counting animals sighted as well as measuring the distance of sighted animals from the transect. In this simulated example, the animals (black) are evenly but randomly distributed around the transect. Yet animals further from the transect are detected with lower probability (red line). This leads to more animals being sighted (yellow) closer to the transect. In reality we only have information on the observed animals, and we use statistics to fit the detection probability curve and so estimate the total number of animals.
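The fitting step described in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in Etosha: it assumes a half-normal detection curve, a strip reaching 1,000 m from the transect, and made-up values for the true number of animals and for the scale of the curve. All function and variable names are hypothetical.

```python
import math
import random


def detection_prob(distance, sigma):
    """Assumed half-normal curve: 1 on the transect, falling with distance."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))


def mean_detection(sigma, width):
    """Average detection probability for animals spread uniformly over 0..width."""
    area = sigma * math.sqrt(math.pi / 2.0) * math.erf(width / (sigma * math.sqrt(2.0)))
    return area / width


def fit_sigma(observed, width, candidates):
    """Grid search for the sigma that maximises the likelihood of the observed distances."""
    n = len(observed)
    sum_sq = sum(d * d for d in observed)
    best_sigma, best_loglik = None, -math.inf
    for sigma in candidates:
        # log-likelihood of distances drawn from the curve, truncated at width
        loglik = -sum_sq / (2.0 * sigma ** 2) - n * math.log(mean_detection(sigma, width) * width)
        if loglik > best_loglik:
            best_sigma, best_loglik = sigma, loglik
    return best_sigma


# Simulated "truth" (illustrative values only)
rng = random.Random(1)
WIDTH = 1000.0        # metres searched on one side of the transect
TRUE_SIGMA = 300.0    # scale of the true detection curve
N_TOTAL = 2000        # animals actually present

all_distances = [rng.uniform(0.0, WIDTH) for _ in range(N_TOTAL)]
observed = [d for d in all_distances if rng.random() < detection_prob(d, TRUE_SIGMA)]

# Fit the curve using only the observed distances, then scale up the count
candidates = [50.0 + 5.0 * i for i in range(191)]  # 50 m to 1,000 m in 5 m steps
sigma_hat = fit_sigma(observed, WIDTH, candidates)
n_hat = len(observed) / mean_detection(sigma_hat, WIDTH)

print(f"sighted: {len(observed)}, fitted sigma: {sigma_hat:.0f} m, estimated total: {n_hat:.0f}")
```

The estimate uses only the sighted distances; the true total is kept aside purely to check how close the estimate comes.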
This technique works with carcasses just as well as with live animals. If carcasses are evenly distributed with respect to distance from the road, then there should be as many carcasses far from the road as there are near. So by comparing the numbers found near and far we can estimate the proportion missed. Figure 3 shows how sighted carcasses are distributed with respect to distance from the road.
Figure 3. Sighting cues affect the detectability of carcasses with avian scavengers (primarily vultures; left) increasing sighting distances dramatically compared to mammalian scavengers (black-backed jackals, spotted hyenas, lions; center) and the carcass itself (e.g. zebra; right).
Clearly we are missing more carcasses further from the road, and the types of scavengers present at a carcass affect how far away we can detect it. But should we expect carcasses to be evenly distributed with respect to the road? By putting GPS collars on several zebra, springbok and elephant, we can account for how animals use roads when they are alive and so come up with a guess at how carcasses should be distributed near and far from roads. But zebra and other anthrax hosts are highly mobile. Even after accounting for all the above problems, we need to account for the fact that the carcasses are generated by large herds of animals that move many kilometres each day. When the herds are in areas of the park with roads, we may see carcasses. But when they are in areas without roads we may not observe any carcasses even if animals are dying at the same rate. Using GPS collar data, we can further examine these general patterns of herd movement to see where our surveillance data are weakest.
Just as location matters, so does time. Even carcasses near roads will be difficult to detect if those roads are only driven months after the carcasses appear. Since most carcasses are detected by observing the scavengers rather than the carcasses themselves, we put motion-sensor camera traps at carcasses to see how long scavenging lasts as a sighting cue, which serves as an index of how long we have to detect each carcass.
For instance, if carcasses are on average visible for only two days after death and we drive each road every four days, then our best estimate might be that we saw half of all carcasses near roads, after accounting for distance. Finally, by recording which roads we drive we can keep track of how our effort changes over time and varies in space. We want to be sure that our surveillance data are not merely reflecting variation in our surveillance effort.
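The two-day, four-day arithmetic can be written out and checked with a small simulation. This sketch assumes, for illustration only, that every carcass stays visible for exactly the same number of days, that a road is driven at perfectly regular intervals, and that deaths fall at random times relative to that schedule. The function names are hypothetical.

```python
import random


def availability(visible_days, revisit_days):
    """Chance that a carcass is still visible the next time its road is driven."""
    return min(1.0, visible_days / revisit_days)


def simulate_availability(visible_days, revisit_days, n_carcasses, rng):
    """Monte Carlo check: draw the wait until the next drive-by for each carcass."""
    seen = 0
    for _ in range(n_carcasses):
        # Death occurs at a random point between two consecutive drives
        wait = rng.uniform(0.0, revisit_days)
        if wait <= visible_days:
            seen += 1
    return seen / n_carcasses


rng = random.Random(42)
expected = availability(2.0, 4.0)
simulated = simulate_availability(2.0, 4.0, 20000, rng)

print(f"expected fraction seen: {expected:.2f}, simulated: {simulated:.3f}")
```

If visibility times vary between carcasses, as the camera-trap data would presumably show, the same calculation would need to be averaged over the observed durations rather than applied to their mean.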
The general approach we take, then, is to think about all the factors that affect the probability that any carcass is detected (distance from road, time since death, if and when that road was driven, how long carcasses are scavenged for) and then estimate the number of carcasses that were missed. This framework emphasizes thinking about data as being generated both by natural ecological processes and by “artificial” observation processes. Keeping track of how we collect data, in addition to the items of interest themselves, can thereby allow us to tease much more information out of our data as well as avoid biasing our results. And with better estimates of anthrax incidence we can learn more about the role anthrax plays in Etosha and elsewhere.
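As a rough illustration of how these factors combine, the sketch below multiplies separate detection probabilities for each sighted carcass and weights each sighting by the inverse of the result (a Horvitz-Thompson-style estimator). This is a deliberately simple stand-in for a full hierarchical model: it assumes the factors act independently, and the class, the field names, and the probability values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SightedCarcass:
    p_distance: float  # chance of sighting at this distance from the road
    p_time: float      # chance it was still visible when the road was driven
    p_driven: float    # chance the nearest road was driven at all in the period

    def detection_probability(self):
        # Assumes the three processes are independent of one another
        return self.p_distance * self.p_time * self.p_driven


def estimate_total(sightings):
    """Each sighted carcass stands in for 1/p carcasses, seen and unseen."""
    return sum(1.0 / c.detection_probability() for c in sightings)


# Illustrative values only: 100 sighted carcasses with identical probabilities
sightings = [SightedCarcass(p_distance=0.8, p_time=0.5, p_driven=1.0) for _ in range(100)]
total = estimate_total(sightings)
missed = total - len(sightings)

print(f"sighted: {len(sightings)}, estimated total: {total:.0f}, estimated missed: {missed:.0f}")
```

With an overall detection probability of 0.4, one hundred sighted carcasses would correspond to roughly 250 deaths. In practice each carcass would carry its own probabilities, and their uncertainty would need to be carried through to the final estimate.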