Robust, affordable means for estimating HIV incidence at the population level (as opposed to in study cohorts, which is easy) continue to be elusive and much desired. A major thrust of research in recent years has been the development of theory and appropriate biomarker assays which would support the estimation of incidence from cross-sectional estimates of the frequency (‘prevalence’) of ‘recent infection’, suitably defined (1).
As resources are invested in developing more appropriate tests for recent infection, the methods for evaluating and optimising the performance of candidate tests need to be further refined. We have recently explored a range of statistical methods for estimating the mean duration of recent infection (MDRI) by investigating an unusually good data set from Harare, Zimbabwe, where the BED Capture Enzyme Immuno-Assay (BED-CEIA, or simply the BED assay) was applied (2). In this case, ‘recent infection’ is taken to be an optical density below a chosen cut-off, indicating that the proportion of HIV-specific IgG (an important antibody class) has not yet matured. An estimate of the MDRI of the assay is thus obtained by estimating the time spent below the optical density cut-off C by patients who have been HIV positive for at most T, for some pre-defined time T (1,3). We investigated whether there is an optimum way of estimating the mean recency duration or whether several estimating procedures provide similar answers and, if so, whether simple approaches provide adequate answers. We also asked how estimates of the mean recency duration and of incidence are affected by the choice of cut-off, and whether these effects differ with the choice of estimation method.
Example MDRI estimation
The MDRI was estimated using BED data from the Zimbabwe Vitamin A for Mothers and Babies (ZVITAMBO) Trial. 14 110 women were tested for HIV postpartum, at 6 weeks, and at 3-monthly intervals from 3 to 24 months. MDRI was estimated using subsets of BED optical density (OD) data from the 353 women who seroconverted, out of the 9 562 women who were originally HIV negative.
MDRI (ωt) was estimated using four different estimation methods:
- Proportion of recent infections among seroconverters (r/s). Assuming uniformly distributed seroconversion events over the time period [0,T], the MDRI is estimated from the proportion of all seroconverters testing recent according to some OD cut-off.
- Linear Mixed Model (LMM). An LMM with fixed and random effects is used to model a linear relationship between the OD and time since infection (transformed according to the recommendation of the developer). This model yields a straight line for each woman, from which the estimated time spent in the recent state is obtained by using an inverse prediction technique (4), with the upper limit restricted to T. Bootstrap techniques are applied to these individual estimates to obtain the final estimate of the MDRI as well as the associated confidence interval.
- Non-linear Mixed Model (NLMM). A more biologically plausible NLMM is used to model the relationship between the assay OD and the time since infection. Unlike the LMM, this function approaches finite asymptotes for short and long times since infection. For this model, time of seroconversion is assumed to be uniformly distributed between the dates of last negative and first positive HIV tests, and individual recency durations were obtained by using an inverse prediction technique (4), with the maximum set as T. Markov Chain Monte Carlo (MCMC) methods are applied to obtain a distribution for the MDRI.
- Survival Analysis (SA). For this method, the time of seroconversion for each individual is approximated by the mid-point between the time of last negative and time of first positive HIV test. An interval for the time to reach the pre-defined OD cut-off is obtained from the data by using the last time point with OD below the cut-off and the first time point with OD above the cut-off. The data are then considered to be interval censored and Turnbull’s modification of the Product-Limit Estimator is used to obtain a survival function which, when integrated over [0, T], provides an estimate of MDRI and its corresponding confidence intervals.
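Two of the simpler steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study. The first function assumes that, with seroconversion times uniform over [0, T], the expected proportion testing recent equals MDRI/T, so the r/s estimate is T × r/s. The second integrates a step-shaped survival function over [0, T]; it assumes the survival curve has already been estimated elsewhere (for example by a Turnbull-type procedure, which is not implemented here). The function names and the example inputs are hypothetical.

```python
def mdri_ratio(n_recent, n_seroconverters, T_days):
    """r/s estimator: MDRI is approximated by T * (r / s).

    Assumes seroconversion times are uniformly distributed over [0, T].
    """
    if n_seroconverters <= 0:
        raise ValueError("need at least one seroconverter")
    return T_days * n_recent / n_seroconverters


def mdri_from_survival(step_times, surv_values, T_days):
    """Integrate a right-continuous step survival function over [0, T].

    step_times  : increasing times (days) at which the curve drops
    surv_values : value of the curve just after each drop
    The curve is assumed to start at 1.0 at time 0.
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in zip(step_times, surv_values):
        if t >= T_days:
            break
        area += prev_s * (t - prev_t)  # rectangle up to this drop
        prev_t, prev_s = t, s
    area += prev_s * (T_days - prev_t)  # final rectangle up to T
    return area


# Hypothetical inputs, for illustration only.
print(mdri_ratio(n_recent=50, n_seroconverters=100, T_days=365))
print(mdri_from_survival([100, 200, 300], [0.75, 0.5, 0.0], T_days=365))
```

Both functions return a duration in days; the survival-based version makes the role of the cap at T explicit, since any part of the curve beyond T is ignored.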
The estimates of the MDRI produced by the above four estimation methods are provided in Table 1. The optical density cut-off was fixed at the ‘package insert’ value of 0.8 for all methods, a minimum of two samples per case was required, and the maximum allowable time between the last negative and first positive HIV tests was 120 days. This resulted in a sample size of 100 women.
Table 1 Results of MDRI estimation using different methods.
|No.|Method|Mean recency duration, days (95% CI)|Coefficient of variation (%)|
|i.|Ratio r/s|192 (168-216)|6.4|
|iv.|Survival analysis|192 (168-216)|6.4|
There are no significant differences between the four estimates when the OD cut-off C lies between 0.6 and 1.0. Estimates are fairly insensitive to varying the minimum allowable number of samples per woman between 2 and 4, and to varying the maximum number of days between the last negative and first positive HIV tests between 60 and 180. In all cases, the coefficient of variation (the ratio of the standard error to the mean) is lowest for the NLMM.
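As a rough check on how the coefficient of variation relates to a reported interval, the sketch below recovers a standard error from a 95% confidence interval and divides it by the mean. It assumes a symmetric, normal-approximation interval (half-width = 1.96 × standard error), which may not hold for intervals obtained by bootstrap or MCMC; the function name is hypothetical.

```python
def cov_from_ci(mean, lower, upper, z=1.96):
    """Coefficient of variation (%) implied by a symmetric confidence interval.

    Assumes the interval is mean +/- z * standard_error.
    """
    standard_error = (upper - lower) / (2 * z)
    return 100 * standard_error / mean


# Values taken from the r/s row of Table 1: 192 (168-216) days.
print(round(cov_from_ci(192, 168, 216), 1))  # 6.4
```

Under that assumption, the interval 168-216 days around a mean of 192 days corresponds to the tabulated coefficient of variation of 6.4%.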
The 12 months post-partum follow-up incidence of 3.46% (95% CI: 3.05%–3.87%) was compared with BED incidence calculated using only women who seroconverted during the year. For all values of C tested, the NLMM estimates of incidence showed the smallest deviation from the follow-up estimate, varying only between 3.23% and 3.50%. For values of C between 0.8 and 1, the estimates of incidence for the NLMM, LMM and SA methods did not differ significantly from each other. The r/s method was not used, since r/s is a constant multiple of the follow-up incidence.
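To see why the MDRI matters for incidence, the sketch below uses the simplest textbook form of a cross-sectional estimator, in which incidence is the number of recent infections divided by the person-time represented by the HIV-negative sample (number negative × MDRI in years). This form ignores any false-recent adjustment and is not necessarily the estimator used in the study; the function name and all input counts are hypothetical.

```python
DAYS_PER_YEAR = 365.25


def incidence_simple(n_recent, n_negative, mdri_days):
    """Annual incidence from a cross-sectional sample (simplest form).

    Assumes no false-recent results and constant incidence over the
    period covered by the MDRI.
    """
    if n_negative <= 0 or mdri_days <= 0:
        raise ValueError("n_negative and mdri_days must be positive")
    mdri_years = mdri_days / DAYS_PER_YEAR
    return n_recent / (n_negative * mdri_years)


# Hypothetical counts, for illustration only.
for mdri in (170, 192, 215):
    rate = incidence_simple(n_recent=35, n_negative=2000, mdri_days=mdri)
    print(f"MDRI {mdri} days -> incidence {100 * rate:.2f}% per year")
```

Because the estimate is inversely proportional to the MDRI, any relative error in the MDRI carries over directly into the incidence estimate.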
Which estimator and cut-off point to choose?
For a cut-off C = 0.8, there was no significant difference between MDRI estimates, but coefficients of variation (CoVs) were higher for the r/s and SA estimators and lower for the regression estimators (LMM and NLMM), which use information on time-dependent increases in OD. NLMM additionally uses a biologically plausible function that fits the data best and approaches finite asymptotes for small and large times since infection. NLMM estimates of MDRI changed most regularly with C and produced HIV incidence estimates most closely approximating the observed follow-up incidence. Regression works less well when only 2-3 BED results are available for every case; the r/s and SA methods are then preferable.
With respect to the cut-off point: as C (and thus the MDRI estimate) decreases, the number of observed recent infections declines and the CoV of the MDRI estimate increases; as C increases, so does the false-recent rate, and there is an increasing risk of violating assumptions of constant incidence. For NLMM, CoVs were 2.8% at C = 0.4 and 1.8% at C = 1.0; the conventional value of C = 0.8 combines acceptable CoV (2.0%), MDRI (~0.5 years) and false-recent rate (~5%) estimates.
The ZVITAMBO data set was extraordinary, but developers of assays and analysis methods usually need to work with less ideal data sets. New studies, usually following seroconverters for other reasons, will continue to be used occasionally as sources of additional specimens for analysis of this type. BED per se was once widely used but, because of its limitations, is by no means expected to be the dominant assay in future investigations. Crucially, the current trend is to use multiple biomarkers, each of which has some time evolution within newly infected individuals, and to combine these into a recent/non-recent categorisation.
Moving forward, we need to know how variations in the type, quality and quantity of data affect analyses of this type, so that we can adjudicate differences in performance between the various statistical approaches. A variety of techniques is available, including systematic benchmarking of methods on simulated data, where we can be sure what the ‘real’ answer is. SACEMA is also involved in a number of key collaborations in which such methods are being applied to numerous assays on well-characterised specimen panels, for example the Consortium for the Evaluation and Performance of HIV Incidence Assays (CEPHIA).
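The idea of benchmarking on simulated data can be illustrated with a small sketch. It simulates seroconverters whose time since infection is uniform over [0, T], draws each person's recency duration from a gamma distribution (an arbitrary choice made purely for illustration, with hypothetical parameters), and compares the r/s estimate with the known ‘real’ MDRI, taken here as the average recency duration capped at T. Nothing in this sketch reproduces the study's data or methods.

```python
import random


def simulate_ratio_benchmark(n=20000, T=365.0, shape=4.0, scale=48.0, seed=1):
    """Compare the r/s estimate of MDRI with the simulated truth.

    Returns (true_mdri, ratio_estimate), both in days.
    """
    rng = random.Random(seed)
    n_recent = 0
    capped_total = 0.0
    for _ in range(n):
        # Hypothetical recency duration (days); gamma mean = shape * scale.
        duration = rng.gammavariate(shape, scale)
        # Time since infection at testing, uniform over [0, T].
        since_infection = rng.uniform(0.0, T)
        if since_infection < duration:
            n_recent += 1  # still below the cut-off when tested
        capped_total += min(duration, T)
    true_mdri = capped_total / n
    ratio_estimate = T * n_recent / n
    return true_mdri, ratio_estimate


true_mdri, estimate = simulate_ratio_benchmark()
print(f"true MDRI: {true_mdri:.1f} days, r/s estimate: {estimate:.1f} days")
```

Repeating such a simulation with smaller samples, sparser follow-up or different duration distributions gives a direct view of how each estimator's bias and variability respond to data quality.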