Short item. Published on March 13, 2014

Defining optimality of a test for recent infection for HIV incidence surveillance

SACEMA workers have just published a conceptual analysis of the notion of ‘test optimisation’ in a surveillance context, which should help clarify some persistent confusion that has hindered discourse in this area for years (1).

A recurring, almost central, theme in the development of laboratory assays is how to adjust the details of the procedure and the interpretation of the results so that the ‘best possible’ test is obtained from the underlying chemical or physical process being deployed. This naturally forces us to ask what we mean by ‘good’, ‘better’ and ‘best’ test ‘performance’, and this is not necessarily a simple matter.

When tests are designed to assess the condition of individuals, and the results are used to make decisions about clinical management, there are at least some commonly used concepts, and some widely held notions, about how to define test performance. The most familiar are sensitivity (the ability of a test not to let real cases of the condition slip through undetected) and specificity (the ability of a test to avoid false positive results). Even here there are subtleties, especially for some famously difficult-to-diagnose conditions, such as tuberculosis (TB). Nevertheless, these ideas are routinely used as building blocks that most workers in the field understand in the same way, and from which useful assessments of test utility can be made. The analysis typically broadens to include the frequency (prevalence) of the sought condition in the context of use, and the consequences, both qualitative and quantitative, of false positive and false negative results, both of which are inevitable, even if infrequent.
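As a small illustration of the diagnostic framing (not taken from the source article), the Python sketch below shows how sensitivity and specificity combine with prevalence to give the predictive value of a positive (PPV) or negative (NPV) result. The function name `predictive_values` and the test characteristics used are hypothetical.

```python
from __future__ import annotations


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV): the probability that a positive (negative) result is correct."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# The same hypothetical test (95% sensitive, 98% specific) used in two settings.
for prevalence in (0.01, 0.30):
    ppv, npv = predictive_values(0.95, 0.98, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

With these illustrative numbers, only about a third of positive results are true positives at 1% prevalence, compared with about 95% at 30% prevalence: the same test has very different utility in different contexts of use.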

SACEMA has been very active in refining the use of laboratory tests for identifying recent infection (in particular with HIV) for the purpose of estimating disease incidence, i.e. the rate at which new infections occur. In this application, to surveillance rather than clinical management, the considerations are very different. Although there has been much confusion about this point, stemming largely from attempts to recycle ideas appropriate to diagnostics, it is actually a much simpler matter than evaluating the utility of a diagnostic procedure. This is because the only thing that matters is the quality of the estimate of disease incidence; there are no complex issues around the consequences of incorrect clinical management of patients.
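The article does not give an estimator, so the sketch below assumes one widely used form of the cross-sectional incidence estimator, in which the recency test is characterised by its mean duration of recent infection (MDRI, Ω) and its false-recent rate (FRR, ε) beyond a cutoff time T: I = (R − εP) / (N(Ω − εT)), where R is the number of HIV-positive respondents classified as recent, P the number HIV-positive and N the number HIV-negative. The function and parameter names, and the default cutoff of two years, are assumptions for illustration.

```python
def estimate_incidence(n_recent, n_positive, n_negative, mdri_years, frr, cutoff_years=2.0):
    """Incidence per person-year from one cross-sectional survey (assumed estimator form).

    n_recent:   HIV-positive respondents classified 'recent' by the test
    n_positive: all HIV-positive respondents
    n_negative: all HIV-negative respondents (those still at risk)
    mdri_years: mean duration of recent infection, in years
    frr:        false-recent rate among infections older than the cutoff
    """
    if n_negative <= 0:
        raise ValueError("need at least one HIV-negative respondent")
    # Effective 'window' during which an infection is counted as recent.
    window = mdri_years - frr * cutoff_years
    if window <= 0:
        raise ValueError("MDRI must exceed FRR * cutoff for the estimator to be defined")
    # Remove the expected number of false-recent results, then divide by person-time at risk.
    return (n_recent - frr * n_positive) / (n_negative * window)


rate = estimate_incidence(n_recent=30, n_positive=1000, n_negative=4000, mdri_years=0.5, frr=0.01)
print(f"{rate:.4f} per person-year")
```

For example, 30 recent results among 1,000 HIV-positive and 4,000 HIV-negative respondents, with an assumed MDRI of half a year and FRR of 1%, give (30 − 10) / (4000 × 0.48) ≈ 0.0104, i.e. about one new infection per 100 person-years.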

In the end, what emerges from a detailed analysis can be summarised as follows:

Firstly, whether a test yields biased or unbiased estimates of incidence does not depend on the test per se, but rather on whether its properties are well characterised. This is a problem of ‘test calibration’ in a sense slightly wider than the usual meaning of calibration in a laboratory setting, where people usually mean something like a standard curve for reading off analyte concentrations.
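The calibration point can be seen numerically. Assuming the estimator form I = (R − εP) / (N(Ω − εT)), with Ω the mean duration of recent infection (MDRI) and ε the false-recent rate (FRR), the sketch below computes the counts expected from a survey when the test's true MDRI is half a year, then estimates incidence twice: once with the correct MDRI, and once with an MDRI under-estimated at 0.4 years. All names and parameter values are hypothetical. Nothing about the test or the data changes between the two estimates, only the characterisation of the test.

```python
def estimate_incidence(n_recent, n_positive, n_negative, mdri, frr, cutoff=2.0):
    """Assumed estimator form: (R - frr*P) / (N * (MDRI - frr*T)), per person-year."""
    return (n_recent - frr * n_positive) / (n_negative * (mdri - frr * cutoff))


def expected_counts(incidence, prevalence, sample_size, mdri, frr, cutoff=2.0):
    """Expected (recent, positive, negative) counts for a survey under the same assumed model."""
    n_pos = sample_size * prevalence
    n_neg = sample_size - n_pos
    # Steady-state expected proportion of HIV-positives classified 'recent':
    # a false-recent baseline (frr) plus recent infections seen through the effective window.
    p_recent = frr + incidence * (1 - prevalence) / prevalence * (mdri - frr * cutoff)
    return n_pos * p_recent, n_pos, n_neg


# Hypothetical context and 'true' test properties, for illustration only.
TRUE_MDRI, TRUE_FRR = 0.5, 0.01  # years, proportion
counts = expected_counts(incidence=0.01, prevalence=0.2, sample_size=5000,
                         mdri=TRUE_MDRI, frr=TRUE_FRR)

well_calibrated = estimate_incidence(*counts, mdri=TRUE_MDRI, frr=TRUE_FRR)
mis_calibrated = estimate_incidence(*counts, mdri=0.4, frr=TRUE_FRR)  # MDRI under-estimated

print(f"well calibrated: {well_calibrated:.4f} per person-year")
print(f"mis-calibrated:  {mis_calibrated:.4f} per person-year")
```

The well-calibrated estimate recovers the assumed incidence of 0.0100 per person-year, while the mis-calibrated one gives about 0.0126, a bias of roughly 26% that no increase in sample size would remove.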

Secondly, and this is the crucial punch line, the precision (i.e. narrowness of confidence bounds) of an incidence estimate based on a cross-sectionally applied test for recent infection does depend on the properties of the test itself. Importantly, this precision can be calculated unambiguously, as long as one is willing to specify the context of intended use, so there need be no detours into subjective or qualitative considerations such as patient management. Routine approaches can now be used to optimise test procedures, by minimising a directly calculated measure of the statistical uncertainty associated with the surveillance use of the test.
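Precision, too, can be computed once the context is fixed. The sketch below is a delta-method approximation to the relative standard error (RSE) of an assumed estimator I = (R − εP) / (N(Ω − εT)), with Ω the MDRI and ε the FRR, under simple random sampling and ignoring uncertainty in the MDRI and FRR values themselves. The function name, the two tests and both contexts are hypothetical numbers chosen for illustration, not results from the article.

```python
import math


def incidence_rse(incidence, prevalence, sample_size, mdri, frr, cutoff=2.0):
    """Delta-method relative standard error of the incidence estimate.

    Counts survey sampling only (simple random sample); MDRI and FRR are treated as known.
    """
    window = mdri - frr * cutoff
    # Expected proportion of HIV-positives classified recent (steady-state assumption).
    p_recent = frr + incidence * (1 - prevalence) / prevalence * window
    n_pos = sample_size * prevalence
    # Variance of log(P / (1 - P)) from estimating prevalence.
    var_prev = 1.0 / (sample_size * prevalence * (1 - prevalence))
    # Variance of log(p_recent - frr) from estimating the recent proportion among positives.
    var_recent = p_recent * (1 - p_recent) / (n_pos * (p_recent - frr) ** 2)
    return math.sqrt(var_prev + var_recent)


# Hypothetical tests as (MDRI in years, FRR): A has a longer MDRI but non-zero FRR,
# B has a shorter MDRI and zero FRR.
TESTS = {"A": (0.6, 0.02), "B": (0.4, 0.0)}
# Hypothetical contexts of use: (incidence per year, prevalence, sample size).
CONTEXTS = {
    "high incidence, low prevalence": (0.03, 0.1, 5000),
    "low incidence, high prevalence": (0.005, 0.3, 5000),
}

for name, (inc, prev, n) in CONTEXTS.items():
    rse = {t: incidence_rse(inc, prev, n, mdri, frr) for t, (mdri, frr) in TESTS.items()}
    best = min(rse, key=rse.get)  # the 'optimal' test is the one with the smallest RSE
    details = ", ".join(f"{t} RSE={r:.3f}" for t, r in rse.items())
    print(f"{name}: {details} -> prefer {best}")
```

With these numbers, test A (longer MDRI) has the smaller RSE in the high-incidence, low-prevalence context (about 0.12 versus 0.14), while test B (zero FRR) has the smaller RSE in the low-incidence, high-prevalence context (about 0.38 versus 0.64). Which test is ‘optimal’ depends on the context of use, yet in each context the comparison is a direct calculation that could be repeated across candidate settings of a test procedure.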

It may still take some time for this simple notion of optimality to be widely taken up. In particular, there are signs that people are uncomfortable with the inevitable context dependence of optimality, but this has always been known to be true of any diagnostic test as well. We therefore expect the idea to gain wide acceptance eventually.