American Journal of Epidemiology Advance Access originally published online on October 9, 2008
American Journal of Epidemiology 2008 168(11):1326-1332; doi:10.1093/aje/kwn249
PRACTICE OF EPIDEMIOLOGY
Validation of a Hierarchical Deterministic Record-Linkage Algorithm Using Data From 2 Different Cohorts of Human Immunodeficiency Virus-Infected Persons and Mortality Databases in Brazil
Correspondence to Dr. Antonio Guilherme Fonseca Pacheco, Programa de Computação Científica, Fundação Oswaldo Cruz, Avenida Brasil, 4365, Manguinhos, 21045-360, Rio de Janeiro, Brazil (e-mail: apacheco{at}fiocruz.br).
Received for publication March 11, 2008. Accepted for publication July 21, 2008.
Loss to follow-up is a major source of bias in cohorts of patients with human immunodeficiency virus (HIV) and could lead to underestimation of mortality. The authors developed a hierarchical deterministic linkage algorithm to be used primarily with cohorts of HIV-infected persons to recover vital status information for patients lost to follow-up. Data from patients known to be deceased in 2 cohorts in Rio de Janeiro, Brazil, and data from the Rio de Janeiro State mortality database for 1999–2006 were used to validate the algorithm. A fully automated procedure yielded a sensitivity of 92.9% and a specificity of 100% when no information was missing. When the automated procedure was combined with clerical review, in a scenario of 5% death prevalence and 20% missing mothers' names, sensitivity reached 96.5% and specificity remained 100%. In a practical application, use of the algorithm significantly increased the death rates ascertained in the cohorts and decreased the rate of loss to follow-up. The finding that 23.9% of matched records did not give HIV or acquired immunodeficiency syndrome as the cause of death reinforces the need to search all-cause mortality databases and warns of possible underestimation of death rates. These results indicate that the algorithm is accurate enough to recover vital status information on patients lost to follow-up in cohort studies.
cohort studies; data collection; HIV; medical record linkage; mortality; software validation
Abbreviations: AIDS, acquired immunodeficiency syndrome; CI, confidence interval; HIV, human immunodeficiency virus; ICD-10, International Classification of Diseases, Tenth Revision; NPV, negative predictive value; PPV, positive predictive value; THRio, TB-HIV in Rio
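As an illustration of how the accuracy figures in the abstract relate to the predictive values listed among the abbreviations, the following minimal sketch derives PPV and NPV from sensitivity, specificity, and prevalence using Bayes' rule. The function name predictive_values is hypothetical, and the inputs are the clerical-review scenario quoted above (sensitivity 96.5%, specificity 100%, death prevalence 5%). The outputs are plain arithmetic on those three numbers; they are not the values reported by the authors, which may differ because the study computed them from record-level data.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence.

    All arguments are proportions between 0 and 1. The quantities below
    are expected fractions of the whole population, not counts.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    positives = true_pos + false_pos   # everyone the linkage flags as deceased
    negatives = true_neg + false_neg   # everyone the linkage leaves unmatched

    # Guard against division by zero when no one falls in a category.
    ppv = true_pos / positives if positives > 0 else float("nan")
    npv = true_neg / negatives if negatives > 0 else float("nan")
    return ppv, npv


# Scenario taken from the abstract: automated procedure plus clerical review.
ppv, npv = predictive_values(sensitivity=0.965, specificity=1.0, prevalence=0.05)
print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

With a specificity of 100% there are no false positives, so every match is a true death and the PPV equals 1 regardless of prevalence; the NPV falls below 1 only because of the deaths that the linkage misses.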