A Gold-Standard for Entity Resolution within Sexually Transmitted Infection Networks

Bibliographic Details
Published in: Scientific Reports 2018-06, Vol. 8 (1), p. 8776-8, Article 8776
Main Authors: Schneider, John, Schumm, L. Philip, Fraser, Maya, Yeldandi, Vijay, Liao, Chuanhong
Format: Article
Language: English
Description
Summary: Contact tracing for venereal disease control has been widespread since 1936 and relies on reported information about contacts’ attributes to determine whether two contacts may represent the same individual. We developed and implemented a gold standard for determining overlap between contacts reported by different individuals, using cell phone numbers as unique identifiers. This method was then used to evaluate how well reported names and demographic characteristics can be used to infer overlap. Cell phone numbers, names, and demographic data for a sample of high-risk men in India and their contacts were collected using a novel hybrid instrument combining cell phone data extraction and Computer-Assisted Personal Interviewing (CAPI). Logistic regression was used to model the probability that a pair of contacts reported by different respondents were identical, based on the correspondence between their reported names and attributes. A discrete mixture model is proposed that provides predictions nearly as good as those of the logistic model but may be used in a new population without recalibration. Despite AUCs of 0.83–0.86, the low rate of true overlap among a very large number of contact pairs still results in a high rate of false positives. Next-generation contact tracing calls for more archived or digital matching processes.
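The closing point of the summary, that a classifier with a good AUC can still produce mostly false positives when true overlap is rare, can be illustrated with a short calculation. The sketch below is not the authors' code. The function name and the sensitivity, specificity, and overlap-rate values are hypothetical, chosen only to show how the base rate drives the share of flagged pairs that are true matches.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of flagged contact pairs that are true matches (Bayes' rule).

    All three arguments are proportions in [0, 1]; prevalence is the
    fraction of contact pairs that truly refer to the same person.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no pairs are flagged under these inputs")
    return true_pos / (true_pos + false_pos)


# Hypothetical values, not taken from the article: a matcher that catches
# 80% of true overlaps and correctly rejects 80% of non-overlaps, applied
# where only 1 in 1,000 contact pairs is a true overlap.
ppv = positive_predictive_value(sensitivity=0.80, specificity=0.80,
                                prevalence=0.001)
print(f"Share of flagged pairs that are true matches: {ppv:.4f}")
```

Under these assumed inputs, fewer than 1 in 100 flagged pairs is a true match, which is the pattern the summary describes.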
ISSN: 2045-2322
DOI: 10.1038/s41598-018-26794-7