Loading…

Analyzing missingness patterns in real-world data using the SMDI toolkit: application to a linked EHR-claims pharmacoepidemiology study

Missing data in confounding variables present a frequent challenge in generating evidence using real-world data, including electronic health records (EHR). Our objective was to apply a recently published toolkit for characterizing missing data patterns and based on the toolkit results about likely m...

Full description

Saved in:
Bibliographic Details
Published in:BMC medical research methodology 2024-10, Vol.24 (1), p.246-14, Article 246
Main Authors: Raman, Sudha R, Hammill, Bradley G, Shaw, Pamela A, Lee, Hana, Toh, Sengwee, Connolly, John G, Dandreo, Kimberly J, Nalawade, Vinit, Tian, Fang, Liu, Wei, Li, Jie, Hernández-Muñoz, José J, Glynn, Robert J, Desai, Rishi J, Weberpals, Janick
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Missing data in confounding variables present a frequent challenge in generating evidence using real-world data, including electronic health records (EHR). Our objective was to apply a recently published toolkit for characterizing missing data patterns and based on the toolkit results about likely missingness mechanisms, illustrate the decision-making process for analyses in an empirical case example. We utilized the Structural Missing Data Investigations (SMDI) toolkit to characterize missing data patterns in the context of a pharmacoepidemiology study comparing cardiovascular outcomes of initiating sodium-glucose-cotransporter-2 inhibitors (SGLT2i) and dipeptidyl peptidase-4 inhibitors (DPP-4i) among older adults. The study used a linked EHR-Medicare claims dataset from Duke Health patients (2015-2017), focusing on partially observed confounders from EHR data (HbA1c lab and body mass index [BMI] values). Our analysis incorporated SMDI's descriptive functions and diagnostic tests to explore missingness patterns and determine missingness mitigation approaches. We used findings from these investigations to inform estimation of adjusted hazard ratios comparing the two classes of medications. High levels of missingness were noted for important confounding variables including HbA1c (63.6%) and BMI (16.5%). Diagnostic tests resulted in output that described: 1) the distributions of patient characteristics, exposure, and outcome between patients with or without an observed value of the partially observed covariate, 2) the ability to predict missingness based on observed covariates, and 3) estimate if the missingness of a partially observed covariate is differential with respect to the outcome. There was evidence that missingness could be sufficiently described using observed data, which allowed multiple imputation by chained equations using random forests to address missing confounder data in estimating treatment effects. Multiple imputation resulted in improved alignment of effect estimates with previous studies. We were able to demonstrate the practical application of the SMDI toolkit in a real-world setting. Application of the SMDI toolkit and the resulting insights of potential missingness patterns can inform the choice of appropriate analytic methods and increase transparency of research methods in handling missing data. This type of approach can inform analytic decision making and may increase our ability to generate evidence from real-world data.
ISSN:1471-2288
1471-2288
DOI:10.1186/s12874-024-02330-2