Loading…

Minmaxing of Bayesian Improved Surname Geocoding and Geography Level Ups in Predicting Race

Racial identification is a critical factor in understanding a multitude of important outcomes in many fields. However, inferring an individual’s race from ecological data is prone to bias and error. This process was only recently improved via Bayesian improved surname geocoding (BISG). With surname...

Full description

Saved in:
Bibliographic Details
Published in:Political analysis 2022-07, Vol.30 (3), p.456-462
Main Authors: Clark, Jesse T., Curiel, John A., Steelman, Tyler S.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Racial identification is a critical factor in understanding a multitude of important outcomes in many fields. However, inferring an individual’s race from ecological data is prone to bias and error. This process was only recently improved via Bayesian improved surname geocoding (BISG). With surname and geographic-based demographic data, it is possible to more accurately estimate individual racial identification than ever before. However, the level of geography used in this process varies widely. Whereas some existing work makes use of geocoding to place individuals in precise census blocks, a substantial portion either skips geocoding altogether or relies on estimation using surname or county-level analyses. Presently, the trade-offs of such variation are unknown. In this letter, we quantify those trade-offs through a validation of BISG on Georgia’s voter file using both geocoded and nongeocoded processes and introduce a new level of geography—ZIP codes—to this method. We find that when estimating the racial identification of White and Black voters, nongeocoded ZIP code-based estimates are acceptable alternatives. However, census blocks provide the most accurate estimations when imputing racial identification for Asian and Hispanic voters. Our results document the most efficient means to sequentially conduct BISG analysis to maximize racial identification estimation while simultaneously minimizing data missingness and bias.
ISSN:1047-1987
1476-4989
DOI:10.1017/pan.2021.31