Loading…
Challenges of Large-Scale Data Processing in the 1990s: The IPUMS Experience
When it was launched in 1991, the Integrated Public Use Microdata Series (IPUMS) project faced a challenging environment and limited resources. Few datasets were interoperable and much data collected at great public expense was inaccessible to most researchers. Documentation of datasets was nonstand...
Saved in:
Published in: | IEEE annals of the history of computing 2022-10, Vol.44 (4), p.71-83 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | When it was launched in 1991, the Integrated Public Use Microdata Series (IPUMS) project faced a challenging environment and limited resources. Few datasets were interoperable and much data collected at great public expense was inaccessible to most researchers. Documentation of datasets was nonstandardized, incomplete, and inadequate for automated processing. With insufficient attention to preservation, valuable scientific data were disappearing (see Bogue et al., 1976). IPUMS was established to address these critical issues. At the outset, IPUMS faced daunting barriers of inadequate data processing, storage, and network capacity. This anecdote describes the improvised computational infrastructure developed in the decade from 1989 to 1999 to process, manage, and disseminate the world's largest population datasets. We use a combination of archival sources, interviews, and our own memories to trace the development of the IPUMS computing environment during a period of explosive technical innovation. The development of IPUMS is part of a larger story of the development of social science infrastructure in the late 20th century and its contribution to democratizing data access. |
---|---|
ISSN: | 1058-6180 1934-1547 |
DOI: | 10.1109/MAHC.2022.3214736 |