Mitigating Bias in Radiology Machine Learning: 1. Data Handling

Minimizing bias is critical to adoption and implementation of machine learning (ML) in clinical practice. Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can b...

Bibliographic Details
Published in: Radiology. Artificial intelligence 2022-09, Vol.4 (5), p.e210290
Main Authors: Rouzrokh, Pouria, Khosravi, Bardia, Faghani, Shahriar, Moassefi, Mana, Vera Garcia, Diana V, Singh, Yashbir, Zhang, Kuan, Conte, Gian Marco, Erickson, Bradley J
Format: Article
Language: English
Subjects: Special Report
cited_by cdi_FETCH-LOGICAL-c381t-154a0883202ee17d55e0a03fe0ac0d14f4ce655812cd64a1edeead0adeb206043
cites cdi_FETCH-LOGICAL-c381t-154a0883202ee17d55e0a03fe0ac0d14f4ce655812cd64a1edeead0adeb206043
container_end_page
container_issue 5
container_start_page e210290
container_title Radiology. Artificial intelligence
container_volume 4
creator Rouzrokh, Pouria
Khosravi, Bardia
Faghani, Shahriar
Moassefi, Mana
Vera Garcia, Diana V
Singh, Yashbir
Zhang, Kuan
Conte, Gian Marco
Erickson, Bradley J
description Minimizing bias is critical to the adoption and implementation of machine learning (ML) in clinical practice. Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can be traced back to various phases of ML development: data handling, model development, and performance evaluation. This report presents 12 suboptimal practices during data handling of an ML study, explains how those practices can lead to biases, and describes what may be done to mitigate them. The authors employ an arbitrary and simplified framework that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Examples from the available research literature are provided. A Google Colaboratory Jupyter notebook includes code examples to demonstrate the suboptimal practices and steps to prevent them. Keywords: Data Handling, Bias, Machine Learning, Deep Learning, Convolutional Neural Network (CNN), Computer-aided Diagnosis (CAD). © RSNA, 2022.
doi_str_mv 10.1148/ryai.210290
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9533091</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2723156431</sourcerecordid><originalsourceid>FETCH-LOGICAL-c381t-154a0883202ee17d55e0a03fe0ac0d14f4ce655812cd64a1edeead0adeb206043</originalsourceid><addsrcrecordid>eNpVkE1Lw0AQhhdRbKk9eZccBUmd_UpTD4rWjwotguh5me5O2pU0qdlU6L83pbXUy8yw-_DO8DB2zqHHuUqvqzX6nuAgBnDE2iKRaZxwgOODucW6IXwBgOBKaQGnrCUTAUor1WZ3E1_7Gda-mEUPHkPki-gdnS_zcraOJmjnvqBoTFgVDXIT8V70iDVGIyxc3rycsZMM80DdXe-wz-enj-EoHr-9vA7vx7GVKa9jrhVCmkoBgoj3ndYECDJrqgXHVaYsJVqnXFiXKOTkiNABOpoKSEDJDrvd5i5X0wU5S0VdYW6WlV9gtTYlevP_p_BzMyt_zEBLCQPeBFzuAqrye0WhNgsfLOU5FlSughF9IblOlNygV1vUVmUIFWX7NRzMxrrZWDdb6w19cXjZnv1zLH8BXDR9Ag</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2723156431</pqid></control><display><type>article</type><title>Mitigating Bias in Radiology Machine Learning: 1. Data Handling</title><source>PubMed Central</source><creator>Rouzrokh, Pouria ; Khosravi, Bardia ; Faghani, Shahriar ; Moassefi, Mana ; Vera Garcia, Diana V ; Singh, Yashbir ; Zhang, Kuan ; Conte, Gian Marco ; Erickson, Bradley J</creator><creatorcontrib>Rouzrokh, Pouria ; Khosravi, Bardia ; Faghani, Shahriar ; Moassefi, Mana ; Vera Garcia, Diana V ; Singh, Yashbir ; Zhang, Kuan ; Conte, Gian Marco ; Erickson, Bradley J</creatorcontrib><description>Minimizing bias is critical to adoption and implementation of machine learning (ML) in clinical practice. Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can be traced back to various phases of ML development: data handling, model development, and performance evaluation. 
This report presents 12 suboptimal practices during data handling of an ML study, explains how those practices can lead to biases, and describes what may be done to mitigate them. Authors employ an arbitrary and simplified framework that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Examples from the available research literature are provided. A Google Colaboratory Jupyter notebook includes code examples to demonstrate the suboptimal practices and steps to prevent them. Data Handling, Bias, Machine Learning, Deep Learning, Convolutional Neural Network (CNN), Computer-aided Diagnosis (CAD) © RSNA, 2022.</description><identifier>ISSN: 2638-6100</identifier><identifier>EISSN: 2638-6100</identifier><identifier>DOI: 10.1148/ryai.210290</identifier><identifier>PMID: 36204544</identifier><language>eng</language><publisher>United States: Radiological Society of North America</publisher><subject>Special Report</subject><ispartof>Radiology. Artificial intelligence, 2022-09, Vol.4 (5), p.e210290</ispartof><rights>2022 by the Radiological Society of North America, Inc.</rights><rights>2022 by the Radiological Society of North America, Inc. 
2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c381t-154a0883202ee17d55e0a03fe0ac0d14f4ce655812cd64a1edeead0adeb206043</citedby><cites>FETCH-LOGICAL-c381t-154a0883202ee17d55e0a03fe0ac0d14f4ce655812cd64a1edeead0adeb206043</cites><orcidid>0000-0002-0520-3582 ; 0000-0003-4664-0751 ; 0000-0002-5848-7072 ; 0000-0001-5926-7517 ; 0000-0001-7926-6095 ; 0000-0002-8024-339X ; 0000-0003-1014-4975</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9533091/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9533091/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36204544$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Rouzrokh, Pouria</creatorcontrib><creatorcontrib>Khosravi, Bardia</creatorcontrib><creatorcontrib>Faghani, Shahriar</creatorcontrib><creatorcontrib>Moassefi, Mana</creatorcontrib><creatorcontrib>Vera Garcia, Diana V</creatorcontrib><creatorcontrib>Singh, Yashbir</creatorcontrib><creatorcontrib>Zhang, Kuan</creatorcontrib><creatorcontrib>Conte, Gian Marco</creatorcontrib><creatorcontrib>Erickson, Bradley J</creatorcontrib><title>Mitigating Bias in Radiology Machine Learning: 1. Data Handling</title><title>Radiology. Artificial intelligence</title><addtitle>Radiol Artif Intell</addtitle><description>Minimizing bias is critical to adoption and implementation of machine learning (ML) in clinical practice. 
Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can be traced back to various phases of ML development: data handling, model development, and performance evaluation. This report presents 12 suboptimal practices during data handling of an ML study, explains how those practices can lead to biases, and describes what may be done to mitigate them. Authors employ an arbitrary and simplified framework that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Examples from the available research literature are provided. A Google Colaboratory Jupyter notebook includes code examples to demonstrate the suboptimal practices and steps to prevent them. Data Handling, Bias, Machine Learning, Deep Learning, Convolutional Neural Network (CNN), Computer-aided Diagnosis (CAD) © RSNA, 2022.</description><subject>Special Report</subject><issn>2638-6100</issn><issn>2638-6100</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNpVkE1Lw0AQhhdRbKk9eZccBUmd_UpTD4rWjwotguh5me5O2pU0qdlU6L83pbXUy8yw-_DO8DB2zqHHuUqvqzX6nuAgBnDE2iKRaZxwgOODucW6IXwBgOBKaQGnrCUTAUor1WZ3E1_7Gda-mEUPHkPki-gdnS_zcraOJmjnvqBoTFgVDXIT8V70iDVGIyxc3rycsZMM80DdXe-wz-enj-EoHr-9vA7vx7GVKa9jrhVCmkoBgoj3ndYECDJrqgXHVaYsJVqnXFiXKOTkiNABOpoKSEDJDrvd5i5X0wU5S0VdYW6WlV9gtTYlevP_p_BzMyt_zEBLCQPeBFzuAqrye0WhNgsfLOU5FlSughF9IblOlNygV1vUVmUIFWX7NRzMxrrZWDdb6w19cXjZnv1zLH8BXDR9Ag</recordid><startdate>20220901</startdate><enddate>20220901</enddate><creator>Rouzrokh, Pouria</creator><creator>Khosravi, Bardia</creator><creator>Faghani, Shahriar</creator><creator>Moassefi, Mana</creator><creator>Vera Garcia, Diana V</creator><creator>Singh, Yashbir</creator><creator>Zhang, Kuan</creator><creator>Conte, Gian Marco</creator><creator>Erickson, Bradley 
J</creator><general>Radiological Society of North America</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-0520-3582</orcidid><orcidid>https://orcid.org/0000-0003-4664-0751</orcidid><orcidid>https://orcid.org/0000-0002-5848-7072</orcidid><orcidid>https://orcid.org/0000-0001-5926-7517</orcidid><orcidid>https://orcid.org/0000-0001-7926-6095</orcidid><orcidid>https://orcid.org/0000-0002-8024-339X</orcidid><orcidid>https://orcid.org/0000-0003-1014-4975</orcidid></search><sort><creationdate>20220901</creationdate><title>Mitigating Bias in Radiology Machine Learning: 1. Data Handling</title><author>Rouzrokh, Pouria ; Khosravi, Bardia ; Faghani, Shahriar ; Moassefi, Mana ; Vera Garcia, Diana V ; Singh, Yashbir ; Zhang, Kuan ; Conte, Gian Marco ; Erickson, Bradley J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c381t-154a0883202ee17d55e0a03fe0ac0d14f4ce655812cd64a1edeead0adeb206043</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Special Report</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rouzrokh, Pouria</creatorcontrib><creatorcontrib>Khosravi, Bardia</creatorcontrib><creatorcontrib>Faghani, Shahriar</creatorcontrib><creatorcontrib>Moassefi, Mana</creatorcontrib><creatorcontrib>Vera Garcia, Diana V</creatorcontrib><creatorcontrib>Singh, Yashbir</creatorcontrib><creatorcontrib>Zhang, Kuan</creatorcontrib><creatorcontrib>Conte, Gian Marco</creatorcontrib><creatorcontrib>Erickson, Bradley J</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Radiology. 
Artificial intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rouzrokh, Pouria</au><au>Khosravi, Bardia</au><au>Faghani, Shahriar</au><au>Moassefi, Mana</au><au>Vera Garcia, Diana V</au><au>Singh, Yashbir</au><au>Zhang, Kuan</au><au>Conte, Gian Marco</au><au>Erickson, Bradley J</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mitigating Bias in Radiology Machine Learning: 1. Data Handling</atitle><jtitle>Radiology. Artificial intelligence</jtitle><addtitle>Radiol Artif Intell</addtitle><date>2022-09-01</date><risdate>2022</risdate><volume>4</volume><issue>5</issue><spage>e210290</spage><pages>e210290-</pages><issn>2638-6100</issn><eissn>2638-6100</eissn><abstract>Minimizing bias is critical to adoption and implementation of machine learning (ML) in clinical practice. Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can be traced back to various phases of ML development: data handling, model development, and performance evaluation. This report presents 12 suboptimal practices during data handling of an ML study, explains how those practices can lead to biases, and describes what may be done to mitigate them. Authors employ an arbitrary and simplified framework that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Examples from the available research literature are provided. A Google Colaboratory Jupyter notebook includes code examples to demonstrate the suboptimal practices and steps to prevent them. 
Data Handling, Bias, Machine Learning, Deep Learning, Convolutional Neural Network (CNN), Computer-aided Diagnosis (CAD) © RSNA, 2022.</abstract><cop>United States</cop><pub>Radiological Society of North America</pub><pmid>36204544</pmid><doi>10.1148/ryai.210290</doi><orcidid>https://orcid.org/0000-0002-0520-3582</orcidid><orcidid>https://orcid.org/0000-0003-4664-0751</orcidid><orcidid>https://orcid.org/0000-0002-5848-7072</orcidid><orcidid>https://orcid.org/0000-0001-5926-7517</orcidid><orcidid>https://orcid.org/0000-0001-7926-6095</orcidid><orcidid>https://orcid.org/0000-0002-8024-339X</orcidid><orcidid>https://orcid.org/0000-0003-1014-4975</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2638-6100
ispartof Radiology. Artificial intelligence, 2022-09, Vol.4 (5), p.e210290
issn 2638-6100
2638-6100
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9533091
source PubMed Central
subjects Special Report
title Mitigating Bias in Radiology Machine Learning: 1. Data Handling
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T14%3A16%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mitigating%20Bias%20in%20Radiology%20Machine%20Learning:%201.%20Data%20Handling&rft.jtitle=Radiology.%20Artificial%20intelligence&rft.au=Rouzrokh,%20Pouria&rft.date=2022-09-01&rft.volume=4&rft.issue=5&rft.spage=e210290&rft.pages=e210290-&rft.issn=2638-6100&rft.eissn=2638-6100&rft_id=info:doi/10.1148/ryai.210290&rft_dat=%3Cproquest_pubme%3E2723156431%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c381t-154a0883202ee17d55e0a03fe0ac0d14f4ce655812cd64a1edeead0adeb206043%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2723156431&rft_id=info:pmid/36204544&rfr_iscdi=true
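
The description above names data splitting as one of the four data handling steps. As an illustration only, the sketch below shows one widely discussed splitting pitfall, placing images from the same patient in both the training and test sets, and a simple way to avoid it. It is not taken from the article or its notebook, and this record does not say whether this pitfall is among the 12 practices the report covers. The function name `split_by_patient` and the `(patient_id, image_id)` record layout are hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_id) pairs so no patient spans both sets.

    Hypothetical helper for illustration; the record layout is assumed.
    """
    # Group every image under the patient it belongs to.
    by_patient = defaultdict(list)
    for patient_id, image_id in records:
        by_patient[patient_id].append(image_id)

    # Shuffle patients (not images) with a fixed seed for reproducibility.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Reserve at least one patient for the test set.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_patients = set(patient_ids[:n_test])

    train = [(p, i) for p in patient_ids if p not in test_patients
             for i in by_patient[p]]
    test = [(p, i) for p in patient_ids if p in test_patients
            for i in by_patient[p]]
    return train, test


# Toy data: five patients, some with more than one image.
records = [
    ("p1", "img1"), ("p1", "img2"), ("p2", "img3"), ("p3", "img4"),
    ("p3", "img5"), ("p4", "img6"), ("p5", "img7"),
]
train, test = split_by_patient(records)
train_patients = {p for p, _ in train}
test_patients = {p for p, _ in test}
```

Splitting at the patient level rather than the image level means the two sets share no patients, so test performance is less likely to be inflated by near-duplicate images. Libraries such as scikit-learn offer group-aware splitters for the same purpose; the standard-library version here is only meant to make the idea concrete.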