Random forest classification of etiologies for an orphan disease

Classification of objects into pre‐defined groups based on known information is a fundamental problem in the field of statistics. Although approaches for solving this problem exist, finding an accurate classification method can be challenging in an orphan disease setting, where data are minimal and...

Bibliographic Details
Published in: Statistics in medicine 2015-02, Vol.34 (5), p.887-899
Main Authors: Speiser, Jaime Lynn; Durkalski, Valerie L.; Lee, William M.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c5421-9681a73a51fad56ce45a7ad5d9fa51ca05235b5ae41365887e31e43c232806c3
cites cdi_FETCH-LOGICAL-c5421-9681a73a51fad56ce45a7ad5d9fa51ca05235b5ae41365887e31e43c232806c3
container_end_page 899
container_issue 5
container_start_page 887
container_title Statistics in medicine
container_volume 34
creator Speiser, Jaime Lynn
Durkalski, Valerie L.
Lee, William M.
description Classification of objects into pre‐defined groups based on known information is a fundamental problem in the field of statistics. Although approaches for solving this problem exist, finding an accurate classification method can be challenging in an orphan disease setting, where data are minimal and often not normally distributed. The purpose of this paper is to illustrate the application of the random forest (RF) classification procedure in a real clinical setting and discuss typical questions that arise in the general classification framework as well as offer interpretations of RF results. This paper includes methods for assessing predictive performance, importance of predictor variables, and observation‐specific information. Copyright © 2014 John Wiley & Sons, Ltd.
doi_str_mv 10.1002/sim.6351
format article
identifier PMID: 25366667
identifier CODEN: SMEDDA
publisher England: Blackwell Publishing Ltd
rights Copyright © 2014 John Wiley & Sons, Ltd.
fulltext fulltext
identifier ISSN: 0277-6715
ispartof Statistics in medicine, 2015-02, Vol.34 (5), p.887-899
issn 0277-6715
1097-0258
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4310784
source Wiley-Blackwell Read & Publish Collection
subjects acute liver failure
Algorithms
Biostatistics
Classification
Decision Trees
etiology
Humans
Liver Failure, Acute - classification
Liver Failure, Acute - etiology
Machine Learning
Medical statistics
Models, Statistical
Normal distribution
random forest
Rare Diseases - classification
Rare Diseases - etiology
Registries - statistics & numerical data
statistical classification
title Random forest classification of etiologies for an orphan disease
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T15%3A38%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Random%20forest%20classification%20of%20etiologies%20for%20an%20orphan%20disease&rft.jtitle=Statistics%20in%20medicine&rft.au=Speiser,%20Jaime%20Lynn&rft.date=2015-02-28&rft.volume=34&rft.issue=5&rft.spage=887&rft.epage=899&rft.pages=887-899&rft.issn=0277-6715&rft.eissn=1097-0258&rft.coden=SMEDDA&rft_id=info:doi/10.1002/sim.6351&rft_dat=%3Cproquest_pubme%3E3615255631%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c5421-9681a73a51fad56ce45a7ad5d9fa51ca05235b5ae41365887e31e43c232806c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1660768366&rft_id=info:pmid/25366667&rfr_iscdi=true