
Simultaneous imputation and disease classification in incomplete medical datasets using Multigraph Geometric Matrix Completion (MGMC)

Large-scale population-based studies in medicine are a key resource towards better diagnosis, monitoring, and treatment of diseases. They also serve as enablers of clinical decision support systems, in particular Computer Aided Diagnosis (CADx) using machine learning (ML). Numerous ML approaches for...

Bibliographic Details
Published in: arXiv.org 2020-05
Main Authors: Vivar, Gerome; Kazi, Anees; Burwinkel, Hendrik; Zwergal, Andreas; Navab, Nassir; Seyed-Ahmad, Ahmadi
Format: Article
Language: English
Subjects:
cited_by
cites
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Vivar, Gerome
Kazi, Anees
Burwinkel, Hendrik
Zwergal, Andreas
Navab, Nassir
Seyed-Ahmad, Ahmadi
description Large-scale population-based studies in medicine are a key resource towards better diagnosis, monitoring, and treatment of diseases. They also serve as enablers of clinical decision support systems, in particular Computer Aided Diagnosis (CADx) using machine learning (ML). Numerous ML approaches for CADx have been proposed in the literature. However, these approaches assume full data availability, which is not always feasible in clinical data. To account for missing data, incomplete data samples are either removed or imputed, which could lead to data bias and may negatively affect classification performance. As a solution, we propose end-to-end learning of imputation and disease prediction of incomplete medical datasets via Multigraph Geometric Matrix Completion (MGMC). MGMC uses multiple recurrent graph convolutional networks, where each graph represents an independent population model based on a key clinical meta-feature like age, sex, or cognitive function. Graph signal aggregation from local patient neighborhoods, combined with multigraph signal fusion via self-attention, has a regularizing effect on both matrix reconstruction and classification performance. Our proposed approach is able to impute class-relevant features as well as perform accurate classification on two publicly available medical datasets. We empirically show the superiority of our proposed approach in terms of classification and imputation performance when compared with state-of-the-art approaches. MGMC enables disease prediction in multimodal and incomplete medical datasets. These findings could serve as a baseline for future CADx approaches which utilize incomplete datasets.
doi_str_mv 10.48550/arxiv.2005.06935
format article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2403203399</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2403203399</sourcerecordid><originalsourceid>FETCH-LOGICAL-a529-aca7c0d0885d6ad8a22d55513358288f8b223173033e14d10a13b38b444eb3373</originalsourceid><addsrcrecordid>eNotjU9LAzEUxIMgWGo_gLeAFz1sTd5LutmjLFqFLh7svbzdpDVl_7nJSr-A39stFQbmMDO_YexOiqUyWosnGk7-ZwlC6KVYZaiv2AwQZWIUwA1bhHAUQsAqBa1xxn4_fTPWkVrXjYH7ph8jRd-1nFrLrQ-OguNVTSH4va8ukT-r6pq-dtHxxtkpqLmlOHVj4GPw7YEXE9UfBuq_-Np1jYuDr3hBk514ftmeWQ_Fusgfb9n1nurgFv8-Z9vXl23-lmw-1u_58yYhDVlCFaWVsMIYbVdkDQFYrbVE1AaM2ZsSAGWKAtFJZaUgiSWaUinlSsQU5-z-gu2H7nt0Ie6O3Ti00-MOlECYhlmGf083Y9M</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2403203399</pqid></control><display><type>article</type><title>Simultaneous imputation and disease classification in incomplete medical datasets using Multigraph Geometric Matrix Completion (MGMC)</title><source>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</source><creator>Vivar, Gerome ; Kazi, Anees ; Burwinkel, Hendrik ; Zwergal, Andreas ; Navab, Nassir ; Seyed-Ahmad, Ahmadi</creator><creatorcontrib>Vivar, Gerome ; Kazi, Anees ; Burwinkel, Hendrik ; Zwergal, Andreas ; Navab, Nassir ; Seyed-Ahmad, Ahmadi</creatorcontrib><description>Large-scale population-based studies in medicine are a key resource towards better diagnosis, monitoring, and treatment of diseases. They also serve as enablers of clinical decision support systems, in particular Computer Aided Diagnosis (CADx) using machine learning (ML). Numerous ML approaches for CADx have been proposed in literature. However, these approaches assume full data availability, which is not always feasible in clinical data. 
To account for missing data, incomplete data samples are either removed or imputed, which could lead to data bias and may negatively affect classification performance. As a solution, we propose an end-to-end learning of imputation and disease prediction of incomplete medical datasets via Multigraph Geometric Matrix Completion (MGMC). MGMC uses multiple recurrent graph convolutional networks, where each graph represents an independent population model based on a key clinical meta-feature like age, sex, or cognitive function. Graph signal aggregation from local patient neighborhoods, combined with multigraph signal fusion via self-attention, has a regularizing effect on both matrix reconstruction and classification performance. Our proposed approach is able to impute class relevant features as well as perform accurate classification on two publicly available medical datasets. We empirically show the superiority of our proposed approach in terms of classification and imputation performance when compared with state-of-the-art approaches. MGMC enables disease prediction in multimodal and incomplete medical datasets. These findings could serve as baseline for future CADx approaches which utilize incomplete datasets.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2005.06935</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Classification ; Computer aided decision processes ; Datasets ; Decision support systems ; Diagnosis ; Disease ; Machine learning ; Missing data</subject><ispartof>arXiv.org, 2020-05</ispartof><rights>2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2403203399?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>776,780,25731,27902,36989,44566</link.rule.ids></links><search><creatorcontrib>Vivar, Gerome</creatorcontrib><creatorcontrib>Kazi, Anees</creatorcontrib><creatorcontrib>Burwinkel, Hendrik</creatorcontrib><creatorcontrib>Zwergal, Andreas</creatorcontrib><creatorcontrib>Navab, Nassir</creatorcontrib><creatorcontrib>Seyed-Ahmad, Ahmadi</creatorcontrib><title>Simultaneous imputation and disease classification in incomplete medical datasets using Multigraph Geometric Matrix Completion (MGMC)</title><title>arXiv.org</title><description>Large-scale population-based studies in medicine are a key resource towards better diagnosis, monitoring, and treatment of diseases. They also serve as enablers of clinical decision support systems, in particular Computer Aided Diagnosis (CADx) using machine learning (ML). Numerous ML approaches for CADx have been proposed in literature. However, these approaches assume full data availability, which is not always feasible in clinical data. To account for missing data, incomplete data samples are either removed or imputed, which could lead to data bias and may negatively affect classification performance. As a solution, we propose an end-to-end learning of imputation and disease prediction of incomplete medical datasets via Multigraph Geometric Matrix Completion (MGMC). 
MGMC uses multiple recurrent graph convolutional networks, where each graph represents an independent population model based on a key clinical meta-feature like age, sex, or cognitive function. Graph signal aggregation from local patient neighborhoods, combined with multigraph signal fusion via self-attention, has a regularizing effect on both matrix reconstruction and classification performance. Our proposed approach is able to impute class relevant features as well as perform accurate classification on two publicly available medical datasets. We empirically show the superiority of our proposed approach in terms of classification and imputation performance when compared with state-of-the-art approaches. MGMC enables disease prediction in multimodal and incomplete medical datasets. These findings could serve as baseline for future CADx approaches which utilize incomplete datasets.</description><subject>Classification</subject><subject>Computer aided decision processes</subject><subject>Datasets</subject><subject>Decision support systems</subject><subject>Diagnosis</subject><subject>Disease</subject><subject>Machine learning</subject><subject>Missing data</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNotjU9LAzEUxIMgWGo_gLeAFz1sTd5LutmjLFqFLh7svbzdpDVl_7nJSr-A39stFQbmMDO_YexOiqUyWosnGk7-ZwlC6KVYZaiv2AwQZWIUwA1bhHAUQsAqBa1xxn4_fTPWkVrXjYH7ph8jRd-1nFrLrQ-OguNVTSH4va8ukT-r6pq-dtHxxtkpqLmlOHVj4GPw7YEXE9UfBuq_-Np1jYuDr3hBk514ftmeWQ_Fusgfb9n1nurgFv8-Z9vXl23-lmw-1u_58yYhDVlCFaWVsMIYbVdkDQFYrbVE1AaM2ZsSAGWKAtFJZaUgiSWaUinlSsQU5-z-gu2H7nt0Ie6O3Ti00-MOlECYhlmGf083Y9M</recordid><startdate>20200514</startdate><enddate>20200514</enddate><creator>Vivar, Gerome</creator><creator>Kazi, Anees</creator><creator>Burwinkel, Hendrik</creator><creator>Zwergal, Andreas</creator><creator>Navab, Nassir</creator><creator>Seyed-Ahmad, 
Ahmadi</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20200514</creationdate><title>Simultaneous imputation and disease classification in incomplete medical datasets using Multigraph Geometric Matrix Completion (MGMC)</title><author>Vivar, Gerome ; Kazi, Anees ; Burwinkel, Hendrik ; Zwergal, Andreas ; Navab, Nassir ; Seyed-Ahmad, Ahmadi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a529-aca7c0d0885d6ad8a22d55513358288f8b223173033e14d10a13b38b444eb3373</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Classification</topic><topic>Computer aided decision processes</topic><topic>Datasets</topic><topic>Decision support systems</topic><topic>Diagnosis</topic><topic>Disease</topic><topic>Machine learning</topic><topic>Missing data</topic><toplevel>online_resources</toplevel><creatorcontrib>Vivar, Gerome</creatorcontrib><creatorcontrib>Kazi, Anees</creatorcontrib><creatorcontrib>Burwinkel, Hendrik</creatorcontrib><creatorcontrib>Zwergal, Andreas</creatorcontrib><creatorcontrib>Navab, Nassir</creatorcontrib><creatorcontrib>Seyed-Ahmad, Ahmadi</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology 
Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering collection</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vivar, Gerome</au><au>Kazi, Anees</au><au>Burwinkel, Hendrik</au><au>Zwergal, Andreas</au><au>Navab, Nassir</au><au>Seyed-Ahmad, Ahmadi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Simultaneous imputation and disease classification in incomplete medical datasets using Multigraph Geometric Matrix Completion (MGMC)</atitle><jtitle>arXiv.org</jtitle><date>2020-05-14</date><risdate>2020</risdate><eissn>2331-8422</eissn><abstract>Large-scale population-based studies in medicine are a key resource towards better diagnosis, monitoring, and treatment of diseases. They also serve as enablers of clinical decision support systems, in particular Computer Aided Diagnosis (CADx) using machine learning (ML). Numerous ML approaches for CADx have been proposed in literature. However, these approaches assume full data availability, which is not always feasible in clinical data. To account for missing data, incomplete data samples are either removed or imputed, which could lead to data bias and may negatively affect classification performance. 
As a solution, we propose an end-to-end learning of imputation and disease prediction of incomplete medical datasets via Multigraph Geometric Matrix Completion (MGMC). MGMC uses multiple recurrent graph convolutional networks, where each graph represents an independent population model based on a key clinical meta-feature like age, sex, or cognitive function. Graph signal aggregation from local patient neighborhoods, combined with multigraph signal fusion via self-attention, has a regularizing effect on both matrix reconstruction and classification performance. Our proposed approach is able to impute class relevant features as well as perform accurate classification on two publicly available medical datasets. We empirically show the superiority of our proposed approach in terms of classification and imputation performance when compared with state-of-the-art approaches. MGMC enables disease prediction in multimodal and incomplete medical datasets. These findings could serve as baseline for future CADx approaches which utilize incomplete datasets.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2005.06935</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2020-05
issn 2331-8422
language eng
recordid cdi_proquest_journals_2403203399
source Publicly Available Content Database (Proquest) (PQ_SDU_P3)
subjects Classification
Computer aided decision processes
Datasets
Decision support systems
Diagnosis
Disease
Machine learning
Missing data
title Simultaneous imputation and disease classification in incomplete medical datasets using Multigraph Geometric Matrix Completion (MGMC)
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T09%3A32%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Simultaneous%20imputation%20and%20disease%20classification%20in%20incomplete%20medical%20datasets%20using%20Multigraph%20Geometric%20Matrix%20Completion%20(MGMC)&rft.jtitle=arXiv.org&rft.au=Vivar,%20Gerome&rft.date=2020-05-14&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2005.06935&rft_dat=%3Cproquest%3E2403203399%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a529-aca7c0d0885d6ad8a22d55513358288f8b223173033e14d10a13b38b444eb3373%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2403203399&rft_id=info:pmid/&rfr_iscdi=true