
Sparse linear discriminant analysis for multiview structured data

Classification methods that leverage the strengths of data from multiple sources (multiview data) simultaneously have enormous potential to yield more powerful findings than two‐step methods: association followed by classification. We propose two methods, sparse integrative discriminant analysis (SI...

Bibliographic Details
Published in: Biometrics 2022-06, Vol.78 (2), p.612-623
Main Authors: Safo, Sandra E.; Min, Eun Jeong; Haine, Lillian
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c4488-9453c7392bc2f61a5483f1564758defd42762ec391069032b29fbbab5f3e92653
cites cdi_FETCH-LOGICAL-c4488-9453c7392bc2f61a5483f1564758defd42762ec391069032b29fbbab5f3e92653
container_end_page 623
container_issue 2
container_start_page 612
container_title Biometrics
container_volume 78
creator Safo, Sandra E.
Min, Eun Jeong
Haine, Lillian
description Classification methods that leverage the strengths of data from multiple sources (multiview data) simultaneously have enormous potential to yield more powerful findings than two‐step methods: association followed by classification. We propose two methods, sparse integrative discriminant analysis (SIDA), and SIDA with incorporation of network information (SIDANet), for joint association and classification studies. The methods consider the overall association between multiview data, and the separation within each view in choosing discriminant vectors that are associated and optimally separate subjects into different classes. SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth coefficients of predictor variables, thus encouraging selection of predictors that are connected. We demonstrate the effectiveness of our methods on a set of synthetic datasets and explore their use in identifying potential nontraditional risk factors that discriminate healthy patients at low versus high risk for developing atherosclerosis cardiovascular disease in 10 years. Our findings underscore the benefit of joint association and classification methods if the goal is to correlate multiview data and to perform classification.
doi_str_mv 10.1111/biom.13458
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8906173</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2503434187</sourcerecordid><originalsourceid>FETCH-LOGICAL-c4488-9453c7392bc2f61a5483f1564758defd42762ec391069032b29fbbab5f3e92653</originalsourceid><addsrcrecordid>eNp9kUtLw0AUhQdRbK1u_AEScCNC6ryTbIRafBQqXajgbpgkE52SZOpM0tJ_79TUoi68m-EyH-eeew8ApwgOka-rVJtqiAhl8R7oI0ZRCCmG-6APIeQhoei1B46cm_s2YRAfgh4hEUkojftg9LSQ1qmg1LWSNsi1y6yudC3rJpC1LNdOu6AwNqjastFLrVaBa2ybNa1VeZDLRh6Dg0KWTp1s3wF4ubt9Hj-E09n9ZDyahpkfFIcJZSTzU3Ga4YIjyWhMCsQ4jVicqyKnOOJYZSRBkCeQ4BQnRZrKlBVEJZgzMgDXne6iTSuVZ6purCzFwtuVdi2M1OL3T63fxZtZijiBHEXEC1xsBaz5aJVrROW3VWUpa2VaJzCDhPprxZFHz_-gc9Nafw5P8ZhyxCNKPXXZUZk1zllV7MwgKDbJiE0y4isZD5_9tL9Dv6PwAOqAlS7V-h8pcTOZPXain-mgmR4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2684616744</pqid></control><display><type>article</type><title>Sparse linear discriminant analysis for multiview structured data</title><source>Oxford Journals Online</source><source>SPORTDiscus with Full Text</source><creator>Safo, Sandra E. ; Min, Eun Jeong ; Haine, Lillian</creator><creatorcontrib>Safo, Sandra E. ; Min, Eun Jeong ; Haine, Lillian</creatorcontrib><description>Classification methods that leverage the strengths of data from multiple sources (multiview data) simultaneously have enormous potential to yield more powerful findings than two‐step methods: association followed by classification. We propose two methods, sparse integrative discriminant analysis (SIDA), and SIDA with incorporation of network information (SIDANet), for joint association and classification studies. The methods consider the overall association between multiview data, and the separation within each view in choosing discriminant vectors that are associated and optimally separate subjects into different classes. 
SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth coefficients of predictor variables, thus encouraging selection of predictors that are connected. We demonstrate the effectiveness of our methods on a set of synthetic datasets and explore their use in identifying potential nontraditional risk factors that discriminate healthy patients at low versus high risk for developing atherosclerosis cardiovascular disease in 10 years. Our findings underscore the benefit of joint association and classification methods if the goal is to correlate multiview data and to perform classification.</description><identifier>ISSN: 0006-341X</identifier><identifier>EISSN: 1541-0420</identifier><identifier>DOI: 10.1111/biom.13458</identifier><identifier>PMID: 33739448</identifier><language>eng</language><publisher>England: Blackwell Publishing Ltd</publisher><subject>Arteriosclerosis ; Atherosclerosis ; canonical correlation analysis ; Cardiovascular diseases ; Classification ; Correlation analysis ; Discriminant Analysis ; Humans ; integrative analysis ; joint association and classification ; Laplacian ; multiple sources of data ; pathway analysis ; Risk analysis ; Risk factors ; sparsity ; Structured data</subject><ispartof>Biometrics, 2022-06, Vol.78 (2), p.612-623</ispartof><rights>2021 The International Biometric Society.</rights><rights>2022 The International Biometric 
Society.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c4488-9453c7392bc2f61a5483f1564758defd42762ec391069032b29fbbab5f3e92653</citedby><cites>FETCH-LOGICAL-c4488-9453c7392bc2f61a5483f1564758defd42762ec391069032b29fbbab5f3e92653</cites><orcidid>0000-0001-9593-4778</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33739448$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Safo, Sandra E.</creatorcontrib><creatorcontrib>Min, Eun Jeong</creatorcontrib><creatorcontrib>Haine, Lillian</creatorcontrib><title>Sparse linear discriminant analysis for multiview structured data</title><title>Biometrics</title><addtitle>Biometrics</addtitle><description>Classification methods that leverage the strengths of data from multiple sources (multiview data) simultaneously have enormous potential to yield more powerful findings than two‐step methods: association followed by classification. We propose two methods, sparse integrative discriminant analysis (SIDA), and SIDA with incorporation of network information (SIDANet), for joint association and classification studies. The methods consider the overall association between multiview data, and the separation within each view in choosing discriminant vectors that are associated and optimally separate subjects into different classes. SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth coefficients of predictor variables, thus encouraging selection of predictors that are connected. 
We demonstrate the effectiveness of our methods on a set of synthetic datasets and explore their use in identifying potential nontraditional risk factors that discriminate healthy patients at low versus high risk for developing atherosclerosis cardiovascular disease in 10 years. Our findings underscore the benefit of joint association and classification methods if the goal is to correlate multiview data and to perform classification.</description><subject>Arteriosclerosis</subject><subject>Atherosclerosis</subject><subject>canonical correlation analysis</subject><subject>Cardiovascular diseases</subject><subject>Classification</subject><subject>Correlation analysis</subject><subject>Discriminant Analysis</subject><subject>Humans</subject><subject>integrative analysis</subject><subject>joint association and classification</subject><subject>Laplacian</subject><subject>multiple sources of data</subject><subject>pathway analysis</subject><subject>Risk analysis</subject><subject>Risk factors</subject><subject>sparsity</subject><subject>Structured data</subject><issn>0006-341X</issn><issn>1541-0420</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kUtLw0AUhQdRbK1u_AEScCNC6ryTbIRafBQqXajgbpgkE52SZOpM0tJ_79TUoi68m-EyH-eeew8ApwgOka-rVJtqiAhl8R7oI0ZRCCmG-6APIeQhoei1B46cm_s2YRAfgh4hEUkojftg9LSQ1qmg1LWSNsi1y6yudC3rJpC1LNdOu6AwNqjastFLrVaBa2ybNa1VeZDLRh6Dg0KWTp1s3wF4ubt9Hj-E09n9ZDyahpkfFIcJZSTzU3Ga4YIjyWhMCsQ4jVicqyKnOOJYZSRBkCeQ4BQnRZrKlBVEJZgzMgDXne6iTSuVZ6purCzFwtuVdi2M1OL3T63fxZtZijiBHEXEC1xsBaz5aJVrROW3VWUpa2VaJzCDhPprxZFHz_-gc9Nafw5P8ZhyxCNKPXXZUZk1zllV7MwgKDbJiE0y4isZD5_9tL9Dv6PwAOqAlS7V-h8pcTOZPXain-mgmR4</recordid><startdate>202206</startdate><enddate>202206</enddate><creator>Safo, Sandra E.</creator><creator>Min, Eun Jeong</creator><creator>Haine, Lillian</creator><general>Blackwell Publishing 
Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0001-9593-4778</orcidid></search><sort><creationdate>202206</creationdate><title>Sparse linear discriminant analysis for multiview structured data</title><author>Safo, Sandra E. ; Min, Eun Jeong ; Haine, Lillian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c4488-9453c7392bc2f61a5483f1564758defd42762ec391069032b29fbbab5f3e92653</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Arteriosclerosis</topic><topic>Atherosclerosis</topic><topic>canonical correlation analysis</topic><topic>Cardiovascular diseases</topic><topic>Classification</topic><topic>Correlation analysis</topic><topic>Discriminant Analysis</topic><topic>Humans</topic><topic>integrative analysis</topic><topic>joint association and classification</topic><topic>Laplacian</topic><topic>multiple sources of data</topic><topic>pathway analysis</topic><topic>Risk analysis</topic><topic>Risk factors</topic><topic>sparsity</topic><topic>Structured data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Safo, Sandra E.</creatorcontrib><creatorcontrib>Min, Eun Jeong</creatorcontrib><creatorcontrib>Haine, Lillian</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Biometrics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Safo, Sandra E.</au><au>Min, Eun Jeong</au><au>Haine, Lillian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Sparse linear discriminant analysis for multiview structured data</atitle><jtitle>Biometrics</jtitle><addtitle>Biometrics</addtitle><date>2022-06</date><risdate>2022</risdate><volume>78</volume><issue>2</issue><spage>612</spage><epage>623</epage><pages>612-623</pages><issn>0006-341X</issn><eissn>1541-0420</eissn><abstract>Classification methods that leverage the strengths of data from multiple sources (multiview data) simultaneously have enormous potential to yield more powerful findings than two‐step methods: association followed by classification. We propose two methods, sparse integrative discriminant analysis (SIDA), and SIDA with incorporation of network information (SIDANet), for joint association and classification studies. The methods consider the overall association between multiview data, and the separation within each view in choosing discriminant vectors that are associated and optimally separate subjects into different classes. SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth coefficients of predictor variables, thus encouraging selection of predictors that are connected. We demonstrate the effectiveness of our methods on a set of synthetic datasets and explore their use in identifying potential nontraditional risk factors that discriminate healthy patients at low versus high risk for developing atherosclerosis cardiovascular disease in 10 years. 
Our findings underscore the benefit of joint association and classification methods if the goal is to correlate multiview data and to perform classification.</abstract><cop>England</cop><pub>Blackwell Publishing Ltd</pub><pmid>33739448</pmid><doi>10.1111/biom.13458</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0001-9593-4778</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0006-341X
ispartof Biometrics, 2022-06, Vol.78 (2), p.612-623
issn 0006-341X
1541-0420
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8906173
source Oxford Journals Online; SPORTDiscus with Full Text
subjects Arteriosclerosis
Atherosclerosis
canonical correlation analysis
Cardiovascular diseases
Classification
Correlation analysis
Discriminant Analysis
Humans
integrative analysis
joint association and classification
Laplacian
multiple sources of data
pathway analysis
Risk analysis
Risk factors
sparsity
Structured data
title Sparse linear discriminant analysis for multiview structured data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T00%3A42%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sparse%20linear%20discriminant%20analysis%20for%20multiview%20structured%20data&rft.jtitle=Biometrics&rft.au=Safo,%20Sandra%20E.&rft.date=2022-06&rft.volume=78&rft.issue=2&rft.spage=612&rft.epage=623&rft.pages=612-623&rft.issn=0006-341X&rft.eissn=1541-0420&rft_id=info:doi/10.1111/biom.13458&rft_dat=%3Cproquest_pubme%3E2503434187%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c4488-9453c7392bc2f61a5483f1564758defd42762ec391069032b29fbbab5f3e92653%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2684616744&rft_id=info:pmid/33739448&rfr_iscdi=true
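
The description above says SIDANet uses the normalized Laplacian of a graph to smooth the coefficients of predictor variables. As a rough illustration of that idea only, the sketch below builds a normalized Laplacian from an adjacency matrix and evaluates a generic quadratic smoothness penalty. The function names, the toy graph, and the quadratic form itself are assumptions made for illustration; this is not the authors' SIDANet objective or code, which this record does not specify.

```python
import numpy as np


def normalized_laplacian(adj):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix.

    Isolated nodes (degree 0) get an all-zero row and column.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    scaled = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.diag(connected.astype(float)) - scaled


def smoothness_penalty(beta, lap):
    """Generic quadratic penalty beta^T L beta (hypothetical helper).

    It equals the sum over edges of
    w_ij * (beta_i / sqrt(d_i) - beta_j / sqrt(d_j))^2,
    so it is small when connected predictors have similar scaled coefficients.
    """
    beta = np.asarray(beta, dtype=float)
    return float(beta @ lap @ beta)


# Toy path graph on three predictors: 0 - 1 - 2 (illustrative data only).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
lap = normalized_laplacian(adj)

# Coefficients proportional to sqrt(degree) are perfectly smooth on the graph.
smooth = smoothness_penalty([1.0, np.sqrt(2.0), 1.0], lap)
# Flipping the sign of the middle coefficient makes neighbours disagree.
rough = smoothness_penalty([1.0, -np.sqrt(2.0), 1.0], lap)
```

In this toy case the penalty is far larger for the sign-flipped vector than for the smooth one, which is the sense in which a Laplacian penalty favours coefficients that vary smoothly over connected predictors.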