Structured Matrix Completion with Applications to Genomic Data Integration

Bibliographic Details
Published in: Journal of the American Statistical Association, 2016-06, Vol.111 (514), p.621-633
Main Authors: Cai, Tianxi; Cai, T. Tony; Zhang, Anru
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c584t-c237fa2e843140bc5a3e32ad1739b80b7b570773a8f3310afadd5bbab3012af3
cites cdi_FETCH-LOGICAL-c584t-c237fa2e843140bc5a3e32ad1739b80b7b570773a8f3310afadd5bbab3012af3
container_end_page 633
container_issue 514
container_start_page 621
container_title Journal of the American Statistical Association
container_volume 111
creator Cai, Tianxi
Cai, T. Tony
Zhang, Anru
description Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics, and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival. Supplementary materials for this article are available online.
doi_str_mv 10.1080/01621459.2015.1021005
format article
fullrecord <record><control><sourceid>jstor_pubme</sourceid><recordid>TN_cdi_jstor_primary_24739556</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>24739556</jstor_id><sourcerecordid>24739556</sourcerecordid><originalsourceid>FETCH-LOGICAL-c584t-c237fa2e843140bc5a3e32ad1739b80b7b570773a8f3310afadd5bbab3012af3</originalsourceid><addsrcrecordid>eNqFkUtv1DAUhS0EokPhJxRFYtNNip-1s0FUQylFRSzogp114zitR4kdbIe2_x6HmZbHAryx5PPdo-tzEDog-IhghV9jckwJF80RxUSUJ0owFo_Qiggmayr518dotTD1Au2hZyltcDlSqadojyrMKVFqhT5-yXE2eY62qz5Bju62WodxGmx2wVc3Ll9XJ9M0OAPLQ6pyqM6sD6Mz1TvIUJ37bK_iT_E5etLDkOyL3b2PLt-fXq4_1Befz87XJxe1EYrn2lAme6BWcUY4bo0AZhmFjkjWtAq3shUSS8lA9YwRDD10nWhbaBkmFHq2j95sbae5HW1nrM8RBj1FN0K80wGc_lPx7lpfhe9akEYpzovB4c4ghm-zTVmPLhk7DOBtmJOmJSbSNJTh_6JECV6iLGEW9NVf6CbM0ZcgClXsCBF4MRRbysSQUrT9w94E66VXfd-rXnrVu17L3MvfP_0wdV9kAQ62wCblEH_pvKQqxHHR32515_sQR7gJceh0hrshxD6CNy5p9u8dfgCEC7tz</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1819911500</pqid></control><display><type>article</type><title>Structured Matrix Completion with Applications to Genomic Data Integration</title><source>International Bibliography of the Social Sciences (IBSS)</source><source>JSTOR Archival Journals and Primary Sources Collection</source><source>Taylor and Francis:Jisc Collections:Taylor and Francis Read and Publish Agreement 2024-2025:Science and Technology Collection (Reading list)</source><creator>Cai, Tianxi ; Cai, T. Tony ; Zhang, Anru</creator><creatorcontrib>Cai, Tianxi ; Cai, T. Tony ; Zhang, Anru</creatorcontrib><description>Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics, and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. 
Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival. Supplementary materials for this article are available online.</description><identifier>ISSN: 0162-1459</identifier><identifier>ISSN: 1537-274X</identifier><identifier>EISSN: 1537-274X</identifier><identifier>DOI: 10.1080/01621459.2015.1021005</identifier><identifier>PMID: 28042188</identifier><identifier>CODEN: JSTNAL</identifier><language>eng</language><publisher>United States: Taylor &amp; Francis</publisher><subject>animal ovaries ; Constrained minimization ; engineering ; equations ; Genomic data integration ; Genomics ; Low-rank matrix ; mathematics ; Matrix ; Matrix completion ; Ovarian cancer ; ovarian neoplasms ; prediction ; Sampling ; Simulation ; Singular value decomposition ; Statistics ; Structured matrix completion ; Theory and Methods</subject><ispartof>Journal of the American Statistical Association, 2016-06, Vol.111 (514), p.621-633</ispartof><rights>American Statistical Association 2016</rights><rights>2016 American Statistical Association</rights><rights>Copyright Taylor &amp; Francis Ltd. 
Jun 2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c584t-c237fa2e843140bc5a3e32ad1739b80b7b570773a8f3310afadd5bbab3012af3</citedby><cites>FETCH-LOGICAL-c584t-c237fa2e843140bc5a3e32ad1739b80b7b570773a8f3310afadd5bbab3012af3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/24739556$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/24739556$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>230,314,776,780,881,27901,27902,33200,58213,58446</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28042188$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Cai, Tianxi</creatorcontrib><creatorcontrib>Cai, T. Tony</creatorcontrib><creatorcontrib>Zhang, Anru</creatorcontrib><title>Structured Matrix Completion with Applications to Genomic Data Integration</title><title>Journal of the American Statistical Association</title><addtitle>J Am Stat Assoc</addtitle><description>Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics, and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. 
We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival. Supplementary materials for this article are available online.</description><subject>animal ovaries</subject><subject>Constrained minimization</subject><subject>engineering</subject><subject>equations</subject><subject>Genomic data integration</subject><subject>Genomics</subject><subject>Low-rank matrix</subject><subject>mathematics</subject><subject>Matrix</subject><subject>Matrix completion</subject><subject>Ovarian cancer</subject><subject>ovarian neoplasms</subject><subject>prediction</subject><subject>Sampling</subject><subject>Simulation</subject><subject>Singular value decomposition</subject><subject>Statistics</subject><subject>Structured matrix completion</subject><subject>Theory and 
Methods</subject><issn>0162-1459</issn><issn>1537-274X</issn><issn>1537-274X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>8BJ</sourceid><recordid>eNqFkUtv1DAUhS0EokPhJxRFYtNNip-1s0FUQylFRSzogp114zitR4kdbIe2_x6HmZbHAryx5PPdo-tzEDog-IhghV9jckwJF80RxUSUJ0owFo_Qiggmayr518dotTD1Au2hZyltcDlSqadojyrMKVFqhT5-yXE2eY62qz5Bju62WodxGmx2wVc3Ll9XJ9M0OAPLQ6pyqM6sD6Mz1TvIUJ37bK_iT_E5etLDkOyL3b2PLt-fXq4_1Befz87XJxe1EYrn2lAme6BWcUY4bo0AZhmFjkjWtAq3shUSS8lA9YwRDD10nWhbaBkmFHq2j95sbae5HW1nrM8RBj1FN0K80wGc_lPx7lpfhe9akEYpzovB4c4ghm-zTVmPLhk7DOBtmJOmJSbSNJTh_6JECV6iLGEW9NVf6CbM0ZcgClXsCBF4MRRbysSQUrT9w94E66VXfd-rXnrVu17L3MvfP_0wdV9kAQ62wCblEH_pvKQqxHHR32515_sQR7gJceh0hrshxD6CNy5p9u8dfgCEC7tz</recordid><startdate>20160601</startdate><enddate>20160601</enddate><creator>Cai, Tianxi</creator><creator>Cai, T. Tony</creator><creator>Zhang, Anru</creator><general>Taylor &amp; Francis</general><general>Taylor &amp; Francis Group, LLC</general><general>Taylor &amp; Francis Ltd</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>8BJ</scope><scope>FQK</scope><scope>JBE</scope><scope>K9.</scope><scope>7X8</scope><scope>7S9</scope><scope>L.6</scope><scope>5PM</scope></search><sort><creationdate>20160601</creationdate><title>Structured Matrix Completion with Applications to Genomic Data Integration</title><author>Cai, Tianxi ; Cai, T. 
Tony ; Zhang, Anru</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c584t-c237fa2e843140bc5a3e32ad1739b80b7b570773a8f3310afadd5bbab3012af3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>animal ovaries</topic><topic>Constrained minimization</topic><topic>engineering</topic><topic>equations</topic><topic>Genomic data integration</topic><topic>Genomics</topic><topic>Low-rank matrix</topic><topic>mathematics</topic><topic>Matrix</topic><topic>Matrix completion</topic><topic>Ovarian cancer</topic><topic>ovarian neoplasms</topic><topic>prediction</topic><topic>Sampling</topic><topic>Simulation</topic><topic>Singular value decomposition</topic><topic>Statistics</topic><topic>Structured matrix completion</topic><topic>Theory and Methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cai, Tianxi</creatorcontrib><creatorcontrib>Cai, T. Tony</creatorcontrib><creatorcontrib>Zhang, Anru</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>International Bibliography of the Social Sciences</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>MEDLINE - Academic</collection><collection>AGRICOLA</collection><collection>AGRICOLA - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of the American Statistical Association</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cai, Tianxi</au><au>Cai, T. 
Tony</au><au>Zhang, Anru</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Structured Matrix Completion with Applications to Genomic Data Integration</atitle><jtitle>Journal of the American Statistical Association</jtitle><addtitle>J Am Stat Assoc</addtitle><date>2016-06-01</date><risdate>2016</risdate><volume>111</volume><issue>514</issue><spage>621</spage><epage>633</epage><pages>621-633</pages><issn>0162-1459</issn><issn>1537-274X</issn><eissn>1537-274X</eissn><coden>JSTNAL</coden><abstract>Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics, and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival. Supplementary materials for this article are available online.</abstract><cop>United States</cop><pub>Taylor &amp; Francis</pub><pmid>28042188</pmid><doi>10.1080/01621459.2015.1021005</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0162-1459
ispartof Journal of the American Statistical Association, 2016-06, Vol.111 (514), p.621-633
issn 0162-1459
1537-274X
language eng
recordid cdi_jstor_primary_24739556
source International Bibliography of the Social Sciences (IBSS); JSTOR Archival Journals and Primary Sources Collection; Taylor and Francis:Jisc Collections:Taylor and Francis Read and Publish Agreement 2024-2025:Science and Technology Collection (Reading list)
subjects animal ovaries
Constrained minimization
engineering
equations
Genomic data integration
Genomics
Low-rank matrix
mathematics
Matrix
Matrix completion
Ovarian cancer
ovarian neoplasms
prediction
Sampling
Simulation
Singular value decomposition
Statistics
Structured matrix completion
Theory and Methods
title Structured Matrix Completion with Applications to Genomic Data Integration
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T16%3A48%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Structured%20Matrix%20Completion%20with%20Applications%20to%20Genomic%20Data%20Integration&rft.jtitle=Journal%20of%20the%20American%20Statistical%20Association&rft.au=Cai,%20Tianxi&rft.date=2016-06-01&rft.volume=111&rft.issue=514&rft.spage=621&rft.epage=633&rft.pages=621-633&rft.issn=0162-1459&rft.eissn=1537-274X&rft.coden=JSTNAL&rft_id=info:doi/10.1080/01621459.2015.1021005&rft_dat=%3Cjstor_pubme%3E24739556%3C/jstor_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c584t-c237fa2e843140bc5a3e32ad1739b80b7b570773a8f3310afadd5bbab3012af3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1819911500&rft_id=info:pmid/28042188&rft_jstor_id=24739556&rfr_iscdi=true