
Learning vector quantized representation for cancer subtypes identification

Bibliographic Details
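Published in: Computer methods and programs in biomedicine 2023-06, Vol.236, p.107543-107543, Article 107543
Main Authors: Chen, Zheng; Yang, Ziwei; Zhu, Lingwei; Gao, Peng; Matsubara, Takashi; Kanaya, Shigehiko; Altaf-Ul-Amin, Md
Format: Article
Language: English
Subjects: Cancer subtyping; Cluster Analysis; Clustering; Deep generative models; Gene Expression Profiling; Humans; Neoplasms; Transcriptome; Vector quantization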
container_end_page 107543
container_issue
container_start_page 107543
container_title Computer methods and programs in biomedicine
container_volume 236
creator Chen, Zheng
Yang, Ziwei
Zhu, Lingwei
Gao, Peng
Matsubara, Takashi
Kanaya, Shigehiko
Altaf-Ul-Amin, Md
description •Problem: Defining and separating cancer subtypes is essential for facilitating personalized therapy and the prognosis of patients. •What is already known: Existing studies suffer from issues associated with omics data: sample scarcity and high dimensionality. •What this paper adds: A generative vector-quantized variational autoencoder with categorical latents is a new solution.
Background and objective: Defining and separating cancer subtypes is essential for facilitating personalized therapy modality and prognosis of patients. The definition of subtypes has been constantly recalibrated as our understanding has deepened. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that could reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data, such as transcriptomics, that correlate strongly with the underlying biological mechanisms. However, while existing studies have shown promising results, they suffer from two issues associated with omics data, sample scarcity and high dimensionality, and they impose unrealistic assumptions in order to extract useful features from the data while avoiding overfitting to spurious correlations.
Methods: This paper proposes to leverage a recent strong generative model, the Vector-Quantized Variational AutoEncoder, to tackle these data issues and extract discrete representations, which are crucial to the quality of subsequent clustering because they retain only the information relevant to reconstructing the input.
Results: Extensive experiments and medical analysis on multiple datasets comprising 10 distinct cancers demonstrate that the proposed clustering results can significantly and robustly improve prognosis over prevalent subtyping systems.
Conclusion: Our proposal does not impose strict assumptions on the data distribution, and its latent features are better representations of the transcriptomic data in different cancer subtypes, capable of yielding superior clustering performance with any mainstream clustering method.
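The Methods statement in this description turns on the vector-quantization step of a VQ-VAE: each continuous encoder output is replaced by its nearest entry in a codebook, and the index of that entry is the discrete (categorical) latent that downstream clustering can use. The following is a minimal NumPy sketch of that nearest-codebook lookup only, under stated assumptions: the encoder, decoder, codebook training and gradient handling are omitted, the codebook and inputs are toy values, and `vector_quantize`, `z_e` and `codebook` are hypothetical names. It is not the authors' implementation.

```python
import numpy as np


def vector_quantize(z_e: np.ndarray, codebook: np.ndarray):
    """Replace each encoder output with its nearest codebook vector.

    z_e:      (n, d) continuous encoder outputs, one row per sample (assumed layout)
    codebook: (k, d) embedding vectors (learned during training in a real VQ-VAE)
    Returns (codes, z_q): integer code indices of shape (n,) and the
    quantized vectors of shape (n, d).
    """
    # Squared Euclidean distance from every row of z_e to every codebook entry: (n, k)
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Index of the nearest entry: the discrete (categorical) latent of each sample
    codes = dists.argmin(axis=1)
    # Look up the chosen vectors to form the quantized representation
    z_q = codebook[codes]
    return codes, z_q


# Toy example: three 2-D codebook entries and four hypothetical encoder outputs
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
z_e = np.array([[0.1, -0.2], [0.9, 1.2], [4.8, 5.1], [0.2, 0.1]])
codes, z_q = vector_quantize(z_e, codebook)
print(codes)  # [0 1 2 0]
```

Because `codes` holds one integer per sample, the discrete representation could be passed to a clustering method directly (for example after one-hot encoding), or the quantized vectors `z_q` could be used instead; which variant the paper uses is not stated in this record.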
doi_str_mv 10.1016/j.cmpb.2023.107543
format article
pmid 37100024
addtitle Comput Methods Programs Biomed
publisher Elsevier B.V
cop Ireland
rights Copyright © 2023 Elsevier B.V. All rights reserved.
toplevel peer_reviewed
orcidid 0000-0001-6776-7159; 0000-0002-9514-6760; 0000-0003-2913-3432; 0000-0003-0642-4800; 0000-0001-9846-840X
backlink https://www.ncbi.nlm.nih.gov/pubmed/37100024 (View this record in MEDLINE/PubMed)
fulltext fulltext
identifier ISSN: 0169-2607
ispartof Computer methods and programs in biomedicine, 2023-06, Vol.236, p.107543-107543, Article 107543
issn 0169-2607
1872-7565
language eng
recordid cdi_proquest_miscellaneous_2806996105
source ScienceDirect Journals
subjects Cancer subtyping
Cluster Analysis
Clustering
Deep generative models
Gene Expression Profiling
Humans
Neoplasms
Transcriptome
Vector quantization
title Learning vector quantized representation for cancer subtypes identification
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T01%3A44%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20vector%20quantized%20representation%20for%20cancer%20subtypes%20identification&rft.jtitle=Computer%20methods%20and%20programs%20in%20biomedicine&rft.au=Chen,%20Zheng&rft.date=2023-06&rft.volume=236&rft.spage=107543&rft.epage=107543&rft.pages=107543-107543&rft.artnum=107543&rft.issn=0169-2607&rft.eissn=1872-7565&rft_id=info:doi/10.1016/j.cmpb.2023.107543&rft_dat=%3Cproquest_cross%3E2806996105%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c400t-8f940a0b5be646ada6469c18b8469f63fc4bf33b9bd55bb83a8627665205160e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2806996105&rft_id=info:pmid/37100024&rfr_iscdi=true