Imputation of missing values in multi-view data

Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, espec...

Bibliographic Details
Published in:	Information fusion, 2024-11, Vol.111, p.102524, Article 102524
Main Authors:	van Loon, Wouter; Fokkema, Marjolein; de Vos, Frank; Koini, Marisa; Schmidt, Reinhold; de Rooij, Mark
Format:	Article
Language:	English
Subjects:	Feature selection; Imputation; Missing data; Multi-view learning; Stacked generalization

cited_by
cites	cdi_FETCH-LOGICAL-c231t-2ea128fe920889ea5336401d60c6a33a19dc8864a08d16c0299e8aebf5f853433
container_end_page
container_issue
container_start_page	102524
container_title	Information fusion
container_volume	111
creator	van Loon, Wouter; Fokkema, Marjolein; de Vos, Frank; Koini, Marisa; Schmidt, Reinhold; de Rooij, Mark
description	Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, especially when combined with high-dimensionality, can make the application of conditional imputation methods computationally infeasible. However, the multi-view structure could be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible. • A new imputation method for multi-view data is introduced. • The new method shows competitive results at a much lower computational cost. • The new method allows state-of-the-art algorithms to be used in much larger data sets than before.
doi_str_mv	10.1016/j.inffus.2024.102524
format	article
fullrecord	<record><control><sourceid>elsevier_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1016_j_inffus_2024_102524</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1566253524003026</els_id><sourcerecordid>S1566253524003026</sourcerecordid><originalsourceid>FETCH-LOGICAL-c231t-2ea128fe920889ea5336401d60c6a33a19dc8864a08d16c0299e8aebf5f853433</originalsourceid><addsrcrecordid>eNp9j8tKxDAYhYMoOI6-gYu8QDt_rpNuBBm8DAy40XWI6R9J6WVo0hHf3g517eocDpzD-Qi5Z1AyYHrTlLEPYUolBy7niCsuL8iKmS0vtAB1OXuldcGVUNfkJqUGgG1BsBXZ7LvjlF2OQ0-HQLuYUuy_6Mm1EyYae9pNbY7FKeI3rV12t-QquDbh3Z-uycfz0_vutTi8vex3j4fCc8FywdExbgJWHIyp0CkhtARWa_DaCeFYVXtjtHRgaqY98KpC4_AzqGCUkEKsiVx2_TikNGKwxzF2bvyxDOwZ2jZ2gbZnaLtAz7WHpYbzt_n0aJOP2Hus44g-23qI_w_8AiBUYN8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Imputation of missing values in multi-view data</title><source>ScienceDirect Journals</source><creator>van Loon, Wouter ; Fokkema, Marjolein ; de Vos, Frank ; Koini, Marisa ; Schmidt, Reinhold ; de Rooij, Mark</creator><creatorcontrib>van Loon, Wouter ; Fokkema, Marjolein ; de Vos, Frank ; Koini, Marisa ; Schmidt, Reinhold ; de Rooij, Mark</creatorcontrib><description>Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, especially when combined with high-dimensionality, can make the application of conditional imputation methods computationally infeasible. However, the multi-view structure could be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. 
It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible. •A new imputation method for multi-view data is introduced.•The new method shows competitive results at a much lower computational cost.•The new method allows state-of-the-art algorithms to be used in much larger data sets than before.</description><identifier>ISSN: 1566-2535</identifier><identifier>EISSN: 1872-6305</identifier><identifier>DOI: 10.1016/j.inffus.2024.102524</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Feature selection ; Imputation ; Missing data ; Multi-view learning ; Stacked generalization</subject><ispartof>Information fusion, 2024-11, Vol.111, p.102524, Article 102524</ispartof><rights>2024 The Authors</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c231t-2ea128fe920889ea5336401d60c6a33a19dc8864a08d16c0299e8aebf5f853433</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27903,27904</link.rule.ids></links><search><creatorcontrib>van Loon, Wouter</creatorcontrib><creatorcontrib>Fokkema, Marjolein</creatorcontrib><creatorcontrib>de Vos, Frank</creatorcontrib><creatorcontrib>Koini, Marisa</creatorcontrib><creatorcontrib>Schmidt, Reinhold</creatorcontrib><creatorcontrib>de Rooij, Mark</creatorcontrib><title>Imputation of 
missing values in multi-view data</title><title>Information fusion</title><description>Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, especially when combined with high-dimensionality, can make the application of conditional imputation methods computationally infeasible. However, the multi-view structure could be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible. 
•A new imputation method for multi-view data is introduced.•The new method shows competitive results at a much lower computational cost.•The new method allows state-of-the-art algorithms to be used in much larger data sets than before.</description><subject>Feature selection</subject><subject>Imputation</subject><subject>Missing data</subject><subject>Multi-view learning</subject><subject>Stacked generalization</subject><issn>1566-2535</issn><issn>1872-6305</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9j8tKxDAYhYMoOI6-gYu8QDt_rpNuBBm8DAy40XWI6R9J6WVo0hHf3g517eocDpzD-Qi5Z1AyYHrTlLEPYUolBy7niCsuL8iKmS0vtAB1OXuldcGVUNfkJqUGgG1BsBXZ7LvjlF2OQ0-HQLuYUuy_6Mm1EyYae9pNbY7FKeI3rV12t-QquDbh3Z-uycfz0_vutTi8vex3j4fCc8FywdExbgJWHIyp0CkhtARWa_DaCeFYVXtjtHRgaqY98KpC4_AzqGCUkEKsiVx2_TikNGKwxzF2bvyxDOwZ2jZ2gbZnaLtAz7WHpYbzt_n0aJOP2Hus44g-23qI_w_8AiBUYN8</recordid><startdate>202411</startdate><enddate>202411</enddate><creator>van Loon, Wouter</creator><creator>Fokkema, Marjolein</creator><creator>de Vos, Frank</creator><creator>Koini, Marisa</creator><creator>Schmidt, Reinhold</creator><creator>de Rooij, Mark</creator><general>Elsevier B.V</general><scope>6I.</scope><scope>AAFTH</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>202411</creationdate><title>Imputation of missing values in multi-view data</title><author>van Loon, Wouter ; Fokkema, Marjolein ; de Vos, Frank ; Koini, Marisa ; Schmidt, Reinhold ; de Rooij, Mark</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c231t-2ea128fe920889ea5336401d60c6a33a19dc8864a08d16c0299e8aebf5f853433</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Feature selection</topic><topic>Imputation</topic><topic>Missing data</topic><topic>Multi-view learning</topic><topic>Stacked 
generalization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>van Loon, Wouter</creatorcontrib><creatorcontrib>Fokkema, Marjolein</creatorcontrib><creatorcontrib>de Vos, Frank</creatorcontrib><creatorcontrib>Koini, Marisa</creatorcontrib><creatorcontrib>Schmidt, Reinhold</creatorcontrib><creatorcontrib>de Rooij, Mark</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>CrossRef</collection><jtitle>Information fusion</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>van Loon, Wouter</au><au>Fokkema, Marjolein</au><au>de Vos, Frank</au><au>Koini, Marisa</au><au>Schmidt, Reinhold</au><au>de Rooij, Mark</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Imputation of missing values in multi-view data</atitle><jtitle>Information fusion</jtitle><date>2024-11</date><risdate>2024</risdate><volume>111</volume><spage>102524</spage><pages>102524-</pages><artnum>102524</artnum><issn>1566-2535</issn><eissn>1872-6305</eissn><abstract>Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, especially when combined with high-dimensionality, can make the application of conditional imputation methods computationally infeasible. However, the multi-view structure could be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. 
We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible. •A new imputation method for multi-view data is introduced.•The new method shows competitive results at a much lower computational cost.•The new method allows state-of-the-art algorithms to be used in much larger data sets than before.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.inffus.2024.102524</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1566-2535
ispartof	Information fusion, 2024-11, Vol.111, p.102524, Article 102524
issn	1566-2535; 1872-6305
language	eng
recordid	cdi_crossref_primary_10_1016_j_inffus_2024_102524
source	ScienceDirect Journals
subjects	Feature selection; Imputation; Missing data; Multi-view learning; Stacked generalization
title	Imputation of missing values in multi-view data
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T18%3A53%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Imputation%20of%20missing%20values%20in%20multi-view%20data&rft.jtitle=Information%20fusion&rft.au=van%20Loon,%20Wouter&rft.date=2024-11&rft.volume=111&rft.spage=102524&rft.pages=102524-&rft.artnum=102524&rft.issn=1566-2535&rft.eissn=1872-6305&rft_id=info:doi/10.1016/j.inffus.2024.102524&rft_dat=%3Celsevier_cross%3ES1566253524003026%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c231t-2ea128fe920889ea5336401d60c6a33a19dc8864a08d16c0299e8aebf5f853433%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true