
A three-dimensional feature-based fusion strategy for infrared and visible image fusion

Because existing fusion methods pay insufficient attention to the scene's essential characteristics, they suffer from scene distortion. In addition, the lack of ground truth can lead to an inadequate representation of vital information. To this end, we propose a novel infrared and visible image fusion network based on a three-dimensional feature fusion strategy (D3Fuse). In our method, we consider the scene semantic information in the source images and extract the commonality content of the two images as a third-dimensional feature that extends the feature space for fusion tasks. Specifically, a commonality feature extraction module (CFEM) is designed to extract the scene commonality features. Subsequently, the scene commonality features are used together with the modality features to construct the fused image. Moreover, to ensure the independence and diversity of the distinct features, we employ a contrastive learning strategy with multiscale PCA coding, which stretches the feature distance in an unsupervised manner and prompts the encoder to extract more discriminative information without incurring additional parameters or computational cost. Furthermore, a contrastive enhancement strategy is used to ensure adequate representation of modality information. Qualitative and quantitative evaluations on three datasets show that the proposed method achieves better visual performance and higher objective metrics at a lower computational cost. Object detection experiments show that our results also perform well on high-level semantic tasks.

•A three-dimensional feature fusion strategy is proposed to extend the feature space.
•We inject scene commonality features into the fusion results to enhance visibility.
•A contrastive learning strategy is used to optimize the feature encoding.
•A contrastive enhancement strategy is adopted for modality information retention.
•The experiments demonstrate performance on both fusion and advanced vision tasks.
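To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the idea in the abstract: two modality encoders plus a shared commonality branch (a stand-in for the CFEM) form a three-part feature space, and a hinge-style loss stretches the distance between the commonality feature and each modality feature so that the branches stay discriminative. The class and function names, network sizes, and margin value are illustrative assumptions, not the authors' released implementation; the multiscale PCA coding and the contrastive enhancement strategy are omitted.

```python
# Hypothetical sketch only: names, sizes, and the margin are assumptions,
# not the authors' code. It illustrates (a) fusing infrared, visible, and
# shared "commonality" features, and (b) an unsupervised feature-distance
# "stretching" term standing in for the paper's contrastive strategy.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """3x3 conv + ReLU used by every toy encoder/decoder below."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))


class D3FuseSketch(nn.Module):
    """Toy stand-in for the pipeline: an IR branch, a VIS branch, and a
    commonality branch fed with both images; the decoder fuses all three."""

    def __init__(self, feat: int = 32):
        super().__init__()
        self.ir_enc = conv_block(1, feat)        # infrared modality features
        self.vis_enc = conv_block(1, feat)       # visible modality features
        self.common_enc = conv_block(2, feat)    # scene commonality features (CFEM stand-in)
        self.decoder = nn.Sequential(conv_block(3 * feat, feat), nn.Conv2d(feat, 1, 1))

    def forward(self, ir: torch.Tensor, vis: torch.Tensor):
        f_ir = self.ir_enc(ir)
        f_vis = self.vis_enc(vis)
        f_common = self.common_enc(torch.cat([ir, vis], dim=1))
        fused = self.decoder(torch.cat([f_ir, f_vis, f_common], dim=1))
        return fused, (f_ir, f_vis, f_common)


def separation_loss(f_ir, f_vis, f_common, margin: float = 1.0) -> torch.Tensor:
    """Hinge-style term that pushes the commonality feature away from each
    modality feature, so the branches encode different information."""
    def mean_vec(f):  # global average pooling to a per-image feature vector
        return F.normalize(f.mean(dim=(2, 3)), dim=1)
    z_ir, z_vis, z_c = mean_vec(f_ir), mean_vec(f_vis), mean_vec(f_common)
    d_ir = (z_c - z_ir).pow(2).sum(dim=1)
    d_vis = (z_c - z_vis).pow(2).sum(dim=1)
    return F.relu(margin - d_ir).mean() + F.relu(margin - d_vis).mean()


if __name__ == "__main__":
    model = D3FuseSketch()
    ir = torch.rand(2, 1, 64, 64)    # dummy infrared batch
    vis = torch.rand(2, 1, 64, 64)   # dummy visible batch
    fused, (f_ir, f_vis, f_common) = model(ir, vis)
    loss = separation_loss(f_ir, f_vis, f_common)
    print(fused.shape, loss.item())  # torch.Size([2, 1, 64, 64]) and a scalar
```

The separation term above only demonstrates the "feature distance stretching" idea in an unsupervised form; in the paper this is realized through contrastive learning over multiscale PCA codes rather than over raw pooled features.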

Bibliographic Details
Published in: Pattern Recognition, 2025-01, Vol. 157, p. 110885, Article 110885
Main Authors: Liu, Xiaowen; Huo, Hongtao; Yang, Xin; Li, Jing
Format: Article
Language: English
Subjects: Contrastive learning; Convolution neural network; Image fusion
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.110885
Publisher: Elsevier Ltd