
Visible-Infrared Person Re-Identification via Homogeneous Augmented Tri-Modal Learning

Matching person images between the daytime visible modality and the night-time infrared modality (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Existing methods usually learn multi-modality features from the raw images, ignoring the image-level discrepancy. Some methods apply GAN techniques to generate cross-modality images, but this destroys local structure and introduces unavoidable noise. In this paper, we propose a Homogeneous Augmented Tri-Modal (HAT) learning method for VI-ReID, in which an auxiliary grayscale modality is generated from the homogeneous visible images, without any additional training process. It preserves the structure information of the visible images and approximates the image style of the infrared modality. Learning with the grayscale visible images forces the network to mine structure relations across multiple modalities, making it robust to color variations. Specifically, we address tri-modal feature learning from both a multi-modal classification and a multi-view retrieval perspective. For multi-modal classification, we learn a multi-modality sharing identity classifier with a parameter-sharing network, trained with a homogeneous and heterogeneous identification loss. For multi-view retrieval, we develop a weighted tri-directional ranking loss to optimize the relative distances across multiple modalities. Incorporated with two invariant regularizers, HAT simultaneously minimizes multiple modality variations. In-depth analysis demonstrates that the homogeneous grayscale augmentation significantly outperforms the current state of the art by a large margin.
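The auxiliary grayscale modality is the core idea of HAT and, as the abstract notes, costs no extra training: each visible RGB image is simply converted to grayscale and treated as a third modality alongside the visible and infrared views. Below is a minimal sketch of this augmentation using torchvision; the function and variable names are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the homogeneous grayscale augmentation described in the
# abstract: each visible (RGB) image is converted to a 3-channel grayscale
# image that serves as the auxiliary third modality. Names are illustrative;
# this is not the authors' released code.
import torch
from torchvision import transforms
from PIL import Image

# Standard luminance conversion; num_output_channels=3 keeps the output
# shape compatible with a backbone that expects 3-channel input.
to_grayscale = transforms.Grayscale(num_output_channels=3)
to_tensor = transforms.ToTensor()

def make_tri_modal_sample(visible_img: Image.Image, infrared_img: Image.Image):
    """Return (visible, grayscale, infrared) tensors for one identity.

    The grayscale view is derived from the visible image at no extra
    training cost, preserving its local structure while approximating
    the single-channel style of the infrared modality.
    """
    visible = to_tensor(visible_img)
    grayscale = to_tensor(to_grayscale(visible_img))
    infrared = to_tensor(infrared_img)
    return visible, grayscale, infrared
```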
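For the multi-modal classification branch, the abstract describes a parameter-sharing network with a single multi-modality sharing identity classifier. A minimal sketch follows, assuming a ResNet-50 backbone; the abstract does not detail how the homogeneous and heterogeneous identification losses are split, so a single shared cross-entropy head is shown here as an assumption.

```python
# Sketch of the parameter-sharing idea: one backbone and one identity
# classifier process all three modalities, so identity cross-entropy acts
# as a "sharing" identification loss across modalities. The exact split
# between homogeneous and heterogeneous loss terms is not given in the
# abstract; this shared head is an illustrative assumption.
import torch.nn as nn
from torchvision.models import resnet50

class SharedIDNet(nn.Module):
    def __init__(self, num_identities: int):
        super().__init__()
        backbone = resnet50(weights=None)
        feat_dim = backbone.fc.in_features   # 2048 for ResNet-50
        backbone.fc = nn.Identity()          # expose pooled features
        self.backbone = backbone
        self.classifier = nn.Linear(feat_dim, num_identities)

    def forward(self, x):
        # The same weights serve visible, grayscale, and infrared batches.
        feat = self.backbone(x)
        return feat, self.classifier(feat)
```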
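For the multi-view retrieval branch, the weighted tri-directional ranking loss optimizes relative distances across modality pairs. Its exact weighting scheme is not specified in the abstract, so the sketch below substitutes a generic hard-mining, margin-based triplet form applied over all ordered modality pairs, purely for illustration.

```python
# Hedged sketch of a tri-directional ranking objective: for each anchor
# modality, pull same-identity features from the other two modalities
# closer than different-identity ones by a margin. The paper's actual
# weighting and mining strategy are not given in the abstract; this
# generic margin-based triplet form is an assumption for illustration.
import torch
import torch.nn.functional as F

def tri_directional_ranking_loss(feats, labels, margin=0.3):
    """feats: dict mapping modality name -> (N, D) feature batch, e.g.
    {"visible": v, "grayscale": g, "infrared": r}; labels: (N,) identity
    ids aligned across modalities (same row = same identity)."""
    modalities = list(feats.keys())
    loss = feats[modalities[0]].new_zeros(())
    count = 0
    for anchor_m in modalities:
        for other_m in modalities:
            if other_m == anchor_m:
                continue
            # Pairwise Euclidean distances between the two modality batches.
            dist = torch.cdist(feats[anchor_m], feats[other_m])
            same_id = labels.unsqueeze(0) == labels.unsqueeze(1)
            # Hardest positive (same id) and hardest negative (different id)
            # per anchor row.
            pos = dist.masked_fill(~same_id, float("-inf")).amax(dim=1)
            neg = dist.masked_fill(same_id, float("inf")).amin(dim=1)
            loss = loss + F.relu(pos - neg + margin).mean()
            count += 1
    return loss / count
```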


Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, 2021, Vol. 16, pp. 728-739
Main Authors: Ye, Mang, Shen, Jianbing, Shao, Ling
Format: Article
Language:English
DOI: 10.1109/TIFS.2020.3001665
ISSN: 1556-6013
EISSN: 1556-6021
Source: IEEE Electronic Library (IEL) Journals
Subjects: Cameras
Classification
Face recognition
Gray scale
Image color analysis
Infrared imagery
Machine learning
multi-modality
Parameter identification
Person re-identification (Re-ID)
ranking
Retrieval
Surveillance
Task analysis
Training