
Learning a cross-modal hashing network for multimedia search

In this paper, we propose a cross-modal hashing network (CMHN) method to learn compact binary codes for cross-modality multimedia search. Unlike most existing cross-modal hashing methods which learn a single pair of projections to map each example into a binary vector, we design a deep neural networ...

Bibliographic Details
Main Authors:	Liong, Venice Erin; Lu, Jiwen; Tan, Yap-Peng
Format:	Conference Proceeding
Language:	English
Subjects:	Benchmark testing; binary code learning; Binary codes; cross-modal retrieval; hashing; Multimedia communication; Optimization; Quantization (signal); Semantics; Training
Online Access:	Request full text

cited_by
cites
container_end_page	3704
container_issue
container_start_page	3700
container_title
container_volume
creator	Liong, Venice Erin; Lu, Jiwen; Tan, Yap-Peng
description	In this paper, we propose a cross-modal hashing network (CMHN) method to learn compact binary codes for cross-modality multimedia search. Unlike most existing cross-modal hashing methods, which learn a single pair of projections to map each example into a binary vector, we design a deep neural network to learn multiple pairs of hierarchical non-linear transformations, under which the nonlinear characteristics of samples can be well exploited and the modality gap is well reduced. Our model is trained under an iterative optimization procedure which learns (1) a unified binary code, discretely and discriminatively, through a classification-based hinge-loss criterion, and (2) a cross-modal hashing network, with one deep network for each modality, by minimizing the quantization loss between the real-valued neural code and the binary code and maximizing the variance of the learned neural codes. Experimental results on two benchmark datasets show the efficacy of the proposed approach.
doi_str_mv	10.1109/ICIP.2017.8296973
format	conference_proceeding
fullrecord	<record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_8296973</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8296973</ieee_id><sourcerecordid>8296973</sourcerecordid><originalsourceid>FETCH-LOGICAL-i218t-f4d8e8d8c2b3aa963954862e9fe893c8af39c5687d37ec13dccbfca7db2bb1c3</originalsourceid><addsrcrecordid>eNotj8tKw0AYhUehYG37AOJmXiBxLknm_8GNBK2BgC66L5O5mNFcZCYivr0Vuzpw-M4Hh5AbznLOGd41dfOaC8ZVDgIrVPKC7FABLxkywVUJl2QtJPAMygKvyHVK74ydeMnX5L51Ok5heqOamjinlI2z1QPtder_2skt33P8oH6OdPwaljA6GzRNp5Xpt2Tl9ZDc7pwbcnh6PNTPWfuyb-qHNguCw5L5woIDC0Z0UmusJJYFVMKhd4DSgPYSTVmBslI5w6U1pvNGK9uJruNGbsjtvzY4546fMYw6_hzPX-UvjgNJMA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Learning a cross-modal hashing network for multimedia search</title><source>IEEE Xplore All Conference Series</source><creator>Liong, Venice Erin ; Lu, Jiwen ; Tan, Yap-Peng</creator><creatorcontrib>Liong, Venice Erin ; Lu, Jiwen ; Tan, Yap-Peng</creatorcontrib><description>In this paper, we propose a cross-modal hashing network (CMHN) method to learn compact binary codes for cross-modality multimedia search. Unlike most existing cross-modal hashing methods which learn a single pair of projections to map each example into a binary vector, we design a deep neural network to learn multiple pairs of hierarchical non-linear transformations, under which the nonlinear characteristics of samples can be well exploited and the modality gap is well reduced. 
Our model is trained under an iterative optimization procedure which learns a (1) unified binary code discretely and discriminatively through a classification-based hinge-loss criterion, and (2) cross-modal hashing network, one deep network for each modality, through minimizing the quantization loss between real-valued neural code and binary code, and maximizing the variance of the learned neural codes. Experimental results on two benchmark datasets show the efficacy of the proposed approach.</description><identifier>EISSN: 2381-8549</identifier><identifier>EISBN: 9781509021758</identifier><identifier>EISBN: 1509021752</identifier><identifier>DOI: 10.1109/ICIP.2017.8296973</identifier><language>eng</language><publisher>IEEE</publisher><subject>Benchmark testing ; binary code learning ; Binary codes ; cross-modal retrieval ; hashing ; Multimedia communication ; Optimization ; Quantization (signal) ; Semantics ; Training</subject><ispartof>2017 IEEE International Conference on Image Processing (ICIP), 2017, p.3700-3704</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8296973$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8296973$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Liong, Venice Erin</creatorcontrib><creatorcontrib>Lu, Jiwen</creatorcontrib><creatorcontrib>Tan, Yap-Peng</creatorcontrib><title>Learning a cross-modal hashing network for multimedia search</title><title>2017 IEEE International Conference on Image Processing (ICIP)</title><addtitle>ICIP</addtitle><description>In this paper, we propose a cross-modal hashing network (CMHN) method to learn 
compact binary codes for cross-modality multimedia search. Unlike most existing cross-modal hashing methods which learn a single pair of projections to map each example into a binary vector, we design a deep neural network to learn multiple pairs of hierarchical non-linear transformations, under which the nonlinear characteristics of samples can be well exploited and the modality gap is well reduced. Our model is trained under an iterative optimization procedure which learns a (1) unified binary code discretely and discriminatively through a classification-based hinge-loss criterion, and (2) cross-modal hashing network, one deep network for each modality, through minimizing the quantization loss between real-valued neural code and binary code, and maximizing the variance of the learned neural codes. Experimental results on two benchmark datasets show the efficacy of the proposed approach.</description><subject>Benchmark testing</subject><subject>binary code learning</subject><subject>Binary codes</subject><subject>cross-modal retrieval</subject><subject>hashing</subject><subject>Multimedia communication</subject><subject>Optimization</subject><subject>Quantization (signal)</subject><subject>Semantics</subject><subject>Training</subject><issn>2381-8549</issn><isbn>9781509021758</isbn><isbn>1509021752</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2017</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj8tKw0AYhUehYG37AOJmXiBxLknm_8GNBK2BgC66L5O5mNFcZCYivr0Vuzpw-M4Hh5AbznLOGd41dfOaC8ZVDgIrVPKC7FABLxkywVUJl2QtJPAMygKvyHVK74ydeMnX5L51Ok5heqOamjinlI2z1QPtder_2skt33P8oH6OdPwaljA6GzRNp5Xpt2Tl9ZDc7pwbcnh6PNTPWfuyb-qHNguCw5L5woIDC0Z0UmusJJYFVMKhd4DSgPYSTVmBslI5w6U1pvNGK9uJruNGbsjtvzY4546fMYw6_hzPX-UvjgNJMA</recordid><startdate>201709</startdate><enddate>201709</enddate><creator>Liong, Venice Erin</creator><creator>Lu, Jiwen</creator><creator>Tan, 
Yap-Peng</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201709</creationdate><title>Learning a cross-modal hashing network for multimedia search</title><author>Liong, Venice Erin ; Lu, Jiwen ; Tan, Yap-Peng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i218t-f4d8e8d8c2b3aa963954862e9fe893c8af39c5687d37ec13dccbfca7db2bb1c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Benchmark testing</topic><topic>binary code learning</topic><topic>Binary codes</topic><topic>cross-modal retrieval</topic><topic>hashing</topic><topic>Multimedia communication</topic><topic>Optimization</topic><topic>Quantization (signal)</topic><topic>Semantics</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Liong, Venice Erin</creatorcontrib><creatorcontrib>Lu, Jiwen</creatorcontrib><creatorcontrib>Tan, Yap-Peng</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Liong, Venice Erin</au><au>Lu, Jiwen</au><au>Tan, Yap-Peng</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Learning a cross-modal hashing network for multimedia search</atitle><btitle>2017 IEEE International Conference on Image Processing 
(ICIP)</btitle><stitle>ICIP</stitle><date>2017-09</date><risdate>2017</risdate><spage>3700</spage><epage>3704</epage><pages>3700-3704</pages><eissn>2381-8549</eissn><eisbn>9781509021758</eisbn><eisbn>1509021752</eisbn><abstract>In this paper, we propose a cross-modal hashing network (CMHN) method to learn compact binary codes for cross-modality multimedia search. Unlike most existing cross-modal hashing methods which learn a single pair of projections to map each example into a binary vector, we design a deep neural network to learn multiple pairs of hierarchical non-linear transformations, under which the nonlinear characteristics of samples can be well exploited and the modality gap is well reduced. Our model is trained under an iterative optimization procedure which learns a (1) unified binary code discretely and discriminatively through a classification-based hinge-loss criterion, and (2) cross-modal hashing network, one deep network for each modality, through minimizing the quantization loss between real-valued neural code and binary code, and maximizing the variance of the learned neural codes. Experimental results on two benchmark datasets show the efficacy of the proposed approach.</abstract><pub>IEEE</pub><doi>10.1109/ICIP.2017.8296973</doi><tpages>5</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	EISSN: 2381-8549
ispartof	2017 IEEE International Conference on Image Processing (ICIP), 2017, p.3700-3704
issn	2381-8549
language	eng
recordid	cdi_ieee_primary_8296973
source	IEEE Xplore All Conference Series
subjects	Benchmark testing; binary code learning; Binary codes; cross-modal retrieval; hashing; Multimedia communication; Optimization; Quantization (signal); Semantics; Training
title	Learning a cross-modal hashing network for multimedia search
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T04%3A15%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Learning%20a%20cross-modal%20hashing%20network%20for%20multimedia%20search&rft.btitle=2017%20IEEE%20International%20Conference%20on%20Image%20Processing%20(ICIP)&rft.au=Liong,%20Venice%20Erin&rft.date=2017-09&rft.spage=3700&rft.epage=3704&rft.pages=3700-3704&rft.eissn=2381-8549&rft_id=info:doi/10.1109/ICIP.2017.8296973&rft.eisbn=9781509021758&rft.eisbn_list=1509021752&rft_dat=%3Cieee_CHZPO%3E8296973%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i218t-f4d8e8d8c2b3aa963954862e9fe893c8af39c5687d37ec13dccbfca7db2bb1c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8296973&rfr_iscdi=true