DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Bibliographic Details
Published in:	Proceedings of the ... AAAI Conference on Artificial Intelligence, 2021-05, Vol.35 (2), p.1291-1299
Main Authors:	Fang, Hao-Shu; Xie, Yichen; Shao, Dian; Lu, Cewu
Format:	Article
Language:	English

cited_by	cdi_FETCH-LOGICAL-c215t-12503d8db2ebcfb68bfd902af3bed8e85242512e6d0fc8511f21240f48391ad93
cites
container_end_page	1299
container_issue	2
container_start_page	1291
container_title	Proceedings of the ... AAAI Conference on Artificial Intelligence
container_volume	35
creator	Fang, Hao-Shu; Xie, Yichen; Shao, Dian; Lu, Cewu
description	In recent years, human-object interaction (HOI) detection has achieved impressive advances. However, conventional two-stage methods are usually slow at inference. On the other hand, existing one-stage methods mainly focus on the union regions of interactions, which introduce unnecessary visual information that disturbs HOI detection. To tackle these problems, we propose DIRV, a novel one-stage HOI detection approach based on a new concept for the HOI problem called the interaction region. Unlike previous methods, our approach concentrates on densely sampled interaction regions across different scales for each human-object pair, so as to capture the subtle visual features that are most essential to the interaction. Moreover, to compensate for the detection flaws of a single interaction region, we introduce a novel voting strategy that makes full use of the overlapping interaction regions in place of conventional Non-Maximal Suppression (NMS). Extensive experiments on two popular benchmarks, V-COCO and HICO-DET, show that our approach outperforms existing state-of-the-art methods by a large margin, with the highest inference speed and the lightest network architecture. Our code is publicly available at www.github.com/MVIG-SJTU/DIRV.
doi_str_mv	10.1609/aaai.v35i2.16217
format	article
fulltext	fulltext
identifier	ISSN: 2159-5399
ispartof	Proceedings of the ... AAAI Conference on Artificial Intelligence, 2021-05, Vol.35 (2), p.1291-1299
issn	2159-5399 2374-3468
language	eng
recordid	cdi_crossref_primary_10_1609_aaai_v35i2_16217
source	Freely Accessible Science Journals - check A-Z of ejournals
title	DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T22%3A23%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DIRV:%20Dense%20Interaction%20Region%20Voting%20for%20End-to-End%20Human-Object%20Interaction%20Detection&rft.jtitle=Proceedings%20of%20the%20...%20AAAI%20Conference%20on%20Artificial%20Intelligence&rft.au=Fang,%20Hao-Shu&rft.date=2021-05-18&rft.volume=35&rft.issue=2&rft.spage=1291&rft.epage=1299&rft.pages=1291-1299&rft.issn=2159-5399&rft.eissn=2374-3468&rft_id=info:doi/10.1609/aaai.v35i2.16217&rft_dat=%3Ccrossref%3E10_1609_aaai_v35i2_16217%3C/crossref%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c215t-12503d8db2ebcfb68bfd902af3bed8e85242512e6d0fc8511f21240f48391ad93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
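The description field above says DIRV replaces conventional NMS with a voting strategy over overlapping interaction regions. This record does not give the details of that scheme, so the sketch below is only a hedged illustration of the general idea: it contrasts greedy NMS with generic score-weighted box voting over plain bounding boxes. It is not DIRV's interaction-region voting. The function names `iou`, `nms` and `vote`, the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions chosen for illustration. For the authors' actual implementation, see the repository linked in the description.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 if the union is empty)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], thr: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if its IoU
    with every already-kept box is at most `thr`. Suppressed boxes are discarded."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thr for k in keep):
            keep.append(i)
    return keep


def vote(boxes: List[Box], scores: List[float], thr: float = 0.5) -> List[Tuple[Box, float]]:
    """Generic score-weighted box voting (NOT DIRV's interaction-region voting):
    each NMS survivor is replaced by the score-weighted mean of every box whose
    IoU with it exceeds `thr`, so suppressed boxes still contribute."""
    results: List[Tuple[Box, float]] = []
    for i in nms(boxes, scores, thr):
        # The survivor always votes for itself; overlapping neighbours join it.
        voters = [j for j in range(len(boxes)) if j == i or iou(boxes[i], boxes[j]) > thr]
        total = sum(scores[j] for j in voters)
        if total <= 0:
            # Degenerate case (non-positive scores): keep the survivor's own box.
            results.append((boxes[i], scores[i]))
            continue
        fused = tuple(sum(scores[j] * boxes[j][c] for j in voters) / total for c in range(4))
        results.append((fused, scores[i]))
    return results


if __name__ == "__main__":
    boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (50.0, 50.0, 60.0, 60.0)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 (IoU 81/119) and is dropped
    print(vote(boxes, scores))  # box 0 is shifted slightly toward the suppressed box 1
```

Greedy NMS returns only the single best box of each overlapping cluster. The voting variant keeps the same survivors but lets the suppressed neighbours refine the survivor's coordinates. This mirrors, in a much simpler setting, the motivation the abstract gives for using overlapping regions to compensate for the flaws of any single one.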