Detecting conversational groups in images and sequences: A robust game-theoretic approach
Published in: Computer vision and image understanding, 2016-02, Vol. 143, p. 11-24
Main Authors: Vascon, Sebastiano; Mequanint, Eyasu Z.; Cristani, Marco; Hung, Hayley; Pelillo, Marcello; Murino, Vittorio
Format: Article
Language: English
Highlights:
• A game-theoretic approach for group detection in still images and video.
• Extended (game) theory to integrate temporal information and data continuity.
• A new model of the frustum of visual attention which achieves better performance.
• A new annotated dataset for group detection including 10685 labeled frames.
• Performance evaluation on all public datasets, outperforming the state of the art.

Abstract:
Detecting groups is becoming of relevant interest as an important step for scene (and especially activity) understanding. Contrary to what is commonly assumed in the computer vision community, different types of groups do exist, and among these, standing conversational groups (a.k.a. F-formations) play an important role. An F-formation is a common type of people aggregation occurring when two or more persons sustain a social interaction, such as a chat at a cocktail party. Detecting and subsequently classifying such interactions in images or videos is of considerable importance in many application contexts, such as surveillance, social signal processing, social robotics, and activity classification, to name a few. This paper presents a principled method to approach this problem, grounded in the socio-psychological concept of an F-formation. More specifically, a game-theoretic framework is proposed, aimed at modeling the spatial structure characterizing F-formations. In other words, since F-formations impose geometrical constraints on how humans have to be mutually located and oriented, the proposed solution is able to account for these constraints while also statistically modeling the uncertainty associated with the position and orientation of the engaged persons. Moreover, taking advantage of video data, it is also able to integrate temporal information over multiple frames using recent notions from multi-payoff evolutionary game theory. The experiments have been performed on several benchmark datasets, consistently showing the superiority of the proposed approach over the state of the art, and its robustness under severe noise conditions.
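The abstract describes the machinery only at a high level: pairwise spatial/orientation compatibilities between people are modeled as payoffs, and cohesive groups emerge as equilibria of an evolutionary game. A minimal sketch of that idea, assuming a generic replicator-dynamics cluster extraction over a toy affinity matrix (the affinity values, scene, and function name are illustrative, not the paper's actual frustum-based payoffs):

```python
import numpy as np

def replicator_dynamics(A, iters=200, tol=1e-8):
    """Evolve a population vector x under discrete replicator dynamics:
    x_i <- x_i * (A x)_i / (x^T A x).
    The support of the fixed point selects a set of mutually compatible
    players -- here, a candidate conversational group."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)          # start from the simplex barycenter
    for _ in range(iters):
        Ax = A @ x
        x_new = x * Ax / (x @ Ax)    # payoff-proportional update
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Toy payoff matrix: persons 0-2 are mutually "facing" each other,
# person 3 stands apart (zero affinity with everyone).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)

x = replicator_dynamics(A)
group = {i for i in range(4) if x[i] > 1e-4}
print(group)  # persons 0, 1, 2 survive; person 3's share decays to zero
```

The dynamics concentrate all probability mass on the mutually compatible trio, which is how a support-based cluster readout works; the paper's actual method additionally handles noisy positions/orientations and multi-frame (multi-payoff) integration, which this sketch omits.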
DOI: 10.1016/j.cviu.2015.09.012
ISSN: 1077-3142
EISSN: 1090-235X
Publisher: Elsevier Inc
Source: ScienceDirect Freedom Collection 2022-2024
Subjects: Agglomeration; Classification; Computer vision; Conversational groups; F-formation detection; Game theory; Group detection; Mathematical models; Scene understanding; State of the art; Video data