Detecting conversational groups in images and sequences: A robust game-theoretic approach

Bibliographic Details
Published in: Computer vision and image understanding 2016-02, Vol.143, p.11-24
Main Authors: Vascon, Sebastiano, Mequanint, Eyasu Z., Cristani, Marco, Hung, Hayley, Pelillo, Marcello, Murino, Vittorio
Format: Article
Language: English
Description:
•A game-theoretic approach for group detection in still images and video.
•Extended (game) theory to integrate temporal information and data continuity.
•A new model of the frustum of visual attention which achieves better performance.
•A new annotated dataset for group detection including 10,685 labeled frames.
•Performance evaluation on all public datasets, outperforming the state of the art.

Detecting groups is of growing interest as an important step toward scene (and especially activity) understanding. Contrary to what is commonly assumed in the computer vision community, different types of groups do exist, and among these, standing conversational groups (a.k.a. F-formations) play an important role. An F-formation is a common type of people aggregation occurring when two or more persons sustain a social interaction, such as a chat at a cocktail party. Detecting and subsequently classifying such interactions in images or videos is of considerable importance in many applicative contexts, such as surveillance, social signal processing, social robotics, and activity classification, to name a few. This paper presents a principled method for this problem, grounded in the socio-psychological concept of an F-formation. More specifically, a game-theoretic framework is proposed, aimed at modeling the spatial structure characterizing F-formations. Since F-formations impose geometrical constraints on how humans must be mutually located and oriented, the proposed solution accounts for these constraints while also statistically modeling the uncertainty associated with the position and orientation of the engaged persons. Moreover, taking advantage of video data, it also integrates temporal information over multiple frames using recent notions from multi-payoff evolutionary game theory.
The experiments were performed on several benchmark datasets, consistently showing the superiority of the proposed approach over the state of the art, and its robustness under severe noise conditions.
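The game-theoretic clustering idea summarized in the abstract can be sketched briefly: pairwise affinities between people (in the paper, derived from overlapping frusta of visual attention) define a clustering game, and an equilibrium of that game, found by replicator dynamics, identifies the support of one group. The sketch below is a minimal illustration under that assumption, not the authors' implementation; the affinity matrix is an invented toy example standing in for real frustum overlaps.

```python
import numpy as np

def replicator_dynamics(A, iters=1000, tol=1e-8):
    """Evolve a clustering game with symmetric non-negative payoff matrix A.

    Starting from the uniform mixed strategy, the discrete replicator
    update x_i <- x_i * (A x)_i / (x' A x) concentrates mass on a
    mutually high-affinity subset; the support of the fixed point is
    read off as one detected group.
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)          # uniform initial strategy
    for _ in range(iters):
        payoff = A @ x               # expected payoff of each player
        x_new = x * payoff / (x @ payoff)
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Toy affinities: persons 0-1 face each other (strong overlap),
# persons 2-3 form a second pair, weak cross-pair links.
A = np.array([[0.0, 0.9, 0.1, 0.1],
              [0.9, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 0.8],
              [0.1, 0.1, 0.8, 0.0]])
x = replicator_dynamics(A)
group = np.where(x > 1.0 / len(x) / 10)[0]  # support of the equilibrium → array([0, 1])
```

Members outside the extracted group keep (near-)zero mass at the equilibrium, so groups can be peeled off one at a time by removing the detected support and re-running the dynamics on the remaining players.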
DOI: 10.1016/j.cviu.2015.09.012
ISSN: 1077-3142
EISSN: 1090-235X
Source: ScienceDirect Freedom Collection 2022-2024
Subjects: Agglomeration
Classification
Computer vision
Conversational groups
F-formation detection
Game theory
Group detection
Mathematical models
Scene understanding
State of the art
Video data