
Spatio-Temporal Unity Networking for Video Anomaly Detection

Bibliographic Details
Published in:IEEE access 2019, Vol.7, p.172425-172432
Main Authors: Li, Yuanyuan, Cai, Yiheng, Liu, Jiaqi, Lang, Shinan, Zhang, Xinfeng
Format: Article
Language:English
description Anomaly detection in video surveillance is challenging due to the variety of anomaly types and definitions, which limits the use of supervised techniques. As such, autoencoder structures, a classical unsupervised method, have recently been applied in this field. These structures consist of an encoder followed by a decoder and are typically used to reconstruct the current input frame or to predict a future frame. However, regardless of whether a 2D or 3D autoencoder structure is adopted, only single-scale information from the previous layer is typically used in the decoding process. This can result in a loss of detail that could otherwise be used to predict or reconstruct video frames. This study therefore proposes a novel spatio-temporal U-Net that predicts frames from normal events and detects abnormalities via prediction error. The framework combines the strength of U-Nets in representing spatial information with the ability of ConvLSTM to model temporal motion. In addition, we propose a new regularity score function, based on the prediction error of not only the current frame but also future frames, to further improve the accuracy of anomaly detection. Extensive experiments on common anomaly datasets, including UCSD (98 video clips in total) and CUHK Avenue (30 video clips in total), validated the performance of the proposed technique, which achieved 96.5% AUC on the Ped2 dataset, substantially better than existing autoencoder-based and U-Net-based methods.
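The scoring idea the abstract describes — flag frames whose prediction error is high relative to the rest of the clip — can be sketched roughly as follows. This is not the authors' code: the PSNR-based error measure and min-max normalization follow common practice in prediction-based anomaly detection, and the function names, threshold, and toy data are our own assumptions.

```python
import numpy as np

def psnr(pred, frame, max_val=1.0):
    """Peak signal-to-noise ratio between a predicted and an actual frame (higher = better prediction)."""
    mse = np.mean((pred - frame) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def regularity_scores(psnrs):
    """Min-max normalize per-frame PSNR into [0, 1]; a low score marks a likely anomaly."""
    p = np.asarray(psnrs, dtype=float)
    return (p - p.min()) / (p.max() - p.min() + 1e-8)

# Toy clip: 5 predicted/actual frame pairs; frame 3 is deliberately poorly predicted.
rng = np.random.default_rng(0)
frames = rng.random((5, 8, 8))
preds = frames + rng.normal(0.0, 0.01, frames.shape)   # small prediction error on normal frames
preds[3] += rng.normal(0.0, 0.5, frames.shape[1:])     # large error: the "anomalous" frame

scores = regularity_scores([psnr(p, f) for p, f in zip(preds, frames)])
anomalous = scores < 0.5   # simple threshold on the regularity score
```

With the toy data above, frame 3's PSNR (roughly 6 dB) sits far below the normal frames (roughly 40 dB), so only its normalized score falls under the threshold.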
doi_str_mv 10.1109/ACCESS.2019.2954540
identifier ISSN: 2169-3536
subjects Anomalies
Anomaly detection
Clips
Coders
ConvLSTM
Convolution
Datasets
Decoding
Error detection
Feature extraction
Frames (data processing)
Optical flow
Optical losses
Spatial data
Training
U-Net
video