
Latent semantic learning with time-series cross correlation analysis for video scene detection and classification

This paper presents a novel latent semantic learning method, based on the proposed time-series cross correlation analysis, for extracting a discriminative dynamic scene model to address video event recognition and 3D human body gesture recognition. Typical dynamic texture analysis pos...

Bibliographic Details
Published in: Multimedia tools and applications 2016-10, Vol.75 (20), p.12919-12940
Main Authors: Cheng, Shyi-Chyi; Su, Jui-Yuan; Hsiao, Kuei-Fang; Rashvand, Habib F.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c349t-defe8e7c3e19d0b97715dfd6da2d9f71158f5bfbcb99fc01117f4f16e0f6b1123
cites cdi_FETCH-LOGICAL-c349t-defe8e7c3e19d0b97715dfd6da2d9f71158f5bfbcb99fc01117f4f16e0f6b1123
container_end_page 12940
container_issue 20
container_start_page 12919
container_title Multimedia tools and applications
container_volume 75
creator Cheng, Shyi-Chyi
Su, Jui-Yuan
Hsiao, Kuei-Fang
Rashvand, Habib F.
description This paper presents a novel latent semantic learning method, based on the proposed time-series cross correlation analysis, for extracting a discriminative dynamic scene model to address video event recognition and 3D human body gesture recognition. Typical dynamic texture analysis poses the problems of modeling, learning, recognizing and synthesizing the images of dynamic scenes based on the autoregressive moving average (ARMA) model. Instead of applying the ARMA approach to capture the temporal structure of video sequences, this algorithm uses the learned dynamic scene model to semantically transform video sequences into multiple scenes with lower computational effort. Generating a discriminative dynamic scene model that preserves space-time information is therefore crucial to the success of the proposed latent semantic learning. To achieve this goal, k-medoids clustering with appearance distance metrics is first used to partition all frames of the training video sequences, regardless of their scene types, to provide an initial key-frame codebook. To discover the temporal structure of the dynamic scene model, we develop a time-series cross correlation analysis (TSCCA) for the latent semantic learning, with an alternating dynamic programming (ADP) to embed the time relationship between the training images into the dynamic scene model. We also tackle the problem that dynamic programming tends to produce large temporal misalignment for periodic activities. Moreover, the discriminative power of the model is estimated by a deterministic projection-based learning algorithm. Finally, based on the learned dynamic scene model, this paper uses a support vector machine (SVM) with a two-channel string kernel for video scene classification. Two test datasets, one for video event classification and the other for 3D human body gesture recognition, are used to verify the effectiveness of the proposed approach. Experimental results demonstrate that the proposed algorithm obtains good performance in terms of classification accuracy.
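The codebook step mentioned in the description (k-medoids clustering of frames under an appearance distance) can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not specify the appearance distance or the initialisation, so the Euclidean distance on feature vectors, the choice of the first k frames as starting medoids, and the names `euclidean` and `k_medoids` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Stand-in appearance distance (assumption; the record names no metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_medoids(
    frames: List[Vector],
    k: int,
    dist: Callable[[Vector, Vector], float] = euclidean,
    max_iter: int = 100,
) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids: returns (medoid frame indices, label per frame)."""
    n = len(frames)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")

    # Pairwise distances between all frames, computed once.
    d = [[dist(a, b) for b in frames] for a in frames]

    def assign(medoids: List[int]) -> List[int]:
        # Each frame joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]

    # Hypothetical deterministic initialisation: the first k frames.
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimises total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda m: sum(d[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)
```

Called on six two-dimensional feature vectors forming two well-separated groups, with `k=2`, the function returns one medoid index per group; the medoid frames would then play the role of the initial key-frame codebook entries.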
doi_str_mv 10.1007/s11042-015-2548-y
format article
publisher New York: Springer US
rights Springer Science+Business Media New York 2015; Multimedia Tools and Applications is a copyright of Springer, 2016.
tpages 22
fulltext fulltext
identifier ISSN: 1380-7501
ispartof Multimedia tools and applications, 2016-10, Vol.75 (20), p.12919-12940
issn 1380-7501
1573-7721
language eng
recordid cdi_proquest_miscellaneous_1845797364
source ABI/INFORM global; Springer Link
subjects Accuracy
Algorithms
Classification
Computer Communication Networks
Computer Science
Computer vision
Correlation analysis
Data Structures and Information Theory
Dynamic programming
Dynamic structural analysis
Dynamics
Learning
Machine learning
Multimedia Information Systems
Recognition
Semantics
Spacetime
Special Purpose and Application-Based Systems
Surveillance
Texture
title Latent semantic learning with time-series cross correlation analysis for video scene detection and classification
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T21%3A37%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Latent%20semantic%20learning%20with%20time-series%20cross%20correlation%20analysis%20for%20video%20scene%20detection%20and%20classification&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Cheng,%20Shyi-Chyi&rft.date=2016-10-01&rft.volume=75&rft.issue=20&rft.spage=12919&rft.epage=12940&rft.pages=12919-12940&rft.issn=1380-7501&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-015-2548-y&rft_dat=%3Cproquest_cross%3E4313398921%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c349t-defe8e7c3e19d0b97715dfd6da2d9f71158f5bfbcb99fc01117f4f16e0f6b1123%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1867931749&rft_id=info:pmid/&rfr_iscdi=true