
Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming

Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a...

Bibliographic Details
Published in: IEEE transactions on image processing 2021, Vol.30, p.4622-4636
Main Authors: Zhang, Xue, Cheung, Gene, Zhao, Yao, Le Callet, Patrick, Lin, Chunyu, Tan, Jack Z. G.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3
cites cdi_FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3
container_end_page 4636
container_issue
container_start_page 4622
container_title IEEE transactions on image processing
container_volume 30
creator Zhang, Xue
Cheung, Gene
Zhao, Yao
Le Callet, Patrick
Lin, Chunyu
Tan, Jack Z. G.
description Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information (collected viewers' head movement traces, a 360 image saliency map, and a biological human head model) are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least squares (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperformed existing proposals, and our optimized tile-based streaming scheme outperformed competitors in rate-distortion performance.
doi_str_mv 10.1109/TIP.2021.3073283
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_proquest_journals_2522215259</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9416230</ieee_id><sourcerecordid>2518970711</sourcerecordid><originalsourceid>FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3</originalsourceid><addsrcrecordid>eNpdkdGLEzEQh4Mo3ll9FwQJ-KIPW2eSTbJ5PA-9FiqeePoa0s2st0d3U5Ntwf_e1NY--JRM5psfGT7GXiLMEcG-v1vezgUInEswUjTyEbtEW2MFUIvH5Q7KVAZre8Ge5fwAgLVC_ZRdSGkBLNaX7OtN8tt7viKfxn78yT_4TIEvyAf-Oe5poHHit4lC3059HHkXE1-OEyVf6j1xqYH_6ANF_m1K5IcS8Zw96fwm04vTOWPfP328u15Uqy83y-urVdXKBqfKGw3gay2UXOtGaUOqKYUOFoLHEHyHGsHUuvMCu5aE1mC9Ua1SpOt1J2fs3TH33m_cNvWDT79d9L1bXK3c4Q2kVkJZucfCvj2y2xR_7ShPbuhzS5uNHynushMKG2vA4AF98x_6EHdpLJsUSgiBfzNnDI5Um2LOibrzDxDcQY0ratxBjTupKSOvT8G79UDhPPDPRQFeHYGeiM7tolMLCfIPd1GOMA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2522215259</pqid></control><display><type>article</type><title>Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming</title><source>IEEE Xplore (Online service)</source><creator>Zhang, Xue ; Cheung, Gene ; Zhao, Yao ; Le Callet, Patrick ; Lin, Chunyu ; Tan, Jack Z. G.</creator><creatorcontrib>Zhang, Xue ; Cheung, Gene ; Zhao, Yao ; Le Callet, Patrick ; Lin, Chunyu ; Tan, Jack Z. G.</creatorcontrib><description>Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in its entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. 
In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information-collected viewers' head movement traces, a 360 image saliency map, and a biological human head model-are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least square (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperformed existing proposals, and our optimized tile-based streaming scheme outperformed competitors in rate-distortion performance.</description><identifier>ISSN: 1057-7149</identifier><identifier>EISSN: 1941-0042</identifier><identifier>DOI: 10.1109/TIP.2021.3073283</identifier><identifier>PMID: 33900914</identifier><identifier>CODEN: IIPRE4</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>360 video streaming ; Biological models (mathematics) ; Computer Science ; Data models ; directed graph learning ; Directed graphs ; Field of view ; Graph theory ; Head movement ; head movement prediction ; Helmet mounted displays ; High definition ; Human motion ; Image Processing ; Information sources ; Iterative methods ; Learning ; Markov chains ; Optimization ; Predictive models ; Servers ; Streaming media ; Video transmission</subject><ispartof>IEEE transactions on image processing, 2021, Vol.30, p.4622-4636</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3</citedby><cites>FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3</cites><orcidid>0000-0002-2143-7063 ; 0000-0003-2847-0349 ; 0000-0002-6579-7845 ; 0000-0002-8581-9554 ; 0000-0002-5571-4137</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9416230$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,314,780,784,885,4024,27923,27924,27925,54796</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33900914$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://hal.science/hal-03652593$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhang, Xue</creatorcontrib><creatorcontrib>Cheung, Gene</creatorcontrib><creatorcontrib>Zhao, Yao</creatorcontrib><creatorcontrib>Le Callet, Patrick</creatorcontrib><creatorcontrib>Lin, Chunyu</creatorcontrib><creatorcontrib>Tan, Jack Z. G.</creatorcontrib><title>Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming</title><title>IEEE transactions on image processing</title><addtitle>TIP</addtitle><addtitle>IEEE Trans Image Process</addtitle><description>Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in its entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. 
Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information-collected viewers' head movement traces, a 360 image saliency map, and a biological human head model-are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least square (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperformed existing proposals, and our optimized tile-based streaming scheme outperformed competitors in rate-distortion performance.</description><subject>360 video streaming</subject><subject>Biological models (mathematics)</subject><subject>Computer Science</subject><subject>Data models</subject><subject>directed graph learning</subject><subject>Directed graphs</subject><subject>Field of view</subject><subject>Graph theory</subject><subject>Head movement</subject><subject>head movement prediction</subject><subject>Helmet mounted displays</subject><subject>High definition</subject><subject>Human motion</subject><subject>Image Processing</subject><subject>Information sources</subject><subject>Iterative methods</subject><subject>Learning</subject><subject>Markov chains</subject><subject>Optimization</subject><subject>Predictive models</subject><subject>Servers</subject><subject>Streaming media</subject><subject>Video 
transmission</subject><issn>1057-7149</issn><issn>1941-0042</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNpdkdGLEzEQh4Mo3ll9FwQJ-KIPW2eSTbJ5PA-9FiqeePoa0s2st0d3U5Ntwf_e1NY--JRM5psfGT7GXiLMEcG-v1vezgUInEswUjTyEbtEW2MFUIvH5Q7KVAZre8Ge5fwAgLVC_ZRdSGkBLNaX7OtN8tt7viKfxn78yT_4TIEvyAf-Oe5poHHit4lC3059HHkXE1-OEyVf6j1xqYH_6ANF_m1K5IcS8Zw96fwm04vTOWPfP328u15Uqy83y-urVdXKBqfKGw3gay2UXOtGaUOqKYUOFoLHEHyHGsHUuvMCu5aE1mC9Ua1SpOt1J2fs3TH33m_cNvWDT79d9L1bXK3c4Q2kVkJZucfCvj2y2xR_7ShPbuhzS5uNHynushMKG2vA4AF98x_6EHdpLJsUSgiBfzNnDI5Um2LOibrzDxDcQY0ratxBjTupKSOvT8G79UDhPPDPRQFeHYGeiM7tolMLCfIPd1GOMA</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Zhang, Xue</creator><creator>Cheung, Gene</creator><creator>Zhao, Yao</creator><creator>Le Callet, Patrick</creator><creator>Lin, Chunyu</creator><creator>Tan, Jack Z. G.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><general>Institute of Electrical and Electronics Engineers</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0002-2143-7063</orcidid><orcidid>https://orcid.org/0000-0003-2847-0349</orcidid><orcidid>https://orcid.org/0000-0002-6579-7845</orcidid><orcidid>https://orcid.org/0000-0002-8581-9554</orcidid><orcidid>https://orcid.org/0000-0002-5571-4137</orcidid></search><sort><creationdate>2021</creationdate><title>Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming</title><author>Zhang, Xue ; Cheung, Gene ; Zhao, Yao ; Le Callet, Patrick ; Lin, Chunyu ; Tan, Jack Z. 
G.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>360 video streaming</topic><topic>Biological models (mathematics)</topic><topic>Computer Science</topic><topic>Data models</topic><topic>directed graph learning</topic><topic>Directed graphs</topic><topic>Field of view</topic><topic>Graph theory</topic><topic>Head movement</topic><topic>head movement prediction</topic><topic>Helmet mounted displays</topic><topic>High definition</topic><topic>Human motion</topic><topic>Image Processing</topic><topic>Information sources</topic><topic>Iterative methods</topic><topic>Learning</topic><topic>Markov chains</topic><topic>Optimization</topic><topic>Predictive models</topic><topic>Servers</topic><topic>Streaming media</topic><topic>Video transmission</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Xue</creatorcontrib><creatorcontrib>Cheung, Gene</creatorcontrib><creatorcontrib>Zhao, Yao</creatorcontrib><creatorcontrib>Le Callet, Patrick</creatorcontrib><creatorcontrib>Lin, Chunyu</creatorcontrib><creatorcontrib>Tan, Jack Z. 
G.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>Hyper Article en Ligne (HAL)</collection><jtitle>IEEE transactions on image processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Xue</au><au>Cheung, Gene</au><au>Zhao, Yao</au><au>Le Callet, Patrick</au><au>Lin, Chunyu</au><au>Tan, Jack Z. G.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming</atitle><jtitle>IEEE transactions on image processing</jtitle><stitle>TIP</stitle><addtitle>IEEE Trans Image Process</addtitle><date>2021</date><risdate>2021</risdate><volume>30</volume><spage>4622</spage><epage>4636</epage><pages>4622-4636</pages><issn>1057-7149</issn><eissn>1941-0042</eissn><coden>IIPRE4</coden><abstract>Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in its entirety over bandwidth (BW)-constrained networks. 
One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information-collected viewers' head movement traces, a 360 image saliency map, and a biological human head model-are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least square (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperformed existing proposals, and our optimized tile-based streaming scheme outperformed competitors in rate-distortion performance.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>33900914</pmid><doi>10.1109/TIP.2021.3073283</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0002-2143-7063</orcidid><orcidid>https://orcid.org/0000-0003-2847-0349</orcidid><orcidid>https://orcid.org/0000-0002-6579-7845</orcidid><orcidid>https://orcid.org/0000-0002-8581-9554</orcidid><orcidid>https://orcid.org/0000-0002-5571-4137</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1057-7149
ispartof IEEE transactions on image processing, 2021, Vol.30, p.4622-4636
issn 1057-7149
1941-0042
language eng
recordid cdi_proquest_journals_2522215259
source IEEE Xplore (Online service)
subjects 360 video streaming
Biological models (mathematics)
Computer Science
Data models
directed graph learning
Directed graphs
Field of view
Graph theory
Head movement
head movement prediction
Helmet mounted displays
High definition
Human motion
Image Processing
Information sources
Iterative methods
Learning
Markov chains
Optimization
Predictive models
Servers
Streaming media
Video transmission
title Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T16%3A50%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Graph%20Learning%20Based%20Head%20Movement%20Prediction%20for%20Interactive%20360%20Video%20Streaming&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Zhang,%20Xue&rft.date=2021&rft.volume=30&rft.spage=4622&rft.epage=4636&rft.pages=4622-4636&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2021.3073283&rft_dat=%3Cproquest_pubme%3E2518970711%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2522215259&rft_id=info:pmid/33900914&rft_ieee_id=9416230&rfr_iscdi=true
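
Illustrative note on the abstract: the description above says that a view transition Markov model, once estimated, is used to foretell a viewer's future FoVs across the RTT delay. The sketch below shows only that final prediction step in a generic form; it is not the authors' implementation. The 3-view transition matrix `P`, the function names `step` and `predict_views`, and the use of a whole number of Markov steps as the look-ahead horizon are all assumptions made for illustration. The learning of the matrix itself (the MAP formulation solved with IRLS and Frank-Wolfe) is not shown.

```python
from typing import List

Matrix = List[List[float]]


def step(dist: List[float], P: Matrix) -> List[float]:
    """Advance a view distribution by one Markov step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def predict_views(current_view: int, P: Matrix, horizon: int) -> List[float]:
    """Return the probability of each view after `horizon` steps, starting from `current_view`."""
    dist = [0.0] * len(P)
    dist[current_view] = 1.0  # the viewer's current view is known with certainty
    for _ in range(horizon):
        dist = step(dist, P)
    return dist


# Hypothetical sparse transition matrix over 3 views (each row sums to 1).
# Zero entries stand for transitions treated as impossible in this toy example.
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]

# Distribution over views two steps ahead of view 0.
two_steps_ahead = predict_views(0, P, 2)
print(two_steps_ahead)
```

In a tile-based streaming setting, a distribution like `two_steps_ahead` could be used to weight which tiles receive more bits, but how the paper maps probabilities to tile rates is not stated in this record.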