Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming
Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a...
Published in: | IEEE transactions on image processing 2021, Vol.30, p.4622-4636 |
---|---|
Main Authors: | Zhang, Xue ; Cheung, Gene ; Zhao, Yao ; Le Callet, Patrick ; Lin, Chunyu ; Tan, Jack Z. G. |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
cited_by | cdi_FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3 |
---|---|
cites | cdi_FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3 |
container_end_page | 4636 |
container_issue | |
container_start_page | 4622 |
container_title | IEEE transactions on image processing |
container_volume | 30 |
creator | Zhang, Xue ; Cheung, Gene ; Zhao, Yao ; Le Callet, Patrick ; Lin, Chunyu ; Tan, Jack Z. G. |
description | Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information (collected viewers' head movement traces, a 360 image saliency map, and a biological human head model) are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least squares (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperformed existing proposals, and our optimized tile-based streaming scheme outperformed competitors in rate-distortion performance. |
doi_str_mv | 10.1109/TIP.2021.3073283 |
format | article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_proquest_journals_2522215259</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9416230</ieee_id><sourcerecordid>2518970711</sourcerecordid><originalsourceid>FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3</originalsourceid><addsrcrecordid>eNpdkdGLEzEQh4Mo3ll9FwQJ-KIPW2eSTbJ5PA-9FiqeePoa0s2st0d3U5Ntwf_e1NY--JRM5psfGT7GXiLMEcG-v1vezgUInEswUjTyEbtEW2MFUIvH5Q7KVAZre8Ge5fwAgLVC_ZRdSGkBLNaX7OtN8tt7viKfxn78yT_4TIEvyAf-Oe5poHHit4lC3059HHkXE1-OEyVf6j1xqYH_6ANF_m1K5IcS8Zw96fwm04vTOWPfP328u15Uqy83y-urVdXKBqfKGw3gay2UXOtGaUOqKYUOFoLHEHyHGsHUuvMCu5aE1mC9Ua1SpOt1J2fs3TH33m_cNvWDT79d9L1bXK3c4Q2kVkJZucfCvj2y2xR_7ShPbuhzS5uNHynushMKG2vA4AF98x_6EHdpLJsUSgiBfzNnDI5Um2LOibrzDxDcQY0ratxBjTupKSOvT8G79UDhPPDPRQFeHYGeiM7tolMLCfIPd1GOMA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2522215259</pqid></control><display><type>article</type><title>Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming</title><source>IEEE Xplore (Online service)</source><creator>Zhang, Xue ; Cheung, Gene ; Zhao, Yao ; Le Callet, Patrick ; Lin, Chunyu ; Tan, Jack Z. G.</creator><creatorcontrib>Zhang, Xue ; Cheung, Gene ; Zhao, Yao ; Le Callet, Patrick ; Lin, Chunyu ; Tan, Jack Z. G.</creatorcontrib><description>Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in its entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. 
In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information-collected viewers' head movement traces, a 360 image saliency map, and a biological human head model-are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least square (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperformed existing proposals, and our optimized tile-based streaming scheme outperformed competitors in rate-distortion performance.</description><identifier>ISSN: 1057-7149</identifier><identifier>EISSN: 1941-0042</identifier><identifier>DOI: 10.1109/TIP.2021.3073283</identifier><identifier>PMID: 33900914</identifier><identifier>CODEN: IIPRE4</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>360 video streaming ; Biological models (mathematics) ; Computer Science ; Data models ; directed graph learning ; Directed graphs ; Field of view ; Graph theory ; Head movement ; head movement prediction ; Helmet mounted displays ; High definition ; Human motion ; Image Processing ; Information sources ; Iterative methods ; Learning ; Markov chains ; Optimization ; Predictive models ; Servers ; Streaming media ; Video transmission</subject><ispartof>IEEE transactions on image processing, 2021, Vol.30, p.4622-4636</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3</citedby><cites>FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3</cites><orcidid>0000-0002-2143-7063 ; 0000-0003-2847-0349 ; 0000-0002-6579-7845 ; 0000-0002-8581-9554 ; 0000-0002-5571-4137</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9416230$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,314,780,784,885,4024,27923,27924,27925,54796</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33900914$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://hal.science/hal-03652593$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhang, Xue</creatorcontrib><creatorcontrib>Cheung, Gene</creatorcontrib><creatorcontrib>Zhao, Yao</creatorcontrib><creatorcontrib>Le Callet, Patrick</creatorcontrib><creatorcontrib>Lin, Chunyu</creatorcontrib><creatorcontrib>Tan, Jack Z. G.</creatorcontrib><title>Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming</title><title>IEEE transactions on image processing</title><addtitle>TIP</addtitle><addtitle>IEEE Trans Image Process</addtitle><description>Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in its entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. 
Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information-collected viewers' head movement traces, a 360 image saliency map, and a biological human head model-are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least square (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperformed existing proposals, and our optimized tile-based streaming scheme outperformed competitors in rate-distortion performance.</description><subject>360 video streaming</subject><subject>Biological models (mathematics)</subject><subject>Computer Science</subject><subject>Data models</subject><subject>directed graph learning</subject><subject>Directed graphs</subject><subject>Field of view</subject><subject>Graph theory</subject><subject>Head movement</subject><subject>head movement prediction</subject><subject>Helmet mounted displays</subject><subject>High definition</subject><subject>Human motion</subject><subject>Image Processing</subject><subject>Information sources</subject><subject>Iterative methods</subject><subject>Learning</subject><subject>Markov chains</subject><subject>Optimization</subject><subject>Predictive models</subject><subject>Servers</subject><subject>Streaming media</subject><subject>Video 
transmission</subject><issn>1057-7149</issn><issn>1941-0042</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNpdkdGLEzEQh4Mo3ll9FwQJ-KIPW2eSTbJ5PA-9FiqeePoa0s2st0d3U5Ntwf_e1NY--JRM5psfGT7GXiLMEcG-v1vezgUInEswUjTyEbtEW2MFUIvH5Q7KVAZre8Ge5fwAgLVC_ZRdSGkBLNaX7OtN8tt7viKfxn78yT_4TIEvyAf-Oe5poHHit4lC3059HHkXE1-OEyVf6j1xqYH_6ANF_m1K5IcS8Zw96fwm04vTOWPfP328u15Uqy83y-urVdXKBqfKGw3gay2UXOtGaUOqKYUOFoLHEHyHGsHUuvMCu5aE1mC9Ua1SpOt1J2fs3TH33m_cNvWDT79d9L1bXK3c4Q2kVkJZucfCvj2y2xR_7ShPbuhzS5uNHynushMKG2vA4AF98x_6EHdpLJsUSgiBfzNnDI5Um2LOibrzDxDcQY0ratxBjTupKSOvT8G79UDhPPDPRQFeHYGeiM7tolMLCfIPd1GOMA</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Zhang, Xue</creator><creator>Cheung, Gene</creator><creator>Zhao, Yao</creator><creator>Le Callet, Patrick</creator><creator>Lin, Chunyu</creator><creator>Tan, Jack Z. G.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><general>Institute of Electrical and Electronics Engineers</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0002-2143-7063</orcidid><orcidid>https://orcid.org/0000-0003-2847-0349</orcidid><orcidid>https://orcid.org/0000-0002-6579-7845</orcidid><orcidid>https://orcid.org/0000-0002-8581-9554</orcidid><orcidid>https://orcid.org/0000-0002-5571-4137</orcidid></search><sort><creationdate>2021</creationdate><title>Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming</title><author>Zhang, Xue ; Cheung, Gene ; Zhao, Yao ; Le Callet, Patrick ; Lin, Chunyu ; Tan, Jack Z. 
G.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>360 video streaming</topic><topic>Biological models (mathematics)</topic><topic>Computer Science</topic><topic>Data models</topic><topic>directed graph learning</topic><topic>Directed graphs</topic><topic>Field of view</topic><topic>Graph theory</topic><topic>Head movement</topic><topic>head movement prediction</topic><topic>Helmet mounted displays</topic><topic>High definition</topic><topic>Human motion</topic><topic>Image Processing</topic><topic>Information sources</topic><topic>Iterative methods</topic><topic>Learning</topic><topic>Markov chains</topic><topic>Optimization</topic><topic>Predictive models</topic><topic>Servers</topic><topic>Streaming media</topic><topic>Video transmission</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Xue</creatorcontrib><creatorcontrib>Cheung, Gene</creatorcontrib><creatorcontrib>Zhao, Yao</creatorcontrib><creatorcontrib>Le Callet, Patrick</creatorcontrib><creatorcontrib>Lin, Chunyu</creatorcontrib><creatorcontrib>Tan, Jack Z. 
G.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>Hyper Article en Ligne (HAL)</collection><jtitle>IEEE transactions on image processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Xue</au><au>Cheung, Gene</au><au>Zhao, Yao</au><au>Le Callet, Patrick</au><au>Lin, Chunyu</au><au>Tan, Jack Z. G.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming</atitle><jtitle>IEEE transactions on image processing</jtitle><stitle>TIP</stitle><addtitle>IEEE Trans Image Process</addtitle><date>2021</date><risdate>2021</risdate><volume>30</volume><spage>4622</spage><epage>4636</epage><pages>4622-4636</pages><issn>1057-7149</issn><eissn>1941-0042</eissn><coden>IIPRE4</coden><abstract>Ultra-high definition (UHD) 360 videos encoded in fine quality are typically too large to stream in its entirety over bandwidth (BW)-constrained networks. 
One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information-collected viewers' head movement traces, a 360 image saliency map, and a biological human head model-are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least square (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperformed existing proposals, and our optimized tile-based streaming scheme outperformed competitors in rate-distortion performance.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>33900914</pmid><doi>10.1109/TIP.2021.3073283</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0002-2143-7063</orcidid><orcidid>https://orcid.org/0000-0003-2847-0349</orcidid><orcidid>https://orcid.org/0000-0002-6579-7845</orcidid><orcidid>https://orcid.org/0000-0002-8581-9554</orcidid><orcidid>https://orcid.org/0000-0002-5571-4137</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1057-7149 |
ispartof | IEEE transactions on image processing, 2021, Vol.30, p.4622-4636 |
issn | 1057-7149 1941-0042 |
language | eng |
recordid | cdi_proquest_journals_2522215259 |
source | IEEE Xplore (Online service) |
subjects | 360 video streaming ; Biological models (mathematics) ; Computer Science ; Data models ; directed graph learning ; Directed graphs ; Field of view ; Graph theory ; Head movement ; head movement prediction ; Helmet mounted displays ; High definition ; Human motion ; Image Processing ; Information sources ; Iterative methods ; Learning ; Markov chains ; Optimization ; Predictive models ; Servers ; Streaming media ; Video transmission |
title | Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T16%3A50%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Graph%20Learning%20Based%20Head%20Movement%20Prediction%20for%20Interactive%20360%20Video%20Streaming&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Zhang,%20Xue&rft.date=2021&rft.volume=30&rft.spage=4622&rft.epage=4636&rft.pages=4622-4636&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2021.3073283&rft_dat=%3Cproquest_pubme%3E2518970711%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c381t-a7600a46253b68567e586256d90da1ddaf1610746fa21fce26609a75c55e64bf3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2522215259&rft_id=info:pmid/33900914&rft_ieee_id=9416230&rfr_iscdi=true |
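
The description field above states that a view transition Markov model is estimated and then used to foretell a viewer's future FoVs across the RTT delay. The following is a minimal sketch of that prediction step only, not the authors' implementation. The 3-view transition matrix `P`, the function name `predict_view_distribution`, and the choice of two look-ahead steps are hypothetical values chosen for illustration. The graph learning stage (MAP formulation, IRLS, Frank-Wolfe) is not reproduced here.

```python
def predict_view_distribution(P, start, steps):
    """Propagate a one-hot start view through `steps` Markov transitions.

    P is a row-stochastic matrix (list of lists): P[i][j] is the assumed
    probability of moving from view i to view j in one time step.
    Returns the probability of each view after `steps` transitions.
    """
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0  # the viewer's current view is known exactly
    for _ in range(steps):
        # One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical transition matrix over three neighboring views (illustrative
# numbers only; a learned model would come from traces, saliency, and the
# head model as the abstract describes).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

# Look two steps ahead from view 0, e.g. to cover an assumed RTT of two steps.
dist = predict_view_distribution(P, start=0, steps=2)
most_likely_view = max(range(len(dist)), key=lambda j: dist[j])
```

A streaming server could use a distribution like `dist` to decide which tiles to send at higher quality, for example by favoring the views with the largest predicted probability.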