A Lightweight Model for Deep Frame Prediction in Video Coding

Recent studies have demonstrated the efficacy of deep neural network (DNN)-based inter frame prediction for video coding. The network commonly used in these studies is built upon a U-Net-like architecture and produces content-adaptive 1-D separable filters with a large number of taps for frame predi...

Full description

Bibliographic Details
Main Authors: Choi, Hyomin; Bajic, Ivan V.
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
cited_by
cites
container_end_page 1126
container_issue
container_start_page 1122
container_title
container_volume
creator Choi, Hyomin
Bajic, Ivan V.
description Recent studies have demonstrated the efficacy of deep neural network (DNN)-based inter frame prediction for video coding. The network commonly used in these studies is built upon a U-Net-like architecture and produces content-adaptive 1-D separable filters with a large number of taps for frame prediction. This leads to a model with a large number of parameters. In this paper, we propose a lighter version of the network with significantly fewer parameters, by making use of dilated convolutional layers and making the U-Net shallower. In addition, we introduce a DCT-based ℓ1-loss term that encourages compression, and explore several ways of integrating our lightweight model into HEVC. Both frame prediction accuracy and coding efficiency are compared against previous works. The experiments show that the proposed model achieves up to 6.4% average bit reduction in terms of BD-Bitrate against HEVC, which is significantly better than existing methods in the literature.
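
The description above names two concrete techniques: dilated convolutions that keep the U-Net shallow while preserving a large receptive field, and a DCT-based ℓ1 loss on the prediction residual that encourages compressible output. The sketch below is a minimal PyTorch illustration of those two ideas, not the authors' implementation: the layer widths, the 8x8 DCT block size, and the direct frame-regression head (the paper instead produces content-adaptive 1-D separable filters) are assumptions made only for illustration.

```python
# Minimal sketch (assumptions, not the authors' code): a shallow dilated-conv
# predictor and a blockwise DCT-domain l1 loss on the prediction residual.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def dct_matrix(n: int) -> torch.Tensor:
    # Orthonormal DCT-II basis: rows indexed by frequency, columns by sample index.
    k = torch.arange(n, dtype=torch.float32)
    basis = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
    basis[0] /= math.sqrt(2.0)
    return basis * math.sqrt(2.0 / n)


def dct_l1_loss(pred: torch.Tensor, target: torch.Tensor, block: int = 8) -> torch.Tensor:
    # l1 norm of the 2-D DCT coefficients of each non-overlapping block of the
    # residual; assumes H and W are multiples of `block`.
    d = dct_matrix(block).to(pred.device)
    residual = pred - target                                     # (N, C, H, W)
    tiles = F.unfold(residual, kernel_size=block, stride=block)  # (N, C*block*block, L)
    n, _, l = tiles.shape
    tiles = tiles.transpose(1, 2).reshape(n, l, -1, block, block)
    coeffs = d @ tiles @ d.t()                                   # per-tile 2-D DCT
    return coeffs.abs().mean()


class DilatedBlock(nn.Module):
    # 3x3 convolution with dilation: wider receptive field at no extra depth.
    def __init__(self, ch: int, dilation: int):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation)

    def forward(self, x):
        return F.relu(self.conv(x))


class LightPredictor(nn.Module):
    # Toy shallow predictor: two RGB reference frames in, one predicted frame out.
    def __init__(self, ch: int = 32):
        super().__init__()
        self.head = nn.Conv2d(6, ch, 3, padding=1)
        self.body = nn.Sequential(DilatedBlock(ch, 1), DilatedBlock(ch, 2),
                                  DilatedBlock(ch, 4))
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, ref_a, ref_b):
        x = torch.cat([ref_a, ref_b], dim=1)
        return self.tail(self.body(F.relu(self.head(x))))
```

In training, a loss of this kind would typically be combined with an ordinary pixel-domain term, e.g. loss = l1(pred, target) + lam * dct_l1_loss(pred, target), with the weighting lam left as a free choice.
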
doi_str_mv 10.1109/IEEECONF51394.2020.9443427
format conference_proceeding
identifier EISBN: 9780738131245; 0738131245; 9780738131269; 0738131261
publishDate 2020-11-01
publisher IEEE
fulltext fulltext_linktorsrc
identifier EISSN: 2576-2303
ispartof 2020 54th Asilomar Conference on Signals, Systems, and Computers, 2020, p.1122-1126
issn 2576-2303
language eng
recordid cdi_ieee_primary_9443427
source IEEE Xplore All Conference Series
subjects Convolution
Convolutional codes
deep frame prediction
deep neural network
Neural networks
Pipelines
Rate-distortion
Technological innovation
Video coding
title A Lightweight Model for Deep Frame Prediction in Video Coding