Compression of Multiscale Features of FPN with Channel-Wise Reduction for VCM

Bibliographic Details
Published in:	Electronics (Basel) 2023-07, Vol.12 (13), p.2767
Main Authors:	Kim, Dong-Ha; Yoon, Yong-Uk; Han, Gyu-Woong; Oh, Byung Tae; Kim, Jae-Gon
Format:	Article
Language:	English
Subjects:	Coding; Coding standards; Coding theory; Computer networks; Data compression; Feature maps; Image coding; Image compression; Internet of Things; Machine learning; Machine vision; Methods; Modules; Network latency; Quality standards; Reduction; Video compression; Video data; Vision systems

Description
Summary:	With the development of deep learning technology and the abundance of sensors, machine vision applications that utilize vast amounts of image/video data are rapidly increasing in the autonomous vehicle, video surveillance and smart city fields. However, achieving more compact image/video representations and lower-latency solutions remains challenging for such machine-based applications. Therefore, it is essential to develop a more efficient video coding standard for machine vision applications. Currently, the Moving Picture Experts Group (MPEG) is developing a new standard called video coding for machines (VCM) with two tracks: Track 1 mainly deals with compression of the features extracted from the input image/video, and Track 2 with compression of the input image/video itself. In this paper, an enhanced multiscale feature compression (E-MSFC) method is proposed to efficiently compress the multiscale features generated by a feature pyramid network (FPN), the backbone of the machine vision networks specified in the VCM evaluation framework. The proposed E-MSFC reduces the number of feature channels packed into a single feature map and compresses that map using versatile video coding (VVC), the latest video coding standard, instead of the single stream feature compression (SSFC) module used in the existing MSFC. In addition, the performance of the E-MSFC is further enhanced by adding a bottom-up structure to the multiscale feature fusion (MSFF) module, which performs the channel-wise reduction in the E-MSFC. Experimental results show that the proposed E-MSFC significantly outperforms the VCM image anchor with a BD-rate gain of up to 85.94%, which includes an additional gain of 0.96% achieved by the MSFF with the bottom-up structure.
ISSN:	2079-9292
DOI:	10.3390/electronics12132767
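
The summary reports its results as BD-rate gains. As a rough illustration of what that metric measures, the sketch below computes the average bitrate difference between two rate-quality curves at equal quality. It is not the authors' evaluation code: the function name `bd_rate` and the sample points are hypothetical, and it uses piecewise-linear interpolation of log-bitrate rather than the cubic fit commonly used for Bjøntegaard deltas. The quality axis is assumed here to be a task metric such as mAP.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x over sorted, distinct sample points xs."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x outside the sampled range")


def bd_rate(anchor, test):
    """Average bitrate change (%) of `test` relative to `anchor` at equal quality.

    Each argument is a list of (bitrate, quality) points with distinct
    quality values. A negative result means `test` needs fewer bits.
    Simplification: piecewise-linear curves instead of a cubic fit.
    """
    def prep(points):
        pts = sorted(points, key=lambda p: p[1])  # sort by quality
        return [q for _, q in pts], [math.log(r) for r, _ in pts]

    qa, la = prep(anchor)
    qt, lt = prep(test)

    # Compare only over the quality range covered by both curves.
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")

    # Between consecutive knots the log-rate difference is linear,
    # so the trapezoidal rule integrates it exactly.
    knots = sorted({lo, hi, *(q for q in qa + qt if lo < q < hi)})
    diff = [_interp(q, qt, lt) - _interp(q, qa, la) for q in knots]
    area = sum((b - a) * (da + db) / 2
               for a, b, da, db in zip(knots, knots[1:], diff, diff[1:]))

    return (math.exp(area / (hi - lo)) - 1) * 100


# Illustrative numbers only (not taken from the paper).
anchor_points = [(100, 30), (200, 35), (400, 40), (800, 45)]
test_points = [(50, 30), (100, 35), (200, 40), (400, 45)]
print(round(bd_rate(anchor_points, test_points), 2))
```

In this made-up example the test curve needs half the bitrate of the anchor at every quality level, so the result is a 50% bitrate reduction. Under this sign convention a saving is negative, whereas the summary states the saving as a positive gain.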