An Efficient Multi-Scale Feature Compression with QP-Adaptive Feature Channel Truncation for Video Coding for Machines

Bibliographic Details
Published in: IEEE Access, 2023-01, Vol. 11, p. 1-1
Main Authors: Yoon, Yong-Uk; Kim, Dongha; Lee, Jooyoung; Oh, Byung Tae; Kim, Jae-Gon
Format: Article
Language: English
Description
Summary: Machine vision-based intelligent applications that analyze video data collected by machines are rapidly increasing. Therefore, it is essential to efficiently compress the large volume of video data intended for machine consumption. Accordingly, the Moving Picture Experts Group (MPEG) has been developing a new video coding standard called Video Coding for Machines (VCM), aimed at video consumed by machines rather than humans. Recently, studies have demonstrated that multi-scale feature compression (MSFC)-based feature compression methods significantly improve the performance of MPEG-VCM. This paper proposes an efficient MSFC (eMSFC) method with quantization parameter (QP)-adaptive feature channel truncation. The proposed eMSFC incorporates an MSFC network with a selective learning strategy (SLS) and Versatile Video Coding (VVC)-based compression. The SLS extracts a single-scale feature from the input image, arranged in order of channel-wise importance. The size of the single-scale feature is adaptively adjusted by truncating the feature channels according to the QP, and the truncated feature is efficiently compressed using VVC. The experimental results reveal that, compared to the VCM feature anchor, the proposed method provides Bjontegaard delta rate gains of 98.72%, 98.34%, and 98.04% for the machine vision tasks of instance segmentation, object detection, and object tracking, respectively. The proposed method performed best among the "Call for Evidence" response technologies in MPEG-VCM.
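
As a rough illustration of the QP-adaptive feature channel truncation described in the summary, the sketch below maps a VVC QP to a number of retained channels and drops the least important channels of a feature tensor whose channels are already sorted by importance (as the SLS is described as producing). The mapping function, QP range, and channel counts are illustrative assumptions, not the schedule used in the paper.

```python
import numpy as np

# Hypothetical QP-to-channel-count mapping: higher QP (coarser quantization)
# keeps fewer channels, lower QP keeps more. The linear schedule and the QP
# bounds are assumptions for illustration only.
def channels_to_keep(qp: int, total_channels: int,
                     qp_min: int = 22, qp_max: int = 47) -> int:
    qp = max(qp_min, min(qp, qp_max))
    ratio = (qp_max - qp) / (qp_max - qp_min)  # 1.0 at qp_min, 0.0 at qp_max
    return max(1, int(round(ratio * (total_channels - 1))) + 1)

def truncate_feature(feature: np.ndarray, qp: int) -> np.ndarray:
    """Truncate a single-scale feature tensor of shape (C, H, W) whose
    channels are sorted by descending importance."""
    keep = channels_to_keep(qp, feature.shape[0])
    return feature[:keep]  # drop the least important channels

# Example: a 256-channel feature map truncated at two different QPs.
feat = np.random.randn(256, 64, 64).astype(np.float32)
print(truncate_feature(feat, qp=27).shape)  # more channels kept at low QP
print(truncate_feature(feat, qp=42).shape)  # fewer channels kept at high QP
```

The retained channels would then be packed into a frame and encoded with VVC at the same QP, so the channel count and the quantization strength scale together.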
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3307404