Algorithm optimization and hardware implementation for Merge mode in HEVC

Bibliographic Details
Published in: Journal of Real-Time Image Processing, 2020-06, Vol. 17 (3), p. 623-630
Main Authors: Shi, Long-zhao; Gao, Xiaohong; Yang, Xiuzhi; Chen, Zhifeng; Zheng, Mingkui
Format: Article
Language: English
Description
Summary: Merge mode is a new tool for improving inter-frame coding efficiency in High Efficiency Video Coding (HEVC). It saves the bitrate spent on a motion vector by sharing that vector with neighboring blocks. Merge selects a candidate motion vector by calculating the rate-distortion cost of each candidate. However, this process requires a large number of complex computations and memory accesses, which makes hardware implementation inefficient. This paper proposes a new Merge candidate decision scheme that determines the most favorable Merge candidate from the full candidate list by comparing the sum of absolute transformed differences (SATD) with the weighted header bits, instead of performing the complex sum of squared differences (SSD) calculation and entropy coding process used in HM16.7. Simulation results show that the performance of the proposed algorithm is close to that of HM16.7, increasing the BD-rate by only 0.22–1.21%. A multilevel pipeline architecture is also exploited in the hardware design. The weighted header bit operation is performed with a look-up table, which reduces both the complexity and the number of encoding clock cycles. The design is implemented in register-transfer level (RTL) code. Synthesis results from Design Compiler show that, compared with other architectures, the proposed architecture offers great advantages in resource utilization and can process 1920 × 1080 video at 353 frames/s for P-slices with a clock frequency of 1057 MHz and a logic gate count of 285.2 K.
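
The candidate decision described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the cost of a candidate is its SATD plus a weighted header-bit term read from a look-up table, uses a single unnormalized 4 × 4 Hadamard transform, and fills the table with made-up values; the names satd4x4, WEIGHTED_BITS_LUT and best_merge_candidate are hypothetical. The only fact taken from HEVC itself is that a Merge list holds at most five candidates.

```python
# 4x4 Hadamard matrix (entries +1/-1). It is symmetric, so H^T == H.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(orig, pred):
    """Sum of absolute transformed differences of one 4x4 block.

    The residual is transformed as H * D * H and the absolute values are
    summed. No normalization is applied (an assumption of this sketch).
    """
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    transformed = matmul4(matmul4(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


# Hypothetical look-up table: weighted header bits for Merge index 0..4.
# The values are illustrative only and are NOT taken from the paper.
WEIGHTED_BITS_LUT = [4, 8, 12, 16, 16]


def best_merge_candidate(orig, predictions):
    """Return (index, cost) of the cheapest candidate.

    predictions holds one predicted 4x4 block per Merge candidate, in
    list order. Cost = SATD + table entry (assumed cost form).
    """
    best_idx, best_cost = -1, None
    for idx, pred in enumerate(predictions):
        cost = satd4x4(orig, pred) + WEIGHTED_BITS_LUT[idx]
        if best_cost is None or cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx, best_cost
```

Replacing the entropy-coding step with a table read is what makes the bit term a constant-time operation per candidate in this sketch; a real encoder would evaluate larger blocks and derive the table from the quantization parameter.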
ISSN: 1861-8200, 1861-8219
DOI: 10.1007/s11554-018-0818-4