Multiple Transform Selection Concept Modeling and Implementation Using Dynamic and Parameterized Dataflow Graphs
Published in: | Journal of Signal Processing Systems, 2022-07, Vol. 94 (7), p. 709-720 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Summary: | The new video coding standard, Versatile Video Coding (VVC), released by the end of 2020, has increased coding complexity at both the encoder and the decoder. This increase is due to several coding tools introduced to improve coding efficiency. One of these tools is the Multiple Transform Selection (MTS) concept, a new approach for the transform unit. This paper proposes a new optimization of the MTS based on dataflow modeling. The proposed approach exploits the different levels of parallelism in the MTS to create an optimized multicore implementation. The paper also studies the impact of dataflow model granularity and of dynamic reconfiguration on implementation efficiency on x86 multicore architectures. The PREESM tool is used to develop the proposed dataflow models and to carry out the granularity analysis. The dynamic reconfiguration study is performed using the SPIDER runtime, which is optimized for the multicore execution of applications modeled as Parameterized and Interfaced Synchronous Dataflow (PiSDF) graphs. Two architectures were used in this work: an x86 architecture with 4 cores and an x86 architecture with 24 cores. The results show that the SPIDER overhead is almost negligible (0.05%) compared to the execution time of the application. Furthermore, speed-ups of 3.9 and up to 22 were achieved for all block sizes on the 4-core and 24-core machines, respectively. |
---|---|
ISSN: | 1939-8018 1939-8115 |
DOI: | 10.1007/s11265-021-01725-4 |
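
The speed-ups quoted in the summary can be put in perspective by dividing each by its core count, which gives the usual parallel efficiency figure. The sketch below is only an illustration of that arithmetic; the function name `parallel_efficiency` is hypothetical and the only inputs are the two figures stated in the summary (3.9 on 4 cores, up to 22 on 24 cores). It is not taken from the paper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speed-up divided by core count (1.0 means ideal linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted in the summary: core count -> reported speed-up.
# The 24-core value is the stated upper bound ("up to 22").
reported = {4: 3.9, 24: 22.0}

for cores, speedup in reported.items():
    eff = parallel_efficiency(speedup, cores)
    print(f"{cores} cores: speed-up {speedup}, efficiency {eff:.1%}")
```

Under these figures the efficiency is about 97.5% on the 4-core machine and at most about 91.7% on the 24-core machine, so scaling stays close to linear in both cases.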