An efficient VLSI implementation of H.264/AVC intra-frame transcoder

Bibliographic Details
Main Authors: Guarisco, M., Dabellani, E., Marques, N., Rabah, H., Berviller, Y., Weber, S.
Format: Conference Proceeding
Language: English
Subjects:
Description
Summary: The number of different display terminals has increased steadily, from HD TV to mobile phone TV, and transcoding has become an indispensable operation in video processing. In most cases, transcoding has to be done in real time, but H.264/AVC intra-frame decoding and encoding contain a set of computation-intensive coding tools forming a loop in which the data are strongly dependent. Parallelizing each function is therefore not straightforward. In this paper, we present an optimized transcoding chain for AVC intra-frame streams. The transcoding chain is characterized by several operators based on loop iterations and working on 4×4 luma or 2×2 chroma blocks, which generates heavy latency. Our approach uses loop unrolling and data parallelization. A tradeoff is made between critical path and number of cycles in order to improve overall latency. The architecture described in this paper includes a powerful CAVLC coder and decoder, an optimized transform-quantization, and a frequency selection function for, respectively, requantization and quick decimation of the high-frequency values in a quantized coefficient block. Together, these form an efficient transcoding system. Thanks to a high degree of parallelization, our design can decode and then re-encode a 1080p video stream at 30 frames per second (fps) in real time at a clock frequency of 47 MHz. The design has been implemented on a Virtex-5 FPGA. Each block is fully described, with its occupied area and timing diagram.
DOI: 10.1109/ICECS.2011.6122199
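
The throughput figure quoted in the summary (1080p at 30 fps with a 47 MHz clock) can be turned into a rough per-macroblock cycle budget. The sketch below is only a back-of-the-envelope check, not a figure taken from the paper: it assumes the standard 16×16 H.264 macroblock, a frame height padded up to a whole number of macroblocks, and macroblocks processed one at a time. The function name `cycles_per_macroblock` is hypothetical.

```python
import math


def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock for real-time operation.

    Assumption: frame dimensions are rounded up to whole macroblocks,
    so a 1080-line frame is treated as 68 macroblock rows (1088 lines).
    """
    mbs_wide = math.ceil(width / mb_size)
    mbs_high = math.ceil(height / mb_size)
    mbs_per_second = mbs_wide * mbs_high * fps
    return clock_hz / mbs_per_second


# Figures from the summary: 1080p, 30 fps, 47 MHz clock.
budget = cycles_per_macroblock(1920, 1080, 30, 47_000_000)
print(f"{budget:.0f} cycles per macroblock")  # prints "192 cycles per macroblock"
```

Under these assumptions, the whole decode, requantize, and re-encode chain has on the order of 192 cycles per macroblock, which gives a sense of why the summary emphasizes loop unrolling and data parallelization.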