
A parallel memory architecture for video coding

Bibliographic Details
Published in: Journal of Zhejiang University. A. Science 2008-12, Vol.9 (12), p.1644-1655
Main Authors: Peng, Jian-ying, Yan, Xiao-lang, Li, De-xian, Chen, Li-zhong
Format: Article
Language: English
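Subjects: Civil Engineering; Classical and Continuum Physics; Engineering; Industrial Chemistry/Chemical Engineering; Mechanical Engineering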
cited_by cdi_FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3
cites cdi_FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3
container_end_page 1655
container_issue 12
container_start_page 1644
container_title Journal of Zhejiang University. A. Science
container_volume 9
creator Peng, Jian-ying
Yan, Xiao-lang
Li, De-xian
Chen, Li-zhong
description To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes that provide conflict-free access to elements (8-bit and 16-bit data types) that are either adjacent or spaced at power-of-two intervals, in both the horizontal and vertical directions, which was not possible with previous parallel memory architectures. Area and delay estimates are given for configurations with 4, 8, and 16 memory modules. Under a 0.18-μm CMOS technology, synthesis results show that the proposed system can reach a 230 MHz clock frequency with 16 memory modules at a cost of 19k gates, with read and write latencies of 3 and 2 clock cycles, respectively. We implemented the proposed parallel memory architecture on a video signal processor (VSP); the VSP enhanced with the proposed architecture achieves a 1.28× speedup for real-time H.264 decoding.
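The idea of conflict-free access through a skewing scheme can be illustrated with a small sketch. The Python example below assumes a classic textbook linear skew, module = (row + col) mod N; it is not the authors' scheme, whose details are not given in this record, and the function names are hypothetical. It shows that N adjacent elements along a row or a column always fall into N distinct modules, while a stride-2 access collides. Strided access of this kind is what the abstract says the proposed skewing schemes support.

```python
# Hypothetical illustration: a textbook linear skewing scheme, NOT the
# paper's scheme. Element (row, col) is stored in module (row + col) mod N.

def module_of(row: int, col: int, n_modules: int) -> int:
    """Map a 2-D element coordinate to a memory module (linear skew)."""
    return (row + col) % n_modules


def is_conflict_free(coords, n_modules: int) -> bool:
    """True if every coordinate in one parallel access maps to a distinct module."""
    modules = [module_of(r, c, n_modules) for r, c in coords]
    return len(set(modules)) == len(modules)


def row_access(row: int, col: int, n: int, stride: int = 1):
    """n elements along a row, starting at (row, col), spaced by `stride`."""
    return [(row, col + k * stride) for k in range(n)]


def col_access(row: int, col: int, n: int, stride: int = 1):
    """n elements down a column, starting at (row, col), spaced by `stride`."""
    return [(row + k * stride, col) for k in range(n)]


if __name__ == "__main__":
    N = 16  # power-of-two module count, as in the 16-module configuration
    # Unit-stride rows and columns are conflict-free from any start position.
    print(is_conflict_free(row_access(3, 5, N), N))     # True
    print(is_conflict_free(col_access(7, 2, N), N))     # True
    # A stride of 2 touches only N/2 modules, so this simple scheme conflicts.
    print(is_conflict_free(row_access(0, 0, N, 2), N))  # False
```

The linear skew is chosen only because its behaviour is easy to verify by hand: consecutive indices taken mod N cover every residue exactly once, but even strides cover only half of them when N is a power of two.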
doi_str_mv 10.1631/jzus.A0820052
format article
fullrecord <record><control><sourceid>wanfang_jour_proqu</sourceid><recordid>TN_cdi_wanfang_journals_zjdxxb_e200812004</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><wanfj_id>zjdxxb_e200812004</wanfj_id><sourcerecordid>zjdxxb_e200812004</sourcerecordid><originalsourceid>FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3</originalsourceid><addsrcrecordid>eNpt0DtPwzAQB3ALgUQpjOyZkBjS-hE_OlYVL6kSC0hsluOcS6I8ip1A20-Pq4BYWHwefr47_xG6JnhGBCPz6jCE2RIrijGnJ2hClKApkZKfxruQLOWCv52jixCqKCQWcoLmy2RrvKlrqJMGms7vE-Pte9mD7QcPiet88lkW0CW2K8p2c4nOnKkDXP3UKXq9v3tZPabr54en1XKdWpaRPrW8sBmlOVADzFipCmBywWmuMHfULpxzGcuxNdlC5sQIo5QrCAaSMyU4WDZFt2PfL9M602501Q2-jRP1oSp2u1xD_KUi8ciivRnt1ncfA4ReN2WwUNemhW4ImnEuKBU0wnSE1ncheHB668vG-L0mWB8j1McI9W-E0c9GH6JrN-D_tvj_wTetGnQe</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>35562262</pqid></control><display><type>article</type><title>A parallel memory architecture for video coding</title><source>Springer Link</source><creator>Peng, Jian-ying ; Yan, Xiao-lang ; Li, De-xian ; Chen, Li-zhong</creator><creatorcontrib>Peng, Jian-ying ; Yan, Xiao-lang ; Li, De-xian ; Chen, Li-zhong</creatorcontrib><description>To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or with power-of-two intervals in both horizontal and vertical directions, which were not possible in previous parallel memory architectures. Area consumptions and delay estimations are given respectively with 4, 8 and 16 memory modules. 
Under a 0.18-μm CMOS technology, the synthesis results show that the proposed system can achieve 230 MHz clock frequency with 16 memory modules at the cost of 19k gates when read and write latencies are 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that VSP enhanced with the proposed architecture achieves 1.28× speedups for H.264 real-time decoding.</description><identifier>ISSN: 1673-565X</identifier><identifier>EISSN: 1862-1775</identifier><identifier>DOI: 10.1631/jzus.A0820052</identifier><language>eng</language><publisher>Hangzhou: Zhejiang University Press</publisher><subject>Civil Engineering ; Classical and Continuum Physics ; Engineering ; Industrial Chemistry/Chemical Engineering ; Mechanical Engineering</subject><ispartof>Journal of Zhejiang University. A. Science, 2008-12, Vol.9 (12), p.1644-1655</ispartof><rights>Zhejiang University and Springer-Verlag GmbH 2008</rights><rights>Copyright © Wanfang Data Co. Ltd. All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3</citedby><cites>FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.wanfangdata.com.cn/images/PeriodicalImages/zjdxxb-e/zjdxxb-e.jpg</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Peng, Jian-ying</creatorcontrib><creatorcontrib>Yan, Xiao-lang</creatorcontrib><creatorcontrib>Li, De-xian</creatorcontrib><creatorcontrib>Chen, Li-zhong</creatorcontrib><title>A parallel memory architecture for video coding</title><title>Journal of Zhejiang University. A. Science</title><addtitle>J. Zhejiang Univ. Sci. 
A</addtitle><description>To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or with power-of-two intervals in both horizontal and vertical directions, which were not possible in previous parallel memory architectures. Area consumptions and delay estimations are given respectively with 4, 8 and 16 memory modules. Under a 0.18-μm CMOS technology, the synthesis results show that the proposed system can achieve 230 MHz clock frequency with 16 memory modules at the cost of 19k gates when read and write latencies are 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that VSP enhanced with the proposed architecture achieves 1.28× speedups for H.264 real-time decoding.</description><subject>Civil Engineering</subject><subject>Classical and Continuum Physics</subject><subject>Engineering</subject><subject>Industrial Chemistry/Chemical Engineering</subject><subject>Mechanical Engineering</subject><issn>1673-565X</issn><issn>1862-1775</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><recordid>eNpt0DtPwzAQB3ALgUQpjOyZkBjS-hE_OlYVL6kSC0hsluOcS6I8ip1A20-Pq4BYWHwefr47_xG6JnhGBCPz6jCE2RIrijGnJ2hClKApkZKfxruQLOWCv52jixCqKCQWcoLmy2RrvKlrqJMGms7vE-Pte9mD7QcPiet88lkW0CW2K8p2c4nOnKkDXP3UKXq9v3tZPabr54en1XKdWpaRPrW8sBmlOVADzFipCmBywWmuMHfULpxzGcuxNdlC5sQIo5QrCAaSMyU4WDZFt2PfL9M602501Q2-jRP1oSp2u1xD_KUi8ciivRnt1ncfA4ReN2WwUNemhW4ImnEuKBU0wnSE1ncheHB668vG-L0mWB8j1McI9W-E0c9GH6JrN-D_tvj_wTetGnQe</recordid><startdate>20081201</startdate><enddate>20081201</enddate><creator>Peng, Jian-ying</creator><creator>Yan, Xiao-lang</creator><creator>Li, 
De-xian</creator><creator>Chen, Li-zhong</creator><general>Zhejiang University Press</general><general>Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>FR3</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope></search><sort><creationdate>20081201</creationdate><title>A parallel memory architecture for video coding</title><author>Peng, Jian-ying ; Yan, Xiao-lang ; Li, De-xian ; Chen, Li-zhong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Civil Engineering</topic><topic>Classical and Continuum Physics</topic><topic>Engineering</topic><topic>Industrial Chemistry/Chemical Engineering</topic><topic>Mechanical Engineering</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Peng, Jian-ying</creatorcontrib><creatorcontrib>Yan, Xiao-lang</creatorcontrib><creatorcontrib>Li, De-xian</creatorcontrib><creatorcontrib>Chen, Li-zhong</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Engineering Research 
Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><jtitle>Journal of Zhejiang University. A. Science</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Peng, Jian-ying</au><au>Yan, Xiao-lang</au><au>Li, De-xian</au><au>Chen, Li-zhong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A parallel memory architecture for video coding</atitle><jtitle>Journal of Zhejiang University. A. Science</jtitle><stitle>J. Zhejiang Univ. Sci. A</stitle><date>2008-12-01</date><risdate>2008</risdate><volume>9</volume><issue>12</issue><spage>1644</spage><epage>1655</epage><pages>1644-1655</pages><issn>1673-565X</issn><eissn>1862-1775</eissn><abstract>To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or with power-of-two intervals in both horizontal and vertical directions, which were not possible in previous parallel memory architectures. Area consumptions and delay estimations are given respectively with 4, 8 and 16 memory modules. 
Under a 0.18-μm CMOS technology, the synthesis results show that the proposed system can achieve 230 MHz clock frequency with 16 memory modules at the cost of 19k gates when read and write latencies are 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that VSP enhanced with the proposed architecture achieves 1.28× speedups for H.264 real-time decoding.</abstract><cop>Hangzhou</cop><pub>Zhejiang University Press</pub><doi>10.1631/jzus.A0820052</doi><tpages>12</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1673-565X
ispartof Journal of Zhejiang University. A. Science, 2008-12, Vol.9 (12), p.1644-1655
issn 1673-565X
1862-1775
language eng
recordid cdi_wanfang_journals_zjdxxb_e200812004
source Springer Link
subjects Civil Engineering
Classical and Continuum Physics
Engineering
Industrial Chemistry/Chemical Engineering
Mechanical Engineering
title A parallel memory architecture for video coding
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T20%3A56%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20parallel%20memory%20architecture%20for%20video%20coding&rft.jtitle=Journal%20of%20Zhejiang%20University.%20A.%20Science&rft.au=Peng,%20Jian-ying&rft.date=2008-12-01&rft.volume=9&rft.issue=12&rft.spage=1644&rft.epage=1655&rft.pages=1644-1655&rft.issn=1673-565X&rft.eissn=1862-1775&rft_id=info:doi/10.1631/jzus.A0820052&rft_dat=%3Cwanfang_jour_proqu%3Ezjdxxb_e200812004%3C/wanfang_jour_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=35562262&rft_id=info:pmid/&rft_wanfj_id=zjdxxb_e200812004&rfr_iscdi=true