A parallel memory architecture for video coding
To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit d...
Published in: | Journal of Zhejiang University. A. Science 2008-12, Vol.9 (12), p.1644-1655 |
---|---|
Main Authors: | Peng, Jian-ying ; Yan, Xiao-lang ; Li, De-xian ; Chen, Li-zhong |
Format: | Article |
Language: | English |
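Subjects: | Civil Engineering ; Classical and Continuum Physics ; Engineering ; Industrial Chemistry/Chemical Engineering ; Mechanical Engineering |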
cited_by | cdi_FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3 |
---|---|
cites | cdi_FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3 |
container_end_page | 1655 |
container_issue | 12 |
container_start_page | 1644 |
container_title | Journal of Zhejiang University. A. Science |
container_volume | 9 |
creator | Peng, Jian-ying ; Yan, Xiao-lang ; Li, De-xian ; Chen, Li-zhong |
description | To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or with power-of-two intervals in both horizontal and vertical directions, which were not possible in previous parallel memory architectures. Area consumptions and delay estimations are given respectively with 4, 8 and 16 memory modules. Under a 0.18-μm CMOS technology, the synthesis results show that the proposed system can achieve 230 MHz clock frequency with 16 memory modules at the cost of 19k gates when read and write latencies are 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that VSP enhanced with the proposed architecture achieves 1.28× speedups for H.264 real-time decoding. |
doi_str_mv | 10.1631/jzus.A0820052 |
format | article |
fullrecord | <record><control><sourceid>wanfang_jour_proqu</sourceid><recordid>TN_cdi_wanfang_journals_zjdxxb_e200812004</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><wanfj_id>zjdxxb_e200812004</wanfj_id><sourcerecordid>zjdxxb_e200812004</sourcerecordid><originalsourceid>FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3</originalsourceid><addsrcrecordid>eNpt0DtPwzAQB3ALgUQpjOyZkBjS-hE_OlYVL6kSC0hsluOcS6I8ip1A20-Pq4BYWHwefr47_xG6JnhGBCPz6jCE2RIrijGnJ2hClKApkZKfxruQLOWCv52jixCqKCQWcoLmy2RrvKlrqJMGms7vE-Pte9mD7QcPiet88lkW0CW2K8p2c4nOnKkDXP3UKXq9v3tZPabr54en1XKdWpaRPrW8sBmlOVADzFipCmBywWmuMHfULpxzGcuxNdlC5sQIo5QrCAaSMyU4WDZFt2PfL9M602501Q2-jRP1oSp2u1xD_KUi8ciivRnt1ncfA4ReN2WwUNemhW4ImnEuKBU0wnSE1ncheHB668vG-L0mWB8j1McI9W-E0c9GH6JrN-D_tvj_wTetGnQe</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>35562262</pqid></control><display><type>article</type><title>A parallel memory architecture for video coding</title><source>Springer Link</source><creator>Peng, Jian-ying ; Yan, Xiao-lang ; Li, De-xian ; Chen, Li-zhong</creator><creatorcontrib>Peng, Jian-ying ; Yan, Xiao-lang ; Li, De-xian ; Chen, Li-zhong</creatorcontrib><description>To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or with power-of-two intervals in both horizontal and vertical directions, which were not possible in previous parallel memory architectures. Area consumptions and delay estimations are given respectively with 4, 8 and 16 memory modules. 
Under a 0.18-μm CMOS technology, the synthesis results show that the proposed system can achieve 230 MHz clock frequency with 16 memory modules at the cost of 19k gates when read and write latencies are 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that VSP enhanced with the proposed architecture achieves 1.28× speedups for H.264 real-time decoding.</description><identifier>ISSN: 1673-565X</identifier><identifier>EISSN: 1862-1775</identifier><identifier>DOI: 10.1631/jzus.A0820052</identifier><language>eng</language><publisher>Hangzhou: Zhejiang University Press</publisher><subject>Civil Engineering ; Classical and Continuum Physics ; Engineering ; Industrial Chemistry/Chemical Engineering ; Mechanical Engineering</subject><ispartof>Journal of Zhejiang University. A. Science, 2008-12, Vol.9 (12), p.1644-1655</ispartof><rights>Zhejiang University and Springer-Verlag GmbH 2008</rights><rights>Copyright © Wanfang Data Co. Ltd. All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3</citedby><cites>FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.wanfangdata.com.cn/images/PeriodicalImages/zjdxxb-e/zjdxxb-e.jpg</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Peng, Jian-ying</creatorcontrib><creatorcontrib>Yan, Xiao-lang</creatorcontrib><creatorcontrib>Li, De-xian</creatorcontrib><creatorcontrib>Chen, Li-zhong</creatorcontrib><title>A parallel memory architecture for video coding</title><title>Journal of Zhejiang University. A. Science</title><addtitle>J. Zhejiang Univ. Sci. 
A</addtitle><description>To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or with power-of-two intervals in both horizontal and vertical directions, which were not possible in previous parallel memory architectures. Area consumptions and delay estimations are given respectively with 4, 8 and 16 memory modules. Under a 0.18-μm CMOS technology, the synthesis results show that the proposed system can achieve 230 MHz clock frequency with 16 memory modules at the cost of 19k gates when read and write latencies are 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that VSP enhanced with the proposed architecture achieves 1.28× speedups for H.264 real-time decoding.</description><subject>Civil Engineering</subject><subject>Classical and Continuum Physics</subject><subject>Engineering</subject><subject>Industrial Chemistry/Chemical Engineering</subject><subject>Mechanical Engineering</subject><issn>1673-565X</issn><issn>1862-1775</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><recordid>eNpt0DtPwzAQB3ALgUQpjOyZkBjS-hE_OlYVL6kSC0hsluOcS6I8ip1A20-Pq4BYWHwefr47_xG6JnhGBCPz6jCE2RIrijGnJ2hClKApkZKfxruQLOWCv52jixCqKCQWcoLmy2RrvKlrqJMGms7vE-Pte9mD7QcPiet88lkW0CW2K8p2c4nOnKkDXP3UKXq9v3tZPabr54en1XKdWpaRPrW8sBmlOVADzFipCmBywWmuMHfULpxzGcuxNdlC5sQIo5QrCAaSMyU4WDZFt2PfL9M602501Q2-jRP1oSp2u1xD_KUi8ciivRnt1ncfA4ReN2WwUNemhW4ImnEuKBU0wnSE1ncheHB668vG-L0mWB8j1McI9W-E0c9GH6JrN-D_tvj_wTetGnQe</recordid><startdate>20081201</startdate><enddate>20081201</enddate><creator>Peng, Jian-ying</creator><creator>Yan, Xiao-lang</creator><creator>Li, 
De-xian</creator><creator>Chen, Li-zhong</creator><general>Zhejiang University Press</general><general>Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>FR3</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope></search><sort><creationdate>20081201</creationdate><title>A parallel memory architecture for video coding</title><author>Peng, Jian-ying ; Yan, Xiao-lang ; Li, De-xian ; Chen, Li-zhong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Civil Engineering</topic><topic>Classical and Continuum Physics</topic><topic>Engineering</topic><topic>Industrial Chemistry/Chemical Engineering</topic><topic>Mechanical Engineering</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Peng, Jian-ying</creatorcontrib><creatorcontrib>Yan, Xiao-lang</creatorcontrib><creatorcontrib>Li, De-xian</creatorcontrib><creatorcontrib>Chen, Li-zhong</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Engineering Research 
Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><jtitle>Journal of Zhejiang University. A. Science</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Peng, Jian-ying</au><au>Yan, Xiao-lang</au><au>Li, De-xian</au><au>Chen, Li-zhong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A parallel memory architecture for video coding</atitle><jtitle>Journal of Zhejiang University. A. Science</jtitle><stitle>J. Zhejiang Univ. Sci. A</stitle><date>2008-12-01</date><risdate>2008</risdate><volume>9</volume><issue>12</issue><spage>1644</spage><epage>1655</epage><pages>1644-1655</pages><issn>1673-565X</issn><eissn>1862-1775</eissn><abstract>To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with power-of-two memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or with power-of-two intervals in both horizontal and vertical directions, which were not possible in previous parallel memory architectures. Area consumptions and delay estimations are given respectively with 4, 8 and 16 memory modules. 
Under a 0.18-μm CMOS technology, the synthesis results show that the proposed system can achieve 230 MHz clock frequency with 16 memory modules at the cost of 19k gates when read and write latencies are 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that VSP enhanced with the proposed architecture achieves 1.28× speedups for H.264 real-time decoding.</abstract><cop>Hangzhou</cop><pub>Zhejiang University Press</pub><doi>10.1631/jzus.A0820052</doi><tpages>12</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1673-565X |
ispartof | Journal of Zhejiang University. A. Science, 2008-12, Vol.9 (12), p.1644-1655 |
issn | 1673-565X ; 1862-1775 |
language | eng |
recordid | cdi_wanfang_journals_zjdxxb_e200812004 |
source | Springer Link |
subjects | Civil Engineering ; Classical and Continuum Physics ; Engineering ; Industrial Chemistry/Chemical Engineering ; Mechanical Engineering |
title | A parallel memory architecture for video coding |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T20%3A56%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20parallel%20memory%20architecture%20for%20video%20coding&rft.jtitle=Journal%20of%20Zhejiang%20University.%20A.%20Science&rft.au=Peng,%20Jian-ying&rft.date=2008-12-01&rft.volume=9&rft.issue=12&rft.spage=1644&rft.epage=1655&rft.pages=1644-1655&rft.issn=1673-565X&rft.eissn=1862-1775&rft_id=info:doi/10.1631/jzus.A0820052&rft_dat=%3Cwanfang_jour_proqu%3Ezjdxxb_e200812004%3C/wanfang_jour_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c341t-c5dc422be2ae3ac78de37952b805f2c9fff43b0ca497b1a6a88fd10e1b3865ec3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=35562262&rft_id=info:pmid/&rft_wanfj_id=zjdxxb_e200812004&rfr_iscdi=true |