ISABELA for effective in situ compression of scientific data

Bibliographic Details
Published in: Concurrency and computation 2013-02, Vol.25 (4), p.524-540
Main Authors: Lakshminarasimhan, Sriram, Shah, Neil, Ethier, Stephane, Ku, Seung-Hoe, Chang, C. S., Klasky, Scott, Latham, Rob, Ross, Rob, Samatova, Nagiza F.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c3367-6bd39b33007aedd4da047cc993a5dafa9de7ee0019fb82fc08a3de4401f3df473
cites cdi_FETCH-LOGICAL-c3367-6bd39b33007aedd4da047cc993a5dafa9de7ee0019fb82fc08a3de4401f3df473
container_end_page 540
container_issue 4
container_start_page 524
container_title Concurrency and computation
container_volume 25
creator Lakshminarasimhan, Sriram
Shah, Neil
Ethier, Stephane
Ku, Seung-Hoe
Chang, C. S.
Klasky, Scott
Latham, Rob
Ross, Rob
Samatova, Nagiza F.
description SUMMARY: Exploding dataset sizes from extreme‐scale scientific simulations necessitate efficient data management and reduction schemes to mitigate I/O costs. With the discrepancy between I/O bandwidth and computational power, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Although data compression can be an effective solution, the random nature of real‐valued scientific datasets renders lossless compression routines ineffective. These techniques also impose significant overhead during decompression, making them unsuitable for data analysis and visualization, which require repeated data access. To address this problem, we propose an effective method for In situ Sort‐And‐B‐spline Error‐bounded Lossy Abatement (ISABELA) of scientific data that is widely regarded as effectively incompressible. With ISABELA, we apply a pre‐conditioner to seemingly random and noisy data along spatial resolution to achieve an accurate fitting model that guarantees a ⩾0.99 correlation with the original data. We further take advantage of temporal patterns in scientific data to compress data by ≈ 85%, while introducing only a negligible overhead on simulations in terms of runtime. ISABELA significantly outperforms existing lossy compression methods, such as wavelet compression, in terms of data reduction and accuracy. We extend our previous paper by additionally building a communication‐free, scalable parallel storage framework on top of ISABELA‐compressed data that is ideally suited for extreme‐scale analytical processing. The basis for our storage framework is an inherently local decompression method (it need not decode the entire data), which allows for random access decompression and low‐overhead task division that can be exploited over heterogeneous architectures. Furthermore, analytical operations such as correlation and query processing run quickly and accurately over data in the compressed space.
Copyright © 2012 John Wiley & Sons, Ltd.
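The sort-then-fit idea in the description can be illustrated with a rough sketch. This is not the authors' implementation: it assumes a single window of uniform noise, and it swaps the B-spline fit for plain piecewise-linear interpolation between a few evenly spaced samples of the sorted curve. The names compress_window, decompress_window, and pearson, and the window and knot sizes, are hypothetical choices made for the illustration.

```python
import math
import random


def compress_window(values, n_knots):
    """Sort one window; keep the permutation and a few samples of the sorted curve.

    Assumes 2 <= n_knots <= len(values).
    """
    # order[rank] is the original index of the value with that rank.
    order = sorted(range(len(values)), key=values.__getitem__)
    sorted_vals = [values[i] for i in order]
    last = len(values) - 1
    # Evenly spaced ranks that always include both ends of the sorted curve.
    knot_ranks = [round(k * last / (n_knots - 1)) for k in range(n_knots)]
    knots = [(rank, sorted_vals[rank]) for rank in knot_ranks]
    return order, knots


def decompress_window(order, knots):
    """Rebuild the sorted curve by linear interpolation, then undo the sort."""
    approx_sorted = [0.0] * len(order)
    for (r0, v0), (r1, v1) in zip(knots, knots[1:]):
        for rank in range(r0, r1 + 1):
            t = (rank - r0) / (r1 - r0)
            approx_sorted[rank] = v0 + t * (v1 - v0)
    restored = [0.0] * len(order)
    for rank, original_index in enumerate(order):
        restored[original_index] = approx_sorted[rank]
    return restored


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


random.seed(0)
data = [random.random() for _ in range(1024)]  # noisy, hard-to-compress window
order, knots = compress_window(data, n_knots=32)
restored = decompress_window(order, knots)
r = pearson(data, restored)
print(f"correlation between original and restored window: {r:.4f}")
```

Sorting turns the noisy window into a smooth monotone curve, which is why a handful of samples can describe it well. The cost in this sketch is the permutation in order, which has to be stored alongside the knots so the sort can be undone.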
doi_str_mv 10.1002/cpe.2887
format article
publisher Blackwell Publishing Ltd
fulltext fulltext
identifier ISSN: 1532-0626
ispartof Concurrency and computation, 2013-02, Vol.25 (4), p.524-540
issn 1532-0626
1532-0634
language eng
recordid cdi_proquest_miscellaneous_1671593045
source Wiley-Blackwell Read & Publish Collection
subjects B-spline
Computation
Computer simulation
Data compression
Data reduction
data-intensive application
high performance computing
in situ processing
Longitudinal waves
lossy compression
Mathematical analysis
Reproduction
Run time (computers)
title ISABELA for effective in situ compression of scientific data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T13%3A51%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ISABELA%20for%20effective%20in%20situ%20compression%20of%20scientific%20data&rft.jtitle=Concurrency%20and%20computation&rft.au=Lakshminarasimhan,%20Sriram&rft.date=2013-02-01&rft.volume=25&rft.issue=4&rft.spage=524&rft.epage=540&rft.pages=524-540&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.2887&rft_dat=%3Cproquest_cross%3E1671593045%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c3367-6bd39b33007aedd4da047cc993a5dafa9de7ee0019fb82fc08a3de4401f3df473%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1671593045&rft_id=info:pmid/&rfr_iscdi=true