Decision support queries on a tape-resident data warehouse

Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are...

Bibliographic Details
Published in: Information systems (Oxford), 2005-04, Vol.30 (2), p.133-149
Main Authors: Chatziantoniou, Damianos; Johnson, Theodore
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33
cites cdi_FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33
container_end_page 149
container_issue 2
container_start_page 133
container_title Information systems (Oxford)
container_volume 30
creator Chatziantoniou, Damianos
Johnson, Theodore
description Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). However, the inherently sequential nature of access to tape-based tertiary storage makes the efficient access to tape-resident data difficult to accomplish through conventional databases. In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques.
doi_str_mv 10.1016/j.is.2003.11.003
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_57606908</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0306437903001194</els_id><sourcerecordid>57606908</sourcerecordid><originalsourceid>FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33</originalsourceid><addsrcrecordid>eNp1kD1PwzAQhi0EEqWwM2ZiSzjHjR13Q-VTqsQCs-XYZ-GqTYIvBfHvcVVWpkc63XN672XsmkPFgcvbTRWpqgFExXmVccJmvFWilKDkKZuBAFkuhNLn7IJoAwB1o_WMLe_RRYpDX9B-HIc0FZ97TBGpyCNbTHbEMiFFj_1UeDvZ4tsm_Bj2hJfsLNgt4dUf5-z98eFt9VyuX59eVnfr0olaTGWnOtF2zi9qwLAINvhGiwCt942VoQHVCsWhsQJto5SEVniOvpaydVp3Vog5uzneHdOQw9FkdpEcbre2x5zDNFmSOntzBsdFlwaihMGMKe5s-jEczKEkszGRzKEkw7nJyMryqGB-4CtiMuQi9g59TOgm44f4v_wLOahudQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>57606908</pqid></control><display><type>article</type><title>Decision support queries on a tape-resident data warehouse</title><source>Library &amp; Information Science Abstracts (LISA)</source><source>ScienceDirect Freedom Collection</source><creator>Chatziantoniou, Damianos ; Johnson, Theodore</creator><creatorcontrib>Chatziantoniou, Damianos ; Johnson, Theodore</creatorcontrib><description>Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). However, the inherently sequential nature of access to tape-based tertiary storage makes the efficient access to tape-resident data difficult to accomplish through conventional databases. 
In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. 
These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques.</description><identifier>ISSN: 0306-4379</identifier><identifier>EISSN: 1873-6076</identifier><identifier>DOI: 10.1016/j.is.2003.11.003</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Data banks ; Data warehouses ; Data warehousing ; Decision support ; Decision support systems ; Query processing ; Tape databases</subject><ispartof>Information systems (Oxford), 2005-04, Vol.30 (2), p.133-149</ispartof><rights>2003 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33</citedby><cites>FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,778,782,27907,27908,34119</link.rule.ids></links><search><creatorcontrib>Chatziantoniou, Damianos</creatorcontrib><creatorcontrib>Johnson, Theodore</creatorcontrib><title>Decision support queries on a tape-resident data warehouse</title><title>Information systems (Oxford)</title><description>Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). 
However, the inherently sequential nature of access to tape-based tertiary storage makes the efficient access to tape-resident data difficult to accomplish through conventional databases. In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. 
These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques.</description><subject>Data banks</subject><subject>Data warehouses</subject><subject>Data warehousing</subject><subject>Decision support</subject><subject>Decision support systems</subject><subject>Query processing</subject><subject>Tape databases</subject><issn>0306-4379</issn><issn>1873-6076</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><sourceid>F2A</sourceid><recordid>eNp1kD1PwzAQhi0EEqWwM2ZiSzjHjR13Q-VTqsQCs-XYZ-GqTYIvBfHvcVVWpkc63XN672XsmkPFgcvbTRWpqgFExXmVccJmvFWilKDkKZuBAFkuhNLn7IJoAwB1o_WMLe_RRYpDX9B-HIc0FZ97TBGpyCNbTHbEMiFFj_1UeDvZ4tsm_Bj2hJfsLNgt4dUf5-z98eFt9VyuX59eVnfr0olaTGWnOtF2zi9qwLAINvhGiwCt942VoQHVCsWhsQJto5SEVniOvpaydVp3Vog5uzneHdOQw9FkdpEcbre2x5zDNFmSOntzBsdFlwaihMGMKe5s-jEczKEkszGRzKEkw7nJyMryqGB-4CtiMuQi9g59TOgm44f4v_wLOahudQ</recordid><startdate>20050401</startdate><enddate>20050401</enddate><creator>Chatziantoniou, Damianos</creator><creator>Johnson, Theodore</creator><general>Elsevier Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope></search><sort><creationdate>20050401</creationdate><title>Decision support queries on a tape-resident data warehouse</title><author>Chatziantoniou, Damianos ; Johnson, Theodore</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Data banks</topic><topic>Data warehouses</topic><topic>Data warehousing</topic><topic>Decision support</topic><topic>Decision support systems</topic><topic>Query processing</topic><topic>Tape 
databases</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chatziantoniou, Damianos</creatorcontrib><creatorcontrib>Johnson, Theodore</creatorcontrib><collection>CrossRef</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><jtitle>Information systems (Oxford)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chatziantoniou, Damianos</au><au>Johnson, Theodore</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Decision support queries on a tape-resident data warehouse</atitle><jtitle>Information systems (Oxford)</jtitle><date>2005-04-01</date><risdate>2005</risdate><volume>30</volume><issue>2</issue><spage>133</spage><epage>149</epage><pages>133-149</pages><issn>0306-4379</issn><eissn>1873-6076</eissn><abstract>Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). However, the inherently sequential nature of access to tape-based tertiary storage makes the efficient access to tape-resident data difficult to accomplish through conventional databases. In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. 
These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.is.2003.11.003</doi><tpages>17</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0306-4379
ispartof Information systems (Oxford), 2005-04, Vol.30 (2), p.133-149
issn 0306-4379
1873-6076
language eng
recordid cdi_proquest_miscellaneous_57606908
source Library & Information Science Abstracts (LISA); ScienceDirect Freedom Collection
subjects Data banks
Data warehouses
Data warehousing
Decision support
Decision support systems
Query processing
Tape databases
title Decision support queries on a tape-resident data warehouse
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T14%3A08%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Decision%20support%20queries%20on%20a%20tape-resident%20data%20warehouse&rft.jtitle=Information%20systems%20(Oxford)&rft.au=Chatziantoniou,%20Damianos&rft.date=2005-04-01&rft.volume=30&rft.issue=2&rft.spage=133&rft.epage=149&rft.pages=133-149&rft.issn=0306-4379&rft.eissn=1873-6076&rft_id=info:doi/10.1016/j.is.2003.11.003&rft_dat=%3Cproquest_cross%3E57606908%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=57606908&rft_id=info:pmid/&rfr_iscdi=true