Decision support queries on a tape-resident data warehouse
Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are...
Published in: | Information systems (Oxford) 2005-04, Vol.30 (2), p.133-149 |
---|---|
Main Authors: | Chatziantoniou, Damianos; Johnson, Theodore |
Format: | Article |
Language: | English |
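Subjects: | Data banks; Data warehouses; Data warehousing; Decision support; Decision support systems; Query processing; Tape databases |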
cited_by | cdi_FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33 |
---|---|
cites | cdi_FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33 |
container_end_page | 149 |
container_issue | 2 |
container_start_page | 133 |
container_title | Information systems (Oxford) |
container_volume | 30 |
creator | Chatziantoniou, Damianos; Johnson, Theodore |
description | Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). However, the inherently sequential nature of access to tape-based tertiary storage makes the efficient access to tape-resident data difficult to accomplish through conventional databases.
In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques. |
doi_str_mv | 10.1016/j.is.2003.11.003 |
format | article |
els_id | S0306437903001194 |
publisher | Elsevier Ltd |
rights | 2003 Elsevier Ltd |
tpages | 17 |
fulltext | fulltext |
identifier | ISSN: 0306-4379 |
ispartof | Information systems (Oxford), 2005-04, Vol.30 (2), p.133-149 |
issn | 0306-4379 1873-6076 |
language | eng |
recordid | cdi_proquest_miscellaneous_57606908 |
source | Library & Information Science Abstracts (LISA); ScienceDirect Freedom Collection |
subjects | Data banks; Data warehouses; Data warehousing; Decision support; Decision support systems; Query processing; Tape databases |
title | Decision support queries on a tape-resident data warehouse |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T14%3A08%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Decision%20support%20queries%20on%20a%20tape-resident%20data%20warehouse&rft.jtitle=Information%20systems%20(Oxford)&rft.au=Chatziantoniou,%20Damianos&rft.date=2005-04-01&rft.volume=30&rft.issue=2&rft.spage=133&rft.epage=149&rft.pages=133-149&rft.issn=0306-4379&rft.eissn=1873-6076&rft_id=info:doi/10.1016/j.is.2003.11.003&rft_dat=%3Cproquest_cross%3E57606908%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c323t-b7b38bcd420ef4fafd593f08dd5a6f507837105a3ea5776083d1ed2668c99ba33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=57606908&rft_id=info:pmid/&rfr_iscdi=true |
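
The abstract's central idea, computing several aggregates per group in one sequential pass instead of joining the detail data with itself, can be illustrated with a rough sketch. This is not EMF SQL and not the authors' evaluation algorithm; the function name, the `(customer, month, amount)` record layout, and the chosen aggregates are all hypothetical, picked only to show why a single scan can suffice when each aggregate depends on the current row and the group's running state.

```python
from collections import defaultdict


def single_pass_aggregates(records):
    """Compute several per-customer aggregates in one sequential scan.

    `records` is any iterable of (customer, month, amount) tuples
    (a hypothetical layout); it is consumed exactly once, as a tape
    or a data stream would require.
    """
    # One small state entry per group, updated as each row streams past.
    state = defaultdict(lambda: {"jan_total": 0.0, "feb_total": 0.0, "count": 0})
    for customer, month, amount in records:
        entry = state[customer]
        entry["count"] += 1
        # Each conditional aggregate would otherwise need its own
        # self-join or its own pass over the detail data.
        if month == 1:
            entry["jan_total"] += amount
        elif month == 2:
            entry["feb_total"] += amount
    return dict(state)


if __name__ == "__main__":
    rows = [("a", 1, 10.0), ("a", 2, 5.0), ("b", 1, 3.0), ("a", 1, 2.0)]
    # Passing an iterator shows that the input is read only once.
    print(single_pass_aggregates(iter(rows)))
```

Memory use in this sketch grows with the number of groups rather than the number of detail rows, which loosely mirrors the abstract's concern with minimizing both passes and main memory.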