
DataFoundry: information management for scientific data

Data warehouses and data marts have been successfully applied to a multitude of commercial business applications. They have proven to be invaluable tools by integrating information from distributed, heterogeneous sources and summarizing this data for use throughout the enterprise. Although the need...

Bibliographic Details
Published in: IEEE journal of biomedical and health informatics 2000-03, Vol.4 (1), p.52-57
Main Authors: Critchlow, T., Fidelis, K., Ganesh, M., Musick, R., Slezak, T.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c487t-c92d0fabd53a209e5097679585d0700c03528d41e8f5f12a41dd435e5c9ebf1f3
cites cdi_FETCH-LOGICAL-c487t-c92d0fabd53a209e5097679585d0700c03528d41e8f5f12a41dd435e5c9ebf1f3
container_end_page 57
container_issue 1
container_start_page 52
container_title IEEE journal of biomedical and health informatics
container_volume 4
creator Critchlow, T.
Fidelis, K.
Ganesh, M.
Musick, R.
Slezak, T.
description Data warehouses and data marts have been successfully applied to a multitude of commercial business applications. They have proven to be invaluable tools by integrating information from distributed, heterogeneous sources and summarizing this data for use throughout the enterprise. Although the need for information dissemination is as vital in science as in business, working warehouses in this community are scarce because traditional warehousing techniques do not transfer to scientific environments. There are two primary reasons for this difficulty. First, schema integration is more difficult for scientific databases than for business sources because of the complexity of the concepts and the associated relationships. Second, scientific data sources have highly dynamic data representations (schemata). When a data source participating in a warehouse changes its schema, both the mediator transferring data to the warehouse and the warehouse itself need to be updated to reflect these modifications. The cost of repeatedly performing these updates in a traditional warehouse, as is required in a dynamic environment, is prohibitive. The paper discusses these issues within the context of the DataFoundry project, an ongoing research effort at Lawrence Livermore National Laboratory. DataFoundry utilizes a unique integration strategy to identify corresponding instances while maintaining differences between data from different sources, and a novel architecture and an extensive meta-data infrastructure, which reduce the cost of maintaining a warehouse.
doi_str_mv 10.1109/4233.826859
format article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_826859</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>826859</ieee_id><sourcerecordid>28255433</sourcerecordid><originalsourceid>FETCH-LOGICAL-c487t-c92d0fabd53a209e5097679585d0700c03528d41e8f5f12a41dd435e5c9ebf1f3</originalsourceid><addsrcrecordid>eNqF0UtLxDAQAOAgiuvr5M2DFA8qSHUmyTSpN_ENghc9l2yTSJdtuzbtwX9vli4iHtZThuSbCTPD2CHCJSLkV5ILcal5pinfYDtIpFMAwTdjDDpPlVI4YbshzABQEoptNkFQGSold5i6M715aIfGdl_XSdX4tqtNX7VNUpvGfLjaNX0SL5NQVjGsfFUmNqbssy1v5sEdrM499v5w_3b7lL68Pj7f3rykpdSqT8ucW_BmakkYDrkjyFWmctJkQQGUIIhrK9FpTx65kWitFOSozN3Uoxd77Gysu-jaz8GFvqirULr53DSuHUKhtY5dasqiPF0rFYIALf6HXHMiKcT_UEkthKQIz9dCzBRyiRqWn5_8obN26Jo4wtgKgYzj4BFdjKjs2hA654tFV9Wm-yoQiuXOi-XOi3HnUR-vSg7T2tlfdlxyBEcjqJxzP8-r7G-doKtl</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>885045282</pqid></control><display><type>article</type><title>DataFoundry: information management for scientific data</title><source>IEEE Xplore (Online service)</source><creator>Critchlow, T. ; Fidelis, K. ; Ganesh, M. ; Musick, R. ; Slezak, T.</creator><creatorcontrib>Critchlow, T. ; Fidelis, K. ; Ganesh, M. ; Musick, R. ; Slezak, T.</creatorcontrib><description>Data warehouses and data marts have been successfully applied to a multitude of commercial business applications. They have proven to be invaluable tools by integrating information from distributed, heterogeneous sources and summarizing this data for use throughout the enterprise. Although the need for information dissemination is as vital in science as in business, working warehouses in this community are scarce because traditional warehousing techniques do not transfer to scientific environments. There are two primary reasons for this difficulty. 
First, schema integration is more difficult for scientific databases than for business sources because of the complexity of the concepts and the associated relationships. Second, scientific data sources have highly dynamic data representations (schemata). When a data source participating in a warehouse changes its schema, both the mediator transferring data to the warehouse and the warehouse itself need to be updated to reflect these modifications. The cost of repeatedly performing these updates in a traditional warehouse, as is required in a dynamic environment, is prohibitive. The paper discusses these issues within the context of the DataFoundry project, an ongoing research effort at Lawrence Livermore National Laboratory. DataFoundry utilizes a unique integration strategy to identify corresponding instances while maintaining differences between data from different sources, and a novel architecture and an extensive meta-data infrastructure, which reduce the cost of maintaining a warehouse.</description><identifier>ISSN: 1089-7771</identifier><identifier>ISSN: 2168-2194</identifier><identifier>EISSN: 1558-0032</identifier><identifier>EISSN: 2168-2208</identifier><identifier>DOI: 10.1109/4233.826859</identifier><identifier>PMID: 10761774</identifier><identifier>CODEN: ITIBFX</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Associate members ; Bioinformatics ; Business ; Communities ; Computer Systems ; Costs ; Costs and Cost Analysis ; Data analysis ; Data sources ; Database Management Systems - classification ; Database Management Systems - economics ; Database Management Systems - organization &amp; administration ; Databases as Topic - classification ; Databases as Topic - economics ; Databases as Topic - organization &amp; administration ; Dynamics ; Humans ; Information management ; Information Management - classification ; Information Management - economics ; Information Management - organization &amp; administration ; 
Information Services - organization &amp; administration ; Information systems ; Information Systems - classification ; Information Systems - economics ; Information Systems - organization &amp; administration ; Proteins ; Representations ; Science ; Sequences ; Strategy ; Systems Integration ; Warehouses ; Warehousing</subject><ispartof>IEEE journal of biomedical and health informatics, 2000-03, Vol.4 (1), p.52-57</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2000</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c487t-c92d0fabd53a209e5097679585d0700c03528d41e8f5f12a41dd435e5c9ebf1f3</citedby><cites>FETCH-LOGICAL-c487t-c92d0fabd53a209e5097679585d0700c03528d41e8f5f12a41dd435e5c9ebf1f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/826859$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,777,781,27905,27906,54777</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/10761774$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Critchlow, T.</creatorcontrib><creatorcontrib>Fidelis, K.</creatorcontrib><creatorcontrib>Ganesh, M.</creatorcontrib><creatorcontrib>Musick, R.</creatorcontrib><creatorcontrib>Slezak, T.</creatorcontrib><title>DataFoundry: information management for scientific data</title><title>IEEE journal of biomedical and health informatics</title><addtitle>TITB</addtitle><addtitle>IEEE Trans Inf Technol Biomed</addtitle><description>Data warehouses and data marts have been successfully applied to a multitude of commercial business applications. They have proven to be invaluable tools by integrating information from distributed, heterogeneous sources and summarizing this data for use throughout the enterprise. 
Although the need for information dissemination is as vital in science as in business, working warehouses in this community are scarce because traditional warehousing techniques do not transfer to scientific environments. There are two primary reasons for this difficulty. First, schema integration is more difficult for scientific databases than for business sources because of the complexity of the concepts and the associated relationships. Second, scientific data sources have highly dynamic data representations (schemata). When a data source participating in a warehouse changes its schema, both the mediator transferring data to the warehouse and the warehouse itself need to be updated to reflect these modifications. The cost of repeatedly performing these updates in a traditional warehouse, as is required in a dynamic environment, is prohibitive. The paper discusses these issues within the context of the DataFoundry project, an ongoing research effort at Lawrence Livermore National Laboratory. 
DataFoundry utilizes a unique integration strategy to identify corresponding instances while maintaining differences between data from different sources, and a novel architecture and an extensive meta-data infrastructure, which reduce the cost of maintaining a warehouse.</description><subject>Associate members</subject><subject>Bioinformatics</subject><subject>Business</subject><subject>Communities</subject><subject>Computer Systems</subject><subject>Costs</subject><subject>Costs and Cost Analysis</subject><subject>Data analysis</subject><subject>Data sources</subject><subject>Database Management Systems - classification</subject><subject>Database Management Systems - economics</subject><subject>Database Management Systems - organization &amp; administration</subject><subject>Databases as Topic - classification</subject><subject>Databases as Topic - economics</subject><subject>Databases as Topic - organization &amp; administration</subject><subject>Dynamics</subject><subject>Humans</subject><subject>Information management</subject><subject>Information Management - classification</subject><subject>Information Management - economics</subject><subject>Information Management - organization &amp; administration</subject><subject>Information Services - organization &amp; administration</subject><subject>Information systems</subject><subject>Information Systems - classification</subject><subject>Information Systems - economics</subject><subject>Information Systems - organization &amp; administration</subject><subject>Proteins</subject><subject>Representations</subject><subject>Science</subject><subject>Sequences</subject><subject>Strategy</subject><subject>Systems 
Integration</subject><subject>Warehouses</subject><subject>Warehousing</subject><issn>1089-7771</issn><issn>2168-2194</issn><issn>1558-0032</issn><issn>2168-2208</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2000</creationdate><recordtype>article</recordtype><recordid>eNqF0UtLxDAQAOAgiuvr5M2DFA8qSHUmyTSpN_ENghc9l2yTSJdtuzbtwX9vli4iHtZThuSbCTPD2CHCJSLkV5ILcal5pinfYDtIpFMAwTdjDDpPlVI4YbshzABQEoptNkFQGSold5i6M715aIfGdl_XSdX4tqtNX7VNUpvGfLjaNX0SL5NQVjGsfFUmNqbssy1v5sEdrM499v5w_3b7lL68Pj7f3rykpdSqT8ucW_BmakkYDrkjyFWmctJkQQGUIIhrK9FpTx65kWitFOSozN3Uoxd77Gysu-jaz8GFvqirULr53DSuHUKhtY5dasqiPF0rFYIALf6HXHMiKcT_UEkthKQIz9dCzBRyiRqWn5_8obN26Jo4wtgKgYzj4BFdjKjs2hA654tFV9Wm-yoQiuXOi-XOi3HnUR-vSg7T2tlfdlxyBEcjqJxzP8-r7G-doKtl</recordid><startdate>20000301</startdate><enddate>20000301</enddate><creator>Critchlow, T.</creator><creator>Fidelis, K.</creator><creator>Ganesh, M.</creator><creator>Musick, R.</creator><creator>Slezak, T.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>NAPCQ</scope><scope>P64</scope><scope>7X8</scope></search><sort><creationdate>20000301</creationdate><title>DataFoundry: information management for scientific data</title><author>Critchlow, T. ; Fidelis, K. ; Ganesh, M. ; Musick, R. 
; Slezak, T.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c487t-c92d0fabd53a209e5097679585d0700c03528d41e8f5f12a41dd435e5c9ebf1f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2000</creationdate><topic>Associate members</topic><topic>Bioinformatics</topic><topic>Business</topic><topic>Communities</topic><topic>Computer Systems</topic><topic>Costs</topic><topic>Costs and Cost Analysis</topic><topic>Data analysis</topic><topic>Data sources</topic><topic>Database Management Systems - classification</topic><topic>Database Management Systems - economics</topic><topic>Database Management Systems - organization &amp; administration</topic><topic>Databases as Topic - classification</topic><topic>Databases as Topic - economics</topic><topic>Databases as Topic - organization &amp; administration</topic><topic>Dynamics</topic><topic>Humans</topic><topic>Information management</topic><topic>Information Management - classification</topic><topic>Information Management - economics</topic><topic>Information Management - organization &amp; administration</topic><topic>Information Services - organization &amp; administration</topic><topic>Information systems</topic><topic>Information Systems - classification</topic><topic>Information Systems - economics</topic><topic>Information Systems - organization &amp; administration</topic><topic>Proteins</topic><topic>Representations</topic><topic>Science</topic><topic>Sequences</topic><topic>Strategy</topic><topic>Systems Integration</topic><topic>Warehouses</topic><topic>Warehousing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Critchlow, T.</creatorcontrib><creatorcontrib>Fidelis, K.</creatorcontrib><creatorcontrib>Ganesh, M.</creatorcontrib><creatorcontrib>Musick, R.</creatorcontrib><creatorcontrib>Slezak, T.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 
Online</collection><collection>IEEE</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE journal of biomedical and health informatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Critchlow, T.</au><au>Fidelis, K.</au><au>Ganesh, 
M.</au><au>Musick, R.</au><au>Slezak, T.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DataFoundry: information management for scientific data</atitle><jtitle>IEEE journal of biomedical and health informatics</jtitle><stitle>TITB</stitle><addtitle>IEEE Trans Inf Technol Biomed</addtitle><date>2000-03-01</date><risdate>2000</risdate><volume>4</volume><issue>1</issue><spage>52</spage><epage>57</epage><pages>52-57</pages><issn>1089-7771</issn><issn>2168-2194</issn><eissn>1558-0032</eissn><eissn>2168-2208</eissn><coden>ITIBFX</coden><abstract>Data warehouses and data marts have been successfully applied to a multitude of commercial business applications. They have proven to be invaluable tools by integrating information from distributed, heterogeneous sources and summarizing this data for use throughout the enterprise. Although the need for information dissemination is as vital in science as in business, working warehouses in this community are scarce because traditional warehousing techniques do not transfer to scientific environments. There are two primary reasons for this difficulty. First, schema integration is more difficult for scientific databases than for business sources because of the complexity of the concepts and the associated relationships. Second, scientific data sources have highly dynamic data representations (schemata). When a data source participating in a warehouse changes its schema, both the mediator transferring data to the warehouse and the warehouse itself need to be updated to reflect these modifications. The cost of repeatedly performing these updates in a traditional warehouse, as is required in a dynamic environment, is prohibitive. The paper discusses these issues within the context of the DataFoundry project, an ongoing research effort at Lawrence Livermore National Laboratory. 
DataFoundry utilizes a unique integration strategy to identify corresponding instances while maintaining differences between data from different sources, and a novel architecture and an extensive meta-data infrastructure, which reduce the cost of maintaining a warehouse.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>10761774</pmid><doi>10.1109/4233.826859</doi><tpages>6</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1089-7771
ispartof IEEE journal of biomedical and health informatics, 2000-03, Vol.4 (1), p.52-57
issn 1089-7771
2168-2194
1558-0032
2168-2208
language eng
recordid cdi_ieee_primary_826859
source IEEE Xplore (Online service)
subjects Associate members
Bioinformatics
Business
Communities
Computer Systems
Costs
Costs and Cost Analysis
Data analysis
Data sources
Database Management Systems - classification
Database Management Systems - economics
Database Management Systems - organization & administration
Databases as Topic - classification
Databases as Topic - economics
Databases as Topic - organization & administration
Dynamics
Humans
Information management
Information Management - classification
Information Management - economics
Information Management - organization & administration
Information Services - organization & administration
Information systems
Information Systems - classification
Information Systems - economics
Information Systems - organization & administration
Proteins
Representations
Science
Sequences
Strategy
Systems Integration
Warehouses
Warehousing
title DataFoundry: information management for scientific data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T07%3A51%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DataFoundry:%20information%20management%20for%20scientific%20data&rft.jtitle=IEEE%20journal%20of%20biomedical%20and%20health%20informatics&rft.au=Critchlow,%20T.&rft.date=2000-03-01&rft.volume=4&rft.issue=1&rft.spage=52&rft.epage=57&rft.pages=52-57&rft.issn=1089-7771&rft.eissn=1558-0032&rft.coden=ITIBFX&rft_id=info:doi/10.1109/4233.826859&rft_dat=%3Cproquest_ieee_%3E28255433%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c487t-c92d0fabd53a209e5097679585d0700c03528d41e8f5f12a41dd435e5c9ebf1f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=885045282&rft_id=info:pmid/10761774&rft_ieee_id=826859&rfr_iscdi=true