Clustering semantically heterogeneous distributed aggregate databases

Bibliographic Details
Published in:	Knowledge and information systems 2014-02, Vol.38 (2), p.331-364
Main Authors:	Zhang, Shuai, McClean, Sally I., Scotney, Bryan W.
Format:	Article
Language:	English
Subjects:	Aggregates ; Algorithmics. Computability. Computer arithmetics ; Algorithms ; Analysis ; Applied sciences ; Clustering ; Computer Science ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Cooperation ; Data mining ; Data Mining and Knowledge Discovery ; Data processing. List processing. Character string processing ; Database Management ; Exact sciences and technology ; Information Storage and Retrieval ; Information Systems and Communication Service ; Information Systems Applications (incl.Internet) ; Information systems. Data bases ; IT in Business ; Knowledge discovery ; Memory organisation. Data processing ; Regular Paper ; Semantic web ; Semantics ; Software ; Studies ; Theoretical computing

description	Databases developed independently in a common open distributed environment may be heterogeneous with respect to both data schema and the embedded semantics. Managing schema and semantic heterogeneities brings considerable challenges to learning from distributed data and to supporting applications that involve cooperation between different organisations. In this paper, we are concerned mainly with heterogeneous databases that hold aggregates on a set of attributes, which are often the result of materialised views of native large-scale distributed databases. A model-based clustering algorithm is proposed to construct a mixture model where each component corresponds to a cluster which is used to capture the contextual heterogeneity among databases from different populations. Schema heterogeneity, which can be recast as incomplete information, is handled within the clustering process using Expectation-Maximisation estimation, and integration is carried out within a clustering iteration. Our proposed algorithm resolves the schema heterogeneity as part of the clustering process, thus avoiding transformation of the data into a unified schema. Results of algorithm evaluation on classification, scalability and reliability, using both real and synthetic data, demonstrate that our algorithm can achieve good performance by incorporating all of the information from available heterogeneous data. Our clustering approach has great potential for scalable knowledge discovery from semantically heterogeneous databases and for applications in an open distributed environment, such as the Semantic Web.
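The abstract states the approach only in outline. As a rough illustration of the general technique it names (a mixture model fitted by Expectation-Maximisation, with coarser schemas treated as incomplete data), the following is a minimal Python sketch and not the authors' algorithm. It assumes that every database reports counts over some partition of one shared set of fine-grained categories and that each cluster is a multinomial over those categories; the function `em_cluster`, the `seeds` initialisation and the (group, count) encoding are hypothetical choices made for this example.

```python
import math


def _normalise(row):
    total = sum(row)
    return [x / total for x in row]


def em_cluster(databases, n_categories, k, n_iter=100, seeds=None):
    """Mixture-of-multinomials EM over aggregate databases (illustrative sketch).

    Each database is a list of (group, count) pairs: `group` is a tuple of
    fine-category indices that the database reports only as one aggregated
    count, so a coarser schema is treated as incomplete (coarsened) data.
    Returns (weights, thetas, responsibilities).
    """
    seeds = list(range(k)) if seeds is None else seeds
    # Hypothetical initialisation: cluster j starts from database seeds[j],
    # spreading each aggregated count evenly over its fine categories.
    thetas = []
    for s in seeds:
        raw = [1.0] * n_categories  # add-one smoothing
        for group, count in databases[s]:
            for c in group:
                raw[c] += count / len(group)
        thetas.append(_normalise(raw))
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step (clusters): posterior membership computed from each
        # database's own aggregated counts; multinomial coefficients cancel.
        resp = []
        for db in databases:
            logs = []
            for j in range(k):
                ll = math.log(max(weights[j], 1e-300))
                for group, count in db:
                    ll += count * math.log(sum(thetas[j][c] for c in group))
                logs.append(ll)
            top = max(logs)
            ex = [math.exp(v - top) for v in logs]
            z = sum(ex)
            resp.append([e / z for e in ex])
        # E-step (missing detail): apportion each aggregated count across its
        # fine categories in proportion to the current theta. A tiny
        # pseudo-count keeps every theta strictly positive.
        expected = [[1e-9] * n_categories for _ in range(k)]
        for db, r in zip(databases, resp):
            for j in range(k):
                for group, count in db:
                    mass = sum(thetas[j][c] for c in group)
                    for c in group:
                        expected[j][c] += r[j] * count * thetas[j][c] / mass
        # M-step: re-estimate mixing weights and per-cluster multinomials.
        weights = [sum(r[j] for r in resp) / len(databases) for j in range(k)]
        thetas = [_normalise(row) for row in expected]
    return weights, thetas, resp


# Two populations; the last two databases use a coarser schema.
dbs = [
    [((0,), 40), ((1,), 35), ((2,), 5), ((3,), 2)],  # full schema
    [((0,), 3), ((1,), 4), ((2,), 38), ((3,), 41)],  # full schema
    [((0, 1), 70), ((2, 3), 6)],                     # coarser schema
    [((0, 1), 8), ((2, 3), 75)],                     # coarser schema
]
weights, thetas, resp = em_cluster(dbs, n_categories=4, k=2)
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
print(labels)  # → [0, 1, 0, 1]
```

In this toy example the two coarse databases are assigned to the same clusters as their fully detailed counterparts without first being mapped onto a common schema, which is the property the abstract emphasises. As with any EM procedure, the result can depend on initialisation.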
doi	10.1007/s10115-012-0588-4
date	2014-02-01
publisher	London: Springer London
rights	Springer-Verlag London 2012 ; 2015 INIST-CNRS ; Springer-Verlag London 2014
coden	KISNCR
issn	0219-1377 (print), 0219-3116 (electronic)
source	ABI/INFORM Global; Springer Link