Loading…
Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance
We consider a clustering problem in which the data objects are rooted m -ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VE...
Saved in:
Published in: | Annals of operations research 2020-06, Vol.289 (1), p.85-122 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
cited_by | cdi_FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33 |
---|---|
cites | cdi_FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33 |
container_end_page | 122 |
container_issue | 1 |
container_start_page | 85 |
container_title | Annals of operations research |
container_volume | 289 |
creator | Dinler, Derya Tural, Mustafa Kemal Ozdemirel, Nur Evin |
description | We consider a clustering problem in which the data objects are rooted
m
-ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VEO) and graph edit distance (GED), respectively. For both measures, we first study the problem of finding a centroid tree of a given cluster of trees in both the unweighted and weighted edge cases. We compute the optimal centroid tree of a given cluster for all measures except the weighted VEO for which a heuristic is developed. We then propose k-means based algorithms that repeat cluster assignment and centroid update steps until convergence. The initial centroid trees are constructed based on the properties of the data. The assignment steps utilize unweighted or weighted versions of VEO or GED to assign each tree to the most similar centroid tree. In the update steps, each centroid tree is updated by considering the trees assigned to it. The proposed algorithms are compared with the traditional k-modes and k-means on randomly generated datasets and shown to be more effective and robust (to outliers) in separating trees into clusters. We also apply our algorithms on a real world brain artery data and show that the previously observed age and sex effects on brain artery structures can be revealed better by means of clustering with our algorithms than the traditional k-modes and k-means. |
doi_str_mv | 10.1007/s10479-019-03505-7 |
format | article |
fullrecord | <record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_journals_2407710643</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A625269174</galeid><sourcerecordid>A625269174</sourcerecordid><originalsourceid>FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33</originalsourceid><addsrcrecordid>eNp9kU1r3DAQhkVoIdu0f6AnQ691Mvqy7GPYbNJCIIcmvapjeewo7NquJJfm31fbDSSBUoRGSDzPCOll7COHUw5gziIHZZoSeJ5Sgy7NEVtxbUTZSFm_YSsQWpVaSjhm72J8AADOa71iP9Y0pjD5rmgxUlfcBqLyWwqLS0vI-wtMWKy3S0wU_DgUd3Ffv1NI9Pts0w1U3PyisMW5wLErrgLO98Wm86m48DHh6Og9e9vjNtKHp_WE3V1ubtdfyuubq6_r8-vSKSFTqWQFJLkCVMixdpWusOucarhQrVCSa4DW1ChMi7LWEnoylaNGcDSyb6U8YZ8Ofecw_VwoJvswLWHMV1qhwBgOlXpBDbgl68d-SgHdzkdnzyuhRdVwozJ1-g8qj4523k0j9T6fvxI-vxDaJX8SxVyiH-5THHCJ8TUuDrgLU4yBejsHv8PwaDnYfZ72kKfNedq_eVqTJXmQ4rxPgsLzA_9j_QEJK5_w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2407710643</pqid></control><display><type>article</type><title>Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance</title><source>ABI/INFORM Global</source><source>Access via Business Source (EBSCOhost)</source><source>Springer Nature:Jisc Collections:Springer Nature Read and Publish 2023-2025: Springer Reading List</source><creator>Dinler, Derya ; Tural, Mustafa Kemal ; Ozdemirel, Nur Evin</creator><creatorcontrib>Dinler, Derya ; Tural, Mustafa Kemal ; Ozdemirel, Nur Evin</creatorcontrib><description>We consider a clustering problem in which the data objects are rooted
m
-ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VEO) and graph edit distance (GED), respectively. For both measures, we first study the problem of finding a centroid tree of a given cluster of trees in both the unweighted and weighted edge cases. We compute the optimal centroid tree of a given cluster for all measures except the weighted VEO for which a heuristic is developed. We then propose k-means based algorithms that repeat cluster assignment and centroid update steps until convergence. The initial centroid trees are constructed based on the properties of the data. The assignment steps utilize unweighted or weighted versions of VEO or GED to assign each tree to the most similar centroid tree. In the update steps, each centroid tree is updated by considering the trees assigned to it. The proposed algorithms are compared with the traditional k-modes and k-means on randomly generated datasets and shown to be more effective and robust (to outliers) in separating trees into clusters. We also apply our algorithms on a real world brain artery data and show that the previously observed age and sex effects on brain artery structures can be revealed better by means of clustering with our algorithms than the traditional k-modes and k-means.</description><identifier>ISSN: 0254-5330</identifier><identifier>EISSN: 1572-9338</identifier><identifier>DOI: 10.1007/s10479-019-03505-7</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Brain ; Business and Management ; Centroids ; Clustering ; Combinatorics ; Operations research ; Operations Research/Decision Theory ; Outliers (statistics) ; S.I.: OR in Neuroscience II ; Structured data ; Theory of Computation ; Trees</subject><ispartof>Annals of operations research, 2020-06, Vol.289 (1), p.85-122</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020</rights><rights>COPYRIGHT 2020 Springer</rights><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33</citedby><cites>FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33</cites><orcidid>0000-0003-3456-8502</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2407710643/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2407710643?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,11688,27924,27925,36060,44363,74895</link.rule.ids></links><search><creatorcontrib>Dinler, Derya</creatorcontrib><creatorcontrib>Tural, Mustafa Kemal</creatorcontrib><creatorcontrib>Ozdemirel, Nur Evin</creatorcontrib><title>Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance</title><title>Annals of operations research</title><addtitle>Ann Oper Res</addtitle><description>We consider a clustering problem in which the data objects are rooted
m
-ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VEO) and graph edit distance (GED), respectively. For both measures, we first study the problem of finding a centroid tree of a given cluster of trees in both the unweighted and weighted edge cases. We compute the optimal centroid tree of a given cluster for all measures except the weighted VEO for which a heuristic is developed. We then propose k-means based algorithms that repeat cluster assignment and centroid update steps until convergence. The initial centroid trees are constructed based on the properties of the data. The assignment steps utilize unweighted or weighted versions of VEO or GED to assign each tree to the most similar centroid tree. In the update steps, each centroid tree is updated by considering the trees assigned to it. The proposed algorithms are compared with the traditional k-modes and k-means on randomly generated datasets and shown to be more effective and robust (to outliers) in separating trees into clusters. We also apply our algorithms on a real world brain artery data and show that the previously observed age and sex effects on brain artery structures can be revealed better by means of clustering with our algorithms than the traditional k-modes and k-means.</description><subject>Algorithms</subject><subject>Brain</subject><subject>Business and Management</subject><subject>Centroids</subject><subject>Clustering</subject><subject>Combinatorics</subject><subject>Operations research</subject><subject>Operations Research/Decision Theory</subject><subject>Outliers (statistics)</subject><subject>S.I.: OR in Neuroscience II</subject><subject>Structured data</subject><subject>Theory of Computation</subject><subject>Trees</subject><issn>0254-5330</issn><issn>1572-9338</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp9kU1r3DAQhkVoIdu0f6AnQ691Mvqy7GPYbNJCIIcmvapjeewo7NquJJfm31fbDSSBUoRGSDzPCOll7COHUw5gziIHZZoSeJ5Sgy7NEVtxbUTZSFm_YSsQWpVaSjhm72J8AADOa71iP9Y0pjD5rmgxUlfcBqLyWwqLS0vI-wtMWKy3S0wU_DgUd3Ffv1NI9Pts0w1U3PyisMW5wLErrgLO98Wm86m48DHh6Og9e9vjNtKHp_WE3V1ubtdfyuubq6_r8-vSKSFTqWQFJLkCVMixdpWusOucarhQrVCSa4DW1ChMi7LWEnoylaNGcDSyb6U8YZ8Ofecw_VwoJvswLWHMV1qhwBgOlXpBDbgl68d-SgHdzkdnzyuhRdVwozJ1-g8qj4523k0j9T6fvxI-vxDaJX8SxVyiH-5THHCJ8TUuDrgLU4yBejsHv8PwaDnYfZ72kKfNedq_eVqTJXmQ4rxPgsLzA_9j_QEJK5_w</recordid><startdate>20200601</startdate><enddate>20200601</enddate><creator>Dinler, Derya</creator><creator>Tural, Mustafa Kemal</creator><creator>Ozdemirel, Nur Evin</creator><general>Springer US</general><general>Springer</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>N95</scope><scope>3V.</scope><scope>7TA</scope><scope>7TB</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>88I</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JG9</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>KR7</scope><scope>L.-</scope><scope>L6V</scope><scope>M0C</scope><scope>M0N</scope><scope>M2P</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0003-3456-8502</orcidid></search><sort><creationdate>20200601</creationdate><title>Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance</title><author>Dinler, Derya ; Tural, Mustafa Kemal ; Ozdemirel, Nur Evin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Brain</topic><topic>Business and Management</topic><topic>Centroids</topic><topic>Clustering</topic><topic>Combinatorics</topic><topic>Operations research</topic><topic>Operations Research/Decision Theory</topic><topic>Outliers (statistics)</topic><topic>S.I.: OR in Neuroscience II</topic><topic>Structured data</topic><topic>Theory of Computation</topic><topic>Trees</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dinler, Derya</creatorcontrib><creatorcontrib>Tural, Mustafa Kemal</creatorcontrib><creatorcontrib>Ozdemirel, Nur Evin</creatorcontrib><collection>CrossRef</collection><collection>Gale Business: Insights</collection><collection>ProQuest Central (Corporate)</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>Civil Engineering Abstracts</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Science Database</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>ProQuest Central Basic</collection><jtitle>Annals of operations research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dinler, Derya</au><au>Tural, Mustafa Kemal</au><au>Ozdemirel, Nur Evin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance</atitle><jtitle>Annals of operations research</jtitle><stitle>Ann Oper Res</stitle><date>2020-06-01</date><risdate>2020</risdate><volume>289</volume><issue>1</issue><spage>85</spage><epage>122</epage><pages>85-122</pages><issn>0254-5330</issn><eissn>1572-9338</eissn><abstract>We consider a clustering problem in which the data objects are rooted
m
-ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VEO) and graph edit distance (GED), respectively. For both measures, we first study the problem of finding a centroid tree of a given cluster of trees in both the unweighted and weighted edge cases. We compute the optimal centroid tree of a given cluster for all measures except the weighted VEO for which a heuristic is developed. We then propose k-means based algorithms that repeat cluster assignment and centroid update steps until convergence. The initial centroid trees are constructed based on the properties of the data. The assignment steps utilize unweighted or weighted versions of VEO or GED to assign each tree to the most similar centroid tree. In the update steps, each centroid tree is updated by considering the trees assigned to it. The proposed algorithms are compared with the traditional k-modes and k-means on randomly generated datasets and shown to be more effective and robust (to outliers) in separating trees into clusters. We also apply our algorithms on a real world brain artery data and show that the previously observed age and sex effects on brain artery structures can be revealed better by means of clustering with our algorithms than the traditional k-modes and k-means.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10479-019-03505-7</doi><tpages>38</tpages><orcidid>https://orcid.org/0000-0003-3456-8502</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0254-5330 |
ispartof | Annals of operations research, 2020-06, Vol.289 (1), p.85-122 |
issn | 0254-5330 1572-9338 |
language | eng |
recordid | cdi_proquest_journals_2407710643 |
source | ABI/INFORM Global; Access via Business Source (EBSCOhost); Springer Nature:Jisc Collections:Springer Nature Read and Publish 2023-2025: Springer Reading List |
subjects | Algorithms Brain Business and Management Centroids Clustering Combinatorics Operations research Operations Research/Decision Theory Outliers (statistics) S.I.: OR in Neuroscience II Structured data Theory of Computation Trees |
title | Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T02%3A49%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Centroid%20based%20Tree-Structured%20Data%20Clustering%20Using%20Vertex/Edge%20Overlap%20and%20Graph%20Edit%20Distance&rft.jtitle=Annals%20of%20operations%20research&rft.au=Dinler,%20Derya&rft.date=2020-06-01&rft.volume=289&rft.issue=1&rft.spage=85&rft.epage=122&rft.pages=85-122&rft.issn=0254-5330&rft.eissn=1572-9338&rft_id=info:doi/10.1007/s10479-019-03505-7&rft_dat=%3Cgale_proqu%3EA625269174%3C/gale_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2407710643&rft_id=info:pmid/&rft_galeid=A625269174&rfr_iscdi=true |