Loading…

Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance

We consider a clustering problem in which the data objects are rooted m -ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VE...

Full description

Saved in:
Bibliographic Details
Published in:Annals of operations research 2020-06, Vol.289 (1), p.85-122
Main Authors: Dinler, Derya, Tural, Mustafa Kemal, Ozdemirel, Nur Evin
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by cdi_FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33
cites cdi_FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33
container_end_page 122
container_issue 1
container_start_page 85
container_title Annals of operations research
container_volume 289
creator Dinler, Derya
Tural, Mustafa Kemal
Ozdemirel, Nur Evin
description We consider a clustering problem in which the data objects are rooted m -ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VEO) and graph edit distance (GED), respectively. For both measures, we first study the problem of finding a centroid tree of a given cluster of trees in both the unweighted and weighted edge cases. We compute the optimal centroid tree of a given cluster for all measures except the weighted VEO for which a heuristic is developed. We then propose k-means based algorithms that repeat cluster assignment and centroid update steps until convergence. The initial centroid trees are constructed based on the properties of the data. The assignment steps utilize unweighted or weighted versions of VEO or GED to assign each tree to the most similar centroid tree. In the update steps, each centroid tree is updated by considering the trees assigned to it. The proposed algorithms are compared with the traditional k-modes and k-means on randomly generated datasets and shown to be more effective and robust (to outliers) in separating trees into clusters. We also apply our algorithms on a real world brain artery data and show that the previously observed age and sex effects on brain artery structures can be revealed better by means of clustering with our algorithms than the traditional k-modes and k-means.
doi_str_mv 10.1007/s10479-019-03505-7
format article
fullrecord <record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_journals_2407710643</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A625269174</galeid><sourcerecordid>A625269174</sourcerecordid><originalsourceid>FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33</originalsourceid><addsrcrecordid>eNp9kU1r3DAQhkVoIdu0f6AnQ691Mvqy7GPYbNJCIIcmvapjeewo7NquJJfm31fbDSSBUoRGSDzPCOll7COHUw5gziIHZZoSeJ5Sgy7NEVtxbUTZSFm_YSsQWpVaSjhm72J8AADOa71iP9Y0pjD5rmgxUlfcBqLyWwqLS0vI-wtMWKy3S0wU_DgUd3Ffv1NI9Pts0w1U3PyisMW5wLErrgLO98Wm86m48DHh6Og9e9vjNtKHp_WE3V1ubtdfyuubq6_r8-vSKSFTqWQFJLkCVMixdpWusOucarhQrVCSa4DW1ChMi7LWEnoylaNGcDSyb6U8YZ8Ofecw_VwoJvswLWHMV1qhwBgOlXpBDbgl68d-SgHdzkdnzyuhRdVwozJ1-g8qj4523k0j9T6fvxI-vxDaJX8SxVyiH-5THHCJ8TUuDrgLU4yBejsHv8PwaDnYfZ72kKfNedq_eVqTJXmQ4rxPgsLzA_9j_QEJK5_w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2407710643</pqid></control><display><type>article</type><title>Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance</title><source>ABI/INFORM Global</source><source>Access via Business Source (EBSCOhost)</source><source>Springer Nature:Jisc Collections:Springer Nature Read and Publish 2023-2025: Springer Reading List</source><creator>Dinler, Derya ; Tural, Mustafa Kemal ; Ozdemirel, Nur Evin</creator><creatorcontrib>Dinler, Derya ; Tural, Mustafa Kemal ; Ozdemirel, Nur Evin</creatorcontrib><description>We consider a clustering problem in which the data objects are rooted m -ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VEO) and graph edit distance (GED), respectively. For both measures, we first study the problem of finding a centroid tree of a given cluster of trees in both the unweighted and weighted edge cases. We compute the optimal centroid tree of a given cluster for all measures except the weighted VEO for which a heuristic is developed. We then propose k-means based algorithms that repeat cluster assignment and centroid update steps until convergence. The initial centroid trees are constructed based on the properties of the data. The assignment steps utilize unweighted or weighted versions of VEO or GED to assign each tree to the most similar centroid tree. In the update steps, each centroid tree is updated by considering the trees assigned to it. The proposed algorithms are compared with the traditional k-modes and k-means on randomly generated datasets and shown to be more effective and robust (to outliers) in separating trees into clusters. We also apply our algorithms on a real world brain artery data and show that the previously observed age and sex effects on brain artery structures can be revealed better by means of clustering with our algorithms than the traditional k-modes and k-means.</description><identifier>ISSN: 0254-5330</identifier><identifier>EISSN: 1572-9338</identifier><identifier>DOI: 10.1007/s10479-019-03505-7</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Algorithms ; Brain ; Business and Management ; Centroids ; Clustering ; Combinatorics ; Operations research ; Operations Research/Decision Theory ; Outliers (statistics) ; S.I.: OR in Neuroscience II ; Structured data ; Theory of Computation ; Trees</subject><ispartof>Annals of operations research, 2020-06, Vol.289 (1), p.85-122</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020</rights><rights>COPYRIGHT 2020 Springer</rights><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33</citedby><cites>FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33</cites><orcidid>0000-0003-3456-8502</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2407710643/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2407710643?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,11688,27924,27925,36060,44363,74895</link.rule.ids></links><search><creatorcontrib>Dinler, Derya</creatorcontrib><creatorcontrib>Tural, Mustafa Kemal</creatorcontrib><creatorcontrib>Ozdemirel, Nur Evin</creatorcontrib><title>Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance</title><title>Annals of operations research</title><addtitle>Ann Oper Res</addtitle><description>We consider a clustering problem in which the data objects are rooted m -ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VEO) and graph edit distance (GED), respectively. For both measures, we first study the problem of finding a centroid tree of a given cluster of trees in both the unweighted and weighted edge cases. We compute the optimal centroid tree of a given cluster for all measures except the weighted VEO for which a heuristic is developed. We then propose k-means based algorithms that repeat cluster assignment and centroid update steps until convergence. The initial centroid trees are constructed based on the properties of the data. The assignment steps utilize unweighted or weighted versions of VEO or GED to assign each tree to the most similar centroid tree. In the update steps, each centroid tree is updated by considering the trees assigned to it. The proposed algorithms are compared with the traditional k-modes and k-means on randomly generated datasets and shown to be more effective and robust (to outliers) in separating trees into clusters. We also apply our algorithms on a real world brain artery data and show that the previously observed age and sex effects on brain artery structures can be revealed better by means of clustering with our algorithms than the traditional k-modes and k-means.</description><subject>Algorithms</subject><subject>Brain</subject><subject>Business and Management</subject><subject>Centroids</subject><subject>Clustering</subject><subject>Combinatorics</subject><subject>Operations research</subject><subject>Operations Research/Decision Theory</subject><subject>Outliers (statistics)</subject><subject>S.I.: OR in Neuroscience II</subject><subject>Structured data</subject><subject>Theory of Computation</subject><subject>Trees</subject><issn>0254-5330</issn><issn>1572-9338</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp9kU1r3DAQhkVoIdu0f6AnQ691Mvqy7GPYbNJCIIcmvapjeewo7NquJJfm31fbDSSBUoRGSDzPCOll7COHUw5gziIHZZoSeJ5Sgy7NEVtxbUTZSFm_YSsQWpVaSjhm72J8AADOa71iP9Y0pjD5rmgxUlfcBqLyWwqLS0vI-wtMWKy3S0wU_DgUd3Ffv1NI9Pts0w1U3PyisMW5wLErrgLO98Wm86m48DHh6Og9e9vjNtKHp_WE3V1ubtdfyuubq6_r8-vSKSFTqWQFJLkCVMixdpWusOucarhQrVCSa4DW1ChMi7LWEnoylaNGcDSyb6U8YZ8Ofecw_VwoJvswLWHMV1qhwBgOlXpBDbgl68d-SgHdzkdnzyuhRdVwozJ1-g8qj4523k0j9T6fvxI-vxDaJX8SxVyiH-5THHCJ8TUuDrgLU4yBejsHv8PwaDnYfZ72kKfNedq_eVqTJXmQ4rxPgsLzA_9j_QEJK5_w</recordid><startdate>20200601</startdate><enddate>20200601</enddate><creator>Dinler, Derya</creator><creator>Tural, Mustafa Kemal</creator><creator>Ozdemirel, Nur Evin</creator><general>Springer US</general><general>Springer</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>N95</scope><scope>3V.</scope><scope>7TA</scope><scope>7TB</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>88I</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JG9</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>KR7</scope><scope>L.-</scope><scope>L6V</scope><scope>M0C</scope><scope>M0N</scope><scope>M2P</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0003-3456-8502</orcidid></search><sort><creationdate>20200601</creationdate><title>Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance</title><author>Dinler, Derya ; Tural, Mustafa Kemal ; Ozdemirel, Nur Evin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Brain</topic><topic>Business and Management</topic><topic>Centroids</topic><topic>Clustering</topic><topic>Combinatorics</topic><topic>Operations research</topic><topic>Operations Research/Decision Theory</topic><topic>Outliers (statistics)</topic><topic>S.I.: OR in Neuroscience II</topic><topic>Structured data</topic><topic>Theory of Computation</topic><topic>Trees</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dinler, Derya</creatorcontrib><creatorcontrib>Tural, Mustafa Kemal</creatorcontrib><creatorcontrib>Ozdemirel, Nur Evin</creatorcontrib><collection>CrossRef</collection><collection>Gale Business: Insights</collection><collection>ProQuest Central (Corporate)</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>Civil Engineering Abstracts</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Science Database</collection><collection>Engineering Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>ProQuest Central Basic</collection><jtitle>Annals of operations research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dinler, Derya</au><au>Tural, Mustafa Kemal</au><au>Ozdemirel, Nur Evin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance</atitle><jtitle>Annals of operations research</jtitle><stitle>Ann Oper Res</stitle><date>2020-06-01</date><risdate>2020</risdate><volume>289</volume><issue>1</issue><spage>85</spage><epage>122</epage><pages>85-122</pages><issn>0254-5330</issn><eissn>1572-9338</eissn><abstract>We consider a clustering problem in which the data objects are rooted m -ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VEO) and graph edit distance (GED), respectively. For both measures, we first study the problem of finding a centroid tree of a given cluster of trees in both the unweighted and weighted edge cases. We compute the optimal centroid tree of a given cluster for all measures except the weighted VEO for which a heuristic is developed. We then propose k-means based algorithms that repeat cluster assignment and centroid update steps until convergence. The initial centroid trees are constructed based on the properties of the data. The assignment steps utilize unweighted or weighted versions of VEO or GED to assign each tree to the most similar centroid tree. In the update steps, each centroid tree is updated by considering the trees assigned to it. The proposed algorithms are compared with the traditional k-modes and k-means on randomly generated datasets and shown to be more effective and robust (to outliers) in separating trees into clusters. We also apply our algorithms on a real world brain artery data and show that the previously observed age and sex effects on brain artery structures can be revealed better by means of clustering with our algorithms than the traditional k-modes and k-means.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10479-019-03505-7</doi><tpages>38</tpages><orcidid>https://orcid.org/0000-0003-3456-8502</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0254-5330
ispartof Annals of operations research, 2020-06, Vol.289 (1), p.85-122
issn 0254-5330
1572-9338
language eng
recordid cdi_proquest_journals_2407710643
source ABI/INFORM Global; Access via Business Source (EBSCOhost); Springer Nature:Jisc Collections:Springer Nature Read and Publish 2023-2025: Springer Reading List
subjects Algorithms
Brain
Business and Management
Centroids
Clustering
Combinatorics
Operations research
Operations Research/Decision Theory
Outliers (statistics)
S.I.: OR in Neuroscience II
Structured data
Theory of Computation
Trees
title Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T02%3A49%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Centroid%20based%20Tree-Structured%20Data%20Clustering%20Using%20Vertex/Edge%20Overlap%20and%20Graph%20Edit%20Distance&rft.jtitle=Annals%20of%20operations%20research&rft.au=Dinler,%20Derya&rft.date=2020-06-01&rft.volume=289&rft.issue=1&rft.spage=85&rft.epage=122&rft.pages=85-122&rft.issn=0254-5330&rft.eissn=1572-9338&rft_id=info:doi/10.1007/s10479-019-03505-7&rft_dat=%3Cgale_proqu%3EA625269174%3C/gale_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c423t-4360e3140a4a1a8c656addc49124b2431500b78a27ba38530fe76ce921a73fb33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2407710643&rft_id=info:pmid/&rft_galeid=A625269174&rfr_iscdi=true