
The effect of points dispersion on the k-nn search in random projection forests

Partitioning trees are efficient data structures for k-nearest neighbor search. Machine learning libraries commonly use a special type of partitioning trees called kd-trees to perform k-nn search. Unfortunately, kd-trees can be ineffective in high dimensions because they need more tree levels to decrease the vector quantization (VQ) error. Random projection trees (rpTrees) solve this scalability problem by using random directions to split the data. A collection of rpTrees is called an rpForest. k-nn search in an rpForest is influenced by two factors: 1) the dispersion of points along the random direction and 2) the number of rpTrees in the rpForest. In this study, we investigate how these two factors affect the k-nn search with varying k values and different datasets. We found that with a larger number of trees, the dispersion of points has a very limited effect on the k-nn search. One should use the original rpTree algorithm by picking a random direction regardless of the dispersion of points.
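The abstract describes the split rule at the heart of rpTrees: project the points in a node onto a random direction and split at the median of those projections, without adjusting for how dispersed the projections are; an rpForest then builds many such trees and unions their candidate leaves for the k-nn query. Below is a minimal sketch of that idea, assuming a NumPy implementation with hypothetical helper names (build_rptree, query_leaf, rpforest_knn); it is not the authors' code and omits optimizations such as reusing the forest across queries.

```python
import numpy as np

def build_rptree(indices, data, leaf_size=16, rng=None):
    """Recursively build an rpTree; leaves store indices of the points they contain."""
    rng = np.random.default_rng() if rng is None else rng
    if len(indices) <= leaf_size:
        return {"leaf": True, "indices": indices}
    # Pick a random direction and split at the median projection,
    # regardless of how dispersed the projections are.
    direction = rng.normal(size=data.shape[1])
    direction /= np.linalg.norm(direction)
    projections = data[indices] @ direction
    threshold = np.median(projections)
    left = indices[projections <= threshold]
    right = indices[projections > threshold]
    if len(left) == 0 or len(right) == 0:  # degenerate split, stop here
        return {"leaf": True, "indices": indices}
    return {
        "leaf": False,
        "direction": direction,
        "threshold": threshold,
        "left": build_rptree(left, data, leaf_size, rng),
        "right": build_rptree(right, data, leaf_size, rng),
    }

def query_leaf(tree, point):
    """Route a query point down the tree to the leaf it falls into."""
    while not tree["leaf"]:
        side = "left" if point @ tree["direction"] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["indices"]

def rpforest_knn(data, point, k=5, n_trees=20, leaf_size=16, seed=0):
    """Approximate k-nn: union the candidate leaves over all trees, then rank by distance.
    For simplicity this sketch rebuilds the forest on every call."""
    rng = np.random.default_rng(seed)
    all_indices = np.arange(len(data))
    candidates = set()
    for _ in range(n_trees):
        tree = build_rptree(all_indices, data, leaf_size, rng)
        candidates.update(query_leaf(tree, point))
    candidates = np.fromiter(candidates, dtype=int)
    distances = np.linalg.norm(data[candidates] - point, axis=1)
    return candidates[np.argsort(distances)[:k]]

# Example: 1000 points in 50 dimensions, query the 5 nearest neighbors of the first point.
if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(1000, 50))
    print(rpforest_knn(X, X[0], k=5, n_trees=20))
```

In this sketch, increasing n_trees enlarges the pool of candidate neighbors, which is consistent with the paper's finding that the number of trees, rather than the dispersion of points along the split direction, is what matters for k-nn search quality.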

Bibliographic Details
Published in: IEEE Access, 2022, Vol. 10, p. 1-1
Main Authors: Alshammari, Mashaan; Stavrakakis, John; Ahmed, Adel F.; Takatsuka, Masahiro
Format: Article
Language:English
Subjects: k-nearest neighbor search; Algorithms; Data structures; Dispersion; Forestry; Libraries; Licenses; Machine learning; Partitioning; Partitioning algorithms; Principal component analysis; Random Projection Forests; Random Projection Trees; Searching; Trees; Unsupervised learning; Vector quantization
DOI: 10.1109/ACCESS.2022.3195488
ISSN: 2169-3536
Online Access: IEEE Xplore Open Access Journals