Efficient Realization of Decision Trees for Real-Time Inference

For timing-sensitive edge applications, the demand for efficient lightweight machine learning solutions has increased recently. Tree ensembles are among the state-of-the-art in many machine learning applications. While single decision trees are comparably small, an ensemble of trees can have a signi...

Bibliographic Details
Published in: ACM Transactions on Embedded Computing Systems, 2022-10, Vol. 21 (6), p. 1-26, Article 68
Main Authors: Chen, Kuan-Hsun; Su, Chiahui; Hakert, Christian; Buschjäger, Sebastian; Lee, Chao-Lin; Lee, Jenq-Kuen; Morik, Katharina; Chen, Jian-Jia
Format: Article
Language: English
container_end_page 26
container_issue 6
container_start_page 1
container_title ACM transactions on embedded computing systems
container_volume 21
creator Chen, Kuan-Hsun
Su, Chiahui
Hakert, Christian
Buschjäger, Sebastian
Lee, Chao-Lin
Lee, Jenq-Kuen
Morik, Katharina
Chen, Jian-Jia
description For timing-sensitive edge applications, the demand for efficient, lightweight machine learning solutions has increased recently. Tree ensembles are among the state of the art in many machine learning applications. While single decision trees are comparatively small, an ensemble of trees can have a significant memory footprint, leading to cache locality issues that are crucial to performance in terms of execution time. In this work, we analyze memory-locality issues of the two most common realizations of decision trees, i.e., native and if-else trees. We highlight that both realizations demand a more careful memory layout to improve caching behavior and maximize performance. We adopt a probabilistic model of decision tree inference to find the best memory layout for each tree at the application layer. Further, we present an efficient heuristic that takes architecture-dependent information into account, thereby optimizing the given ensemble for a target computer architecture. Our code-generation framework, which is freely available in an open-source repository, produces optimized code sessions while preserving the structure and accuracy of the trees. With several real-world data sets, we evaluate the elapsed time of various tree realizations on server hardware as well as on embedded systems, for Intel and ARM processors. Our optimized memory layout achieves a reduction in execution time of up to 75 % for server-class systems and up to 70 % for embedded systems.
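The two realizations named in the description can be illustrated with a small sketch. This is not the authors' code-generation framework: the `Node` fields, the toy tree, the branch probabilities in `P_LEFT`, and the greedy `relayout` ordering are all hypothetical and only show the general idea of a native (array plus loop) tree, an if-else (unrolled branches) tree, and a probability-guided node ordering that places the more likely child directly after its parent.

```python
from dataclasses import dataclass


@dataclass
class Node:
    feature: int      # index of the feature to test (-1 marks a leaf)
    threshold: float  # split threshold (unused for a leaf)
    left: int         # array index of the child taken when x[feature] <= threshold
    right: int        # array index of the child taken otherwise
    prediction: int   # class label (meaningful only for a leaf)


# Native realization: the tree is data, stored as an array of nodes.
# Hypothetical toy tree, not taken from the paper.
NATIVE_TREE = [
    Node(0, 0.5, 1, 2, -1),    # root: test feature 0
    Node(-1, 0.0, -1, -1, 0),  # leaf, class 0
    Node(1, 1.5, 3, 4, -1),    # inner node: test feature 1
    Node(-1, 0.0, -1, -1, 1),  # leaf, class 1
    Node(-1, 0.0, -1, -1, 0),  # leaf, class 0
]

# Assumed probability of taking the left branch at each node (ignored for leaves).
P_LEFT = [0.2, 0.0, 0.9, 0.0, 0.0]


def predict_native(tree, x):
    """Walk the node array in a loop until a leaf is reached."""
    i = 0
    while tree[i].feature != -1:
        node = tree[i]
        i = node.left if x[node.feature] <= node.threshold else node.right
    return tree[i].prediction


def predict_if_else(x):
    """If-else realization: the same toy tree unrolled into nested branches."""
    if x[0] <= 0.5:
        return 0
    else:
        if x[1] <= 1.5:
            return 1
        else:
            return 0


def relayout(tree, p_left):
    """Reorder nodes depth-first, visiting the more probable child first.

    A simple greedy ordering used here for illustration only; the
    structure and the predictions of the tree are unchanged.
    """
    order, stack = [], [0]
    while stack:
        i = stack.pop()
        order.append(i)
        node = tree[i]
        if node.feature != -1:
            if p_left[i] >= 0.5:
                likely, unlikely = node.left, node.right
            else:
                likely, unlikely = node.right, node.left
            stack.append(unlikely)
            stack.append(likely)  # popped next, so it lands right after its parent
    new_index = {old: new for new, old in enumerate(order)}
    return [
        Node(n.feature, n.threshold,
             new_index.get(n.left, -1), new_index.get(n.right, -1), n.prediction)
        for n in (tree[i] for i in order)
    ]


OPTIMIZED_TREE = relayout(NATIVE_TREE, P_LEFT)
```

In this sketch the reordered array answers every query exactly as the original does; only the position of the nodes in memory changes, which is the property the description emphasizes when it says the structure and accuracy of the trees are preserved.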
doi_str_mv 10.1145/3508019
format article
identifier ISSN: 1539-9087
ispartof ACM transactions on embedded computing systems, 2022-10, Vol.21 (6), p.1-26, Article 68
issn 1539-9087
1558-3465
language eng
recordid cdi_crossref_primary_10_1145_3508019
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
subjects Classification and regression trees
Computer systems organization
Computing methodologies
Embedded systems
Software and its engineering
Software organization and properties
title Efficient Realization of Decision Trees for Real-Time Inference