Efficient Realization of Decision Trees for Real-Time Inference
For timing-sensitive edge applications, the demand for efficient lightweight machine learning solutions has increased recently. Tree ensembles are among the state-of-the-art in many machine learning applications. While single decision trees are comparably small, an ensemble of trees can have a signi...
Published in: | ACM transactions on embedded computing systems 2022-10, Vol.21 (6), p.1-26, Article 68 |
---|---|
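Main Authors: | Chen, Kuan-Hsun ; Su, Chiahui ; Hakert, Christian ; Buschjäger, Sebastian ; Lee, Chao-Lin ; Lee, Jenq-Kuen ; Morik, Katharina ; Chen, Jian-Jia |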
Format: | Article |
Language: | English |
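Subjects: | Classification and regression trees ; Computer systems organization ; Computing methodologies ; Embedded systems ; Software and its engineering ; Software organization and properties |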
cited_by | cdi_FETCH-LOGICAL-a277t-6c91d09d3e8d36dcc583cb7eda3f478370f3c450080e6601d864430ecf17951f3 |
---|---|
cites | cdi_FETCH-LOGICAL-a277t-6c91d09d3e8d36dcc583cb7eda3f478370f3c450080e6601d864430ecf17951f3 |
container_end_page | 26 |
container_issue | 6 |
container_start_page | 1 |
container_title | ACM transactions on embedded computing systems |
container_volume | 21 |
creator | Chen, Kuan-Hsun Su, Chiahui Hakert, Christian Buschjäger, Sebastian Lee, Chao-Lin Lee, Jenq-Kuen Morik, Katharina Chen, Jian-Jia |
description | For timing-sensitive edge applications, the demand for efficient lightweight machine learning solutions has increased recently. Tree ensembles are among the state-of-the-art in many machine learning applications. While single decision trees are comparatively small, an ensemble of trees can have a significant memory footprint, leading to cache locality issues, which are crucial to performance in terms of execution time. In this work, we analyze memory-locality issues of the two most common realizations of decision trees, i.e., native and if-else trees. We highlight that both realizations demand a more careful memory layout to improve caching behavior and maximize performance. We adopt a probabilistic model of decision tree inference to find the best memory layout for each tree at the application layer. Further, we present an efficient heuristic to take architecture-dependent information into account, thereby optimizing the given ensemble for a target computer architecture. Our code-generation framework, which is freely available in an open-source repository, produces optimized code sessions while preserving the structure and accuracy of the trees. With several real-world data sets, we evaluate the elapsed time of various tree realizations on server hardware as well as embedded systems for Intel and ARM processors. Our optimized memory layout reduces execution time by up to 75 % for server-class systems and by up to 70 % for embedded systems. |
doi_str_mv | 10.1145/3508019 |
format | article |
publisher | New York, NY: ACM |
creationdate | 2022-10-18 |
artnum | 68 |
tpages | 26 |
rights | Copyright held by the owner/author(s). |
orcid | 0000-0001-9992-9415 ; 0000-0002-7110-921X ; 0000-0001-8114-9760 |
toplevel | peer_reviewed ; online_resources |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 1539-9087 |
ispartof | ACM transactions on embedded computing systems, 2022-10, Vol.21 (6), p.1-26, Article 68 |
issn | 1539-9087 1558-3465 |
language | eng |
recordid | cdi_crossref_primary_10_1145_3508019 |
source | Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list) |
subjects | Classification and regression trees Computer systems organization Computing methodologies Embedded systems Software and its engineering Software organization and properties |
title | Efficient Realization of Decision Trees for Real-Time Inference |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T18%3A45%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20Realization%20of%20Decision%20Trees%20for%20Real-Time%20Inference&rft.jtitle=ACM%20transactions%20on%20embedded%20computing%20systems&rft.au=Chen,%20Kuan-Hsun&rft.date=2022-10-18&rft.volume=21&rft.issue=6&rft.spage=1&rft.epage=26&rft.pages=1-26&rft.artnum=68&rft.issn=1539-9087&rft.eissn=1558-3465&rft_id=info:doi/10.1145/3508019&rft_dat=%3Cacm_cross%3E3508019%3C/acm_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a277t-6c91d09d3e8d36dcc583cb7eda3f478370f3c450080e6601d864430ecf17951f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |