Reinforcement Learning for Real-Time Optimization in NB-IoT Networks

Bibliographic Details
Published in: IEEE Journal on Selected Areas in Communications, 2019-06, Vol. 37 (6), pp. 1424-1440
Main Authors: Jiang, Nan; Deng, Yansha; Nallanathan, Arumugam; Chambers, Jonathon A.
Format: Article
Language: English
DOI: 10.1109/JSAC.2019.2904366
ISSN: 0733-8716
EISSN: 1558-0008
Source: IEEE Electronic Library (IEL) Journals

Abstract
NarrowBand Internet of Things (NB-IoT) is an emerging cellular-based technology that offers a range of flexible configurations for massive IoT radio access from groups of devices with heterogeneous requirements. A configuration specifies the amount of radio resource allocated to each group of devices for random access and for data transmission. Assuming no knowledge of the traffic statistics, an important challenge is how to determine, in an online fashion, the configuration that maximizes the long-term average number of served IoT devices at each transmission time interval (TTI). Given the complexity of searching for the optimal configuration, we first develop real-time configuration selection based on tabular Q-learning (tabular-Q), linear-approximation-based Q-learning (LA-Q), and deep-neural-network-based Q-learning (DQN) in the single-parameter single-group scenario. Our results show that the proposed reinforcement learning-based approaches considerably outperform the conventional heuristic approach based on load estimation (LE-URC) in terms of the number of served IoT devices. This result also indicates that LA-Q and DQN can be good alternatives to tabular-Q, achieving almost the same performance with much less training time. We further advance LA-Q and DQN via action aggregation (AA-LA-Q and AA-DQN) and via cooperative multi-agent learning (CMA-DQN) for the multi-parameter multi-group scenario, thereby solving the problem that Q-learning agents do not converge in high-dimensional configuration spaces. In this scenario, the advantage of the proposed Q-learning approaches over the conventional LE-URC approach grows significantly with the number of configuration dimensions, and CMA-DQN outperforms the other approaches in both throughput and training efficiency.
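
To make the tabular-Q formulation in the abstract concrete, the following is a minimal sketch: one decision per TTI, where the action indexes a candidate resource configuration and the reward is the number of served devices. Everything below (the environment stub, the state encoding, the constants, and all names) is a hypothetical placeholder for illustration, not the paper's system model.

```python
import random
from collections import defaultdict

# Candidate resource configurations, indexed by action id (hypothetical).
ACTIONS = list(range(4))
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration

# Q[(state, action)] -> estimated long-term value of picking a configuration.
Q = defaultdict(float)

def choose_action(state):
    """Epsilon-greedy choice over candidate configurations."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def step_stub(state, action):
    """Placeholder environment: a real study would simulate NB-IoT random
    access and data transmission; here reward and state are random."""
    served = random.randint(0, 10)   # stand-in for "number of served devices"
    next_state = (state + 1) % 5     # stand-in for an observed traffic phase
    return served, next_state

state = 0
for tti in range(10_000):            # one configuration decision per TTI
    action = choose_action(state)
    reward, next_state = step_stub(state, action)
    # Standard Q-learning update toward the bootstrapped one-step target.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state
```

The paper's LA-Q and DQN variants replace the table with a linear or deep-network function approximator, and CMA-DQN assigns one agent per configuration parameter trained on a shared reward; those extensions are beyond this sketch.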

Subjects:
Analytical models
Artificial neural networks
Cellular communication
Configurations
cooperative learning
Data communication
Data transmission
Devices
Interference
Internet of Things
Long Term Evolution
Machine learning
Multiagent systems
Narrowband
Narrowband Internet of Things
Numerical models
Optimization
Parameters
Random access
Real time
real-time optimization
reinforcement learning
resource configuration
Signal to noise ratio
Training
Uplink