
Deep Reinforcement Learning for Practical Phase-Shift Optimization in RIS-Aided MISO URLLC Systems

Bibliographic Details
Published in: IEEE Internet of Things Journal, 2023-05-15, Vol. 10, No. 10, pp. 8931-8943
Main Authors: Hashemi, Ramin; Ali, Samad; Mahmood, Nurul Huda; Latva-Aho, Matti
Format: Article
Language: English
Description: We study the joint active/passive beamforming and channel blocklength (CBL) allocation in a nonideal reconfigurable intelligent surface (RIS)-aided ultrareliable and low-latency communication (URLLC) system. The considered scenario is the finite blocklength (FBL) regime, and the problem is solved by leveraging a deep reinforcement learning (DRL) algorithm named twin-delayed deep deterministic policy gradient (TD3). First, assuming an industrial automation system, the signal-to-interference-plus-noise ratio and the achievable rate in the FBL regime are identified for each actuator. Next, the joint active/passive beamforming and CBL optimization problem (OP) is formulated, where the objective is to maximize the total achievable FBL rate over all actuators, subject to the nonlinear amplitude response at the RIS elements, the BS transmit power budget, and the total available CBL. Since the formulated problem is highly nonconvex and nonlinear, we resort to an actor-critic policy gradient DRL algorithm based on TD3. The considered method relies on the RIS interacting with the industrial automation environment by taking actions, namely the phase shifts at the RIS elements, the CBL variables, and the BS beamforming, to maximize the expected observed reward, i.e., the total FBL rate. We assess the performance loss of the system when the RIS is nonideal, i.e., has a nonlinear amplitude response, and compare it with an ideal RIS without impairments. The numerical results show that optimizing the RIS phase shifts, BS beamforming, and CBL variables via the TD3 method with a deterministic policy outperforms conventional methods and is highly beneficial for improving the network's total FBL rate with a finite CBL size.
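Note on the FBL rate: the "achievable rate in the FBL regime" mentioned in the description is, in the finite-blocklength literature, usually the normal approximation; a plausible form (an assumption here, since this record does not reproduce the paper's equations) for actuator k with SINR \gamma_k, blocklength m_k, and block error probability \epsilon is

R_k \approx \log_2(1 + \gamma_k) - \sqrt{V_k / m_k}\, \frac{Q^{-1}(\epsilon)}{\ln 2}, \qquad V_k = 1 - (1 + \gamma_k)^{-2},

where Q^{-1} is the inverse Gaussian Q-function and V_k the channel dispersion; the OP in the description then maximizes the sum of these rates over all actuators.

Note on the TD3 interaction loop: the description states that the action vector collects the RIS phase shifts, the CBL allocation, and the BS beamforming, and that the reward is the total FBL rate. The sketch below illustrates only that action-to-reward mapping; the dimensions, channel model, nonideal-amplitude model, and constants are illustrative assumptions rather than the authors' implementation, and the TD3 agent itself (twin critics, target networks, delayed policy updates) is omitted.

# Minimal sketch of the action-to-reward mapping: the agent's action encodes RIS
# phase shifts, per-actuator blocklengths (CBL), and BS beamforming; the reward is
# the total finite-blocklength (FBL) rate. All sizes and models below are assumed.
import numpy as np
from scipy.stats import norm

N_RIS, N_TX, K = 32, 4, 3            # RIS elements, BS antennas, actuators (assumed)
P_MAX, NOISE, EPS = 1.0, 1e-3, 1e-5  # power budget, noise power, target error prob. (assumed)
M_TOTAL = 600                        # total available channel blocklength (assumed)

rng = np.random.default_rng(0)
G = (rng.normal(size=(N_RIS, N_TX)) + 1j * rng.normal(size=(N_RIS, N_TX))) / np.sqrt(2)  # BS -> RIS
hr = (rng.normal(size=(K, N_RIS)) + 1j * rng.normal(size=(K, N_RIS))) / np.sqrt(2)       # RIS -> actuators

def ris_amplitude(theta):
    """Nonideal RIS: reflection amplitude depends on the applied phase shift (assumed model)."""
    return 0.8 + 0.2 * (np.sin(theta - np.pi / 2) + 1) / 2

def step(action):
    """Map a flat action in [-1, 1]^d to (phases, CBL split, beamformers) and return the reward."""
    theta = np.pi * (action[:N_RIS] + 1)                         # phase shifts in [0, 2*pi)
    m_frac = np.abs(action[N_RIS:N_RIS + K]) + 1e-3
    m = np.maximum(np.floor(M_TOTAL * m_frac / m_frac.sum()), 1)  # integer CBL per actuator
    w_raw = action[N_RIS + K:].reshape(K, 2 * N_TX)
    W = w_raw[:, :N_TX] + 1j * w_raw[:, N_TX:]
    W *= np.sqrt(P_MAX) / np.linalg.norm(W)                       # enforce BS power budget
    phi = np.diag(ris_amplitude(theta) * np.exp(1j * theta))      # RIS reflection matrix
    H = hr @ phi @ G                                              # effective K x N_TX channel
    total_rate = 0.0
    for k in range(K):
        sig = np.abs(H[k] @ W[k]) ** 2
        interf = sum(np.abs(H[k] @ W[j]) ** 2 for j in range(K) if j != k)
        gamma = sig / (interf + NOISE)
        V = 1.0 - 1.0 / (1.0 + gamma) ** 2                        # channel dispersion
        r = np.log2(1 + gamma) - np.sqrt(V / m[k]) * norm.isf(EPS) / np.log(2)
        total_rate += max(r, 0.0)
    return total_rate                                             # reward fed back to the agent

# A TD3 agent would output `action`; a random action stands in here for illustration.
action = rng.uniform(-1, 1, size=N_RIS + K + 2 * K * N_TX)
print("total FBL rate (bits/channel use):", step(action))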
DOI: 10.1109/JIOT.2022.3232962
ISSN: 2327-4662
Source: IEEE Electronic Library (IEL) Journals
Subjects: Actuators; Algorithms; Amplitudes; Array signal processing; Automation; Beamforming; Block error probability (BLER); Communications systems; Deep learning; Deep reinforcement learning (DRL); Finite blocklength (FBL); Industrial automation; Machine learning; Network latency; Nonlinear response; Optimization; Reconfigurable intelligent surface (RIS); Resource management; Ultra-reliable low-latency communication; Ultrareliable low-latency communications (URLLC); Wireless communication