
DMQ: Dual-Mode Q-Learning Hardware Accelerator for Shortest Path and Coverage Area

Bibliographic Details
Main Authors: Syafalni, Infall, Firdaus, Mohamad Imam, Sutisna, Nana, Adiono, Trio, Juhana, Tutun, Mulyawan, Rahmat
Format: Conference Proceeding
Language: English
Description
Summary: In this paper, we propose a novel dual-mode Q-learning hardware accelerator (DMQ) for shortest path and coverage area problems. The design uses a single agent to handle multiple modes, in this case the shortest path and coverage area problems for mobile robots. The work proposes a modified policy generator that supports two reward functions, one for the shortest path and one for the coverage area. The coverage area mode has a state space 4× larger than that of the shortest path mode. We also explore policy generators such as decreasing epsilon and greedy for the dual-mode Q-learning accelerator. Experimental results show that with a greedy policy generator the agent learns faster on both problems. Moreover, the hardware architecture requires only 1199 LUTs, 4 LUTRAMs, and 6 BRAMs for the dual-mode functions. With a throughput of 185.63 MHz, the proposed work outperforms other methods by up to 13× in energy efficiency. The proposed work is useful for disaster recovery, smart navigation, and other artificial intelligence applications.
ISSN: 2164-1706
DOI: 10.1109/SOCC62300.2024.10737818
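
The summary describes one Q-learning agent that switches between two reward functions and two policy generators. The software sketch below illustrates that idea on a small grid; it is not the paper's hardware design. The grid size, reward values, learning parameters, and the names `move`, `reward`, `state_index`, `epsilon_for`, and `train` are all assumptions made for illustration. In particular, the 4× larger coverage state space is reproduced here by appending the previous action to the cell index, which is only one possible encoding; the record does not say how the paper builds it.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def move(pos, action, size):
    """Apply an action on a size x size grid; a move into a wall leaves the agent in place."""
    row, col = pos[0] + ACTIONS[action][0], pos[1] + ACTIONS[action][1]
    return (row, col) if 0 <= row < size and 0 <= col < size else pos


def reward(mode, new_pos, goal, visited):
    """Mode-dependent reward; the numeric values are illustrative assumptions."""
    if mode == "shortest_path":
        return 10.0 if new_pos == goal else -1.0
    return 1.0 if new_pos not in visited else -1.0  # coverage: reward new cells


def state_index(mode, pos, prev_action, size):
    """Shortest path: state = cell. Coverage: state = (cell, previous action),
    giving 4x as many states (assumed encoding, not taken from the paper)."""
    cell = pos[0] * size + pos[1]
    return cell if mode == "shortest_path" else cell * len(ACTIONS) + prev_action


def epsilon_for(policy, episode, episodes):
    """Greedy never picks a random action; decreasing epsilon decays linearly from 1."""
    return 0.0 if policy == "greedy" else 1.0 - episode / episodes


def train(mode, policy="greedy", size=4, episodes=200, max_steps=100,
          alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a shared update rule and a mode-selected reward."""
    rng = random.Random(seed)
    n_states = size * size * (1 if mode == "shortest_path" else len(ACTIONS))
    q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
    goal = (size - 1, size - 1)
    for episode in range(episodes):
        pos, prev_action, visited = (0, 0), 0, {(0, 0)}
        eps = epsilon_for(policy, episode, episodes)
        for _ in range(max_steps):
            s = state_index(mode, pos, prev_action, size)
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))   # explore
            else:
                a = q[s].index(max(q[s]))         # exploit (first maximum wins ties)
            new_pos = move(pos, a, size)
            r = reward(mode, new_pos, goal, visited)
            visited.add(new_pos)
            if mode == "shortest_path":
                done = new_pos == goal
            else:
                done = len(visited) == size * size
            s_next = state_index(mode, new_pos, a, size)
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])  # standard Q-learning update
            pos, prev_action = new_pos, a
            if done:
                break
    return q
```

Calling `train("shortest_path")` returns a table with 16 rows for the default 4 × 4 grid, while `train("coverage")` returns 64 rows, mirroring the 4× ratio stated in the summary. Passing `policy="decreasing_epsilon"` switches the sketch from the greedy generator to the linearly decaying one.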