Loading…

Enhanced decision making in multi-scenarios for autonomous vehicles using alternative bidirectional Q network

To further enhance decision making in autonomous vehicles field, grant more safety, comfort, reduce traffic, and accidents, learning approaches were adopted, mainly reinforcement learning. However, possibility in upgrading these algorithms is still available due to many limitations including : conve...

Full description

Saved in:
Bibliographic Details
Published in:Neural computing & applications 2022-09, Vol.34 (18), p.15981-15996
Main Authors: Rais, Mohamed Saber, Zouaidia, Khouloud, Boudour, Rachid
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:To further enhance decision making in autonomous vehicles field, grant more safety, comfort, reduce traffic, and accidents, learning approaches were adopted, mainly reinforcement learning. However, possibility in upgrading these algorithms is still available due to many limitations including : convergence rate, stability, handling multiple dynamic environments, raw performance, robustness, and complexity of algorithms. To tackle these problems, we propose a novel extension of the well-known deep Q network called “alternative bidirectional Q network” that aims mainly to enhance stability and performance with improving exploration and Q values update policies, to overcome the literature gap that generally focuses on only one policy to handle decision making in multiple scenarios (avoiding obstacles, goal-oriented environments, etc.). In “alternative bidirectional Q network,” data about previous, current, and upcoming states are used to update the Q values where the actions are selected according to the relation between these data to handle several scenarios: highways, merges, roundabouts, and parking. This concept provides reinforcement learning agents with more balance between exploration and exploitation and enhances stability during learning. A gym simulator was adopted for training and testing the proposed algorithm’s outcome, while various state-of-the-art algorithms were used as benchmark models. The performance of the proposed extension was evaluated using several metrics being: loss, accuracy, speed, and reward values, where the results of comparison showed the superiority of the novel extension in all scenarios for most of the exploited metrics. The experiment results were confirmed using the complexity and the robustness aspects.
ISSN:0941-0643
1433-3058
DOI:10.1007/s00521-022-07278-2