Loading…

Task Learning Over Multi-Day Recording via Internally Rewarded Reinforcement Learning Based Brain Machine Interfaces

Autonomous brain machine interfaces (BMIs) aim to enable paralyzed people to self-evaluate their movement intention to control external devices. Previous reinforcement learning (RL)-based decoders interpret the mapping between neural activity and movements using the external reward for well-trained...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on neural systems and rehabilitation engineering 2020-12, Vol.28 (12), p.3089-3099
Main Authors: Shen, Xiang, Zhang, Xiang, Huang, Yifan, Chen, Shuhang, Wang, Yiwen
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Autonomous brain machine interfaces (BMIs) aim to enable paralyzed people to self-evaluate their movement intention to control external devices. Previous reinforcement learning (RL)-based decoders interpret the mapping between neural activity and movements using the external reward for well-trained subjects, and have not investigated the task learning procedure. The brain has developed a learning mechanism to identify the correct actions that lead to rewards in the new task. This internal guidance can be utilized to replace the external reference to advance BMIs as an autonomous system. In this study, we propose to build an internally rewarded reinforcement learning-based BMI framework using the multi-site recording to demonstrate the autonomous learning ability of the BMI decoder on the new task. We test the model on the neural data collected over multiple days while the rats were learning a new lever discrimination task. The primary motor cortex (M1) and medial prefrontal cortex (mPFC) spikes are interpreted by the proposed RL framework into the discrete lever press actions. The neural activity of the mPFC post the action duration is interpreted as the internal reward information, where a support vector machine is implemented to classify the reward vs. non-reward trials with a high accuracy of 87.5% across subjects. This internal reward is used to replace the external water reward to update the decoder, which is able to adapt to the nonstationary neural activity during subject learning. The multi-cortical recording allows us to take in more cortical recordings as input and uses internal critics to guide the decoder learning. Comparing with the classic decoder using M1 activity as the only input and external guidance, the proposed system with multi-cortical recordings shows a better decoding accuracy. More importantly, our internally rewarded decoder demonstrates the autonomous learning ability on the new task as the decoder successfully addresses the time-variant neural patterns while subjects are learning, and works asymptotically as the subjects' behavioral learning progresses. It reveals the potential of endowing BMIs with autonomous task learning ability in the RL framework.
ISSN:1534-4320
1558-0210
DOI:10.1109/TNSRE.2020.3039970