Loading…

Induced states in a decision tree constructed by Q-learning

This paper develops a tree-construction method based on the framework of reinforcement learning (RL). The induction of a decision tree is regarded as a problem of RL, where the optimal policy should be found to obtain the maximal accumulated information gain. The proposed approach consists of two st...

Full description

Saved in:
Bibliographic Details
Published in:Information sciences 2012-12, Vol.213, p.39-49
Main Authors: Hwang, Kao-Shing, Chen, Yu-Jen, Jiang, Wei-Cheng, Yang, Tsung-Wen
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This paper develops a tree-construction method based on the framework of reinforcement learning (RL). The induction of a decision tree is regarded as a problem of RL, where the optimal policy should be found to obtain the maximal accumulated information gain. The proposed approach consists of two stages. At the first stage, the emulation/demonstration stage, sensory-action data in a mechatronic system or samples of training patterns are generated by an operator or stimulator. The records of these emulation data are aggregated into components of the state space represented by a decision tree. State aggregation for decartelization of a state space consists of two phases: split estimation and tree growing. In the split estimation phase, an inducer estimates long-term evaluations of splits at visited nodes. In the second phase, the inducer grows the tree by the predicted long-term evaluations, which is approximated by a neural network model. At the second stage, the learned behavior or classifier is shaped by the RL scheme with a discretized state space constructed by the decision tree derived from the previous stage. Unlike the conventional greedy procedure for constructing and pruning a the tree, the proposed method casts the sequential process of tree induction to policy iterations, where policies for node split are evaluated and improved repeatedly until an optimal or near-optimal policy is obtained. The splitting criterion regarded as an action policy is based on long-term evaluations of payoff instead of using immediate evaluations on impurity. A comparison with CART (classification and regression tree) and C4.5 on several benchmark datasets is presented. Furthermore, to show its applications for learning control, the proposed method is applied further to construct a so-called tree-based reinforcement learning method, where the mechanism works with a discrete state space derived from the proposed method. The results show the feasibility and high performance of the proposed system as a state partition by comparison with the renowned Adaptive Heuristic Critic (AHC) model.
ISSN:0020-0255
1872-6291
DOI:10.1016/j.ins.2012.06.009