Abstract
In this paper, we propose a new adaptive actor-critic algorithm that uses multi-step simulated experiences, as a kind of temporal difference (TD) method. In our approach, the TD error is composed of two value functions and m utility functions, where m denotes the number of steps over which the experience is simulated. The value function is constructed from a critic formulated as a radial basis function neural network (RBFNN), whose input is a simulated experience generated by a predictive model based on a kinematic model. Because our approach assumes that a model is available both to simulate the m-step experiences and to design a controller, the same kinematic model is also used to construct the actor; the resulting model-based actor (MBA) is likewise regarded as a network, namely a resolved velocity control network. We apply this approach to the control of a nonholonomic mobile robot, specifically to a trajectory tracking problem for the position coordinates and azimuth. Simulations show the effectiveness of the proposed method for controlling a mobile robot with two independently driven wheels.
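
The structure of the TD error described in the abstract (m utility terms plus two critic evaluations) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Euler-discretized unicycle model, the quadratic utility, the discount factor `gamma`, the sign convention of the TD error, and all function names. Heading-angle wrapping is ignored for brevity.

```python
import numpy as np

def unicycle_step(pose, v, omega, dt):
    """One Euler step of a kinematic model for a two-wheel differential-drive robot."""
    x, y, theta = pose
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + omega * dt])

def rbf_value(error, centers, width, weights):
    """Critic sketch: weighted sum of Gaussian radial basis functions of the tracking error."""
    sq_dist = np.sum((centers - error) ** 2, axis=1)
    return float(weights @ np.exp(-sq_dist / (2.0 * width ** 2)))

def utility(error):
    """Hypothetical quadratic utility (cost) of a tracking error (x, y, azimuth)."""
    return float(error @ error)

def m_step_td_error(pose, reference, controls, dt, gamma, critic):
    """TD error from m simulated utilities and two critic evaluations.

    reference: sequence of m + 1 reference poses; controls: m pairs (v, omega).
    Assumed form: sum_i gamma^i U_i + gamma^m V(e_m) - V(e_0).
    """
    v_now = critic(pose - reference[0])            # first value function
    discounted_utilities = 0.0
    for i, (v, omega) in enumerate(controls):      # m simulated steps
        discounted_utilities += gamma ** i * utility(pose - reference[i])
        pose = unicycle_step(pose, v, omega, dt)   # predictive (kinematic) model
    m = len(controls)
    v_future = critic(pose - reference[m])         # second value function
    return discounted_utilities + gamma ** m * v_future - v_now
```

In this sketch the critic is passed in as a callable, for example `lambda e: rbf_value(e, centers, width, weights)`, so the same TD-error routine can be reused while the RBFNN weights are adapted.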
Original language | English |
---|---|
Pages (from-to) | 81-89 |
Number of pages | 9 |
Journal | Soft Computing |
Volume | 11 |
Issue number | 1 |
Publication status | Published - Jan 1 2007 |
Externally published | Yes |
Keywords
- Actor-critic algorithms
- Kinematic model
- Multi-step prediction
- Nonholonomic mobile robot
- Nonlinear predictive model
- Simulated experience
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Geometry and Topology