Roboss's personal blog http://blog.sciencenet.cn/u/Roboss

Blog post

Adaptive Dynamic Programming and Reinforcement Learning

Viewed 4448 times · 2011-6-9 13:04 | Personal category: Academic Discussion | System category: Research Notes | Keywords: scholar

Adaptive Dynamic Programming and Reinforcement Learning in Feedback Control

 

F. L. Lewis, D. Liu, and G. G. Lendaris, "Special issue on adaptive dynamic programming and reinforcement learning in feedback control," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 4, pp. 896-897, 2008.

 

The importance of adaptive dynamic programming (ADP) to feedback control engineers is that it affords a methodology for learning optimal control actions online in real time based on system performance without necessarily knowing the system dynamics. When such knowledge is required in ADP, it may be of low fidelity.

In feedback control engineering, various types of adaptive controllers provide implementation strategies that employ online observations of system performance to determine regulation controllers that drive the system to the equilibrium state, or tracking controllers that cause the system to follow prescribed trajectories. Certain techniques have been developed for online controller tuning without knowing the system dynamics. However, the control engineer is often constrained in the choice of performance measure or cost employed for the optimization. For example, inverse optimal adaptive controllers exist that optimize some derived performance measures that are reasonable though not of the control engineer’s choosing. Indirect optimal adaptive controllers have been developed that require high-fidelity identification of the system dynamics.

It is becoming increasingly clear that ADP techniques, on the other hand, do allow optimal controllers to be designed online, in real time, for a freely prescribed performance measure. The key lies in solving Bellman's optimality condition forward in time through repeated iterations that 1) compute the cost, or value, of the current control and then 2) perform a control policy update, or control improvement, based on that value. This can be viewed as a type of "structured" reinforcement learning comprising two components, a critic agent and a policy-update agent. The former evaluates the currently instantiated control (via procedures called policy iteration or value iteration), and the latter improves the controller design based on the latest evaluation. Typically, to allow practical implementation, neural networks (fuzzy logic systems are also possible) are used in these agents, for value function approximation in the one case and control policy approximation in the other. In the linear-system, quadratic-cost case, the critic neural network is quadratic in the system state, and the network weights are exactly the entries of the Riccati equation solution matrix.
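To make the two-step structure concrete, the iteration can be written (in standard discrete-time notation, which is mine rather than the editorial's) as alternating policy evaluation and policy improvement:

$$
\begin{aligned}
&\text{policy evaluation:} && V_{\mu_j}(x_k) = r\bigl(x_k, \mu_j(x_k)\bigr) + V_{\mu_j}(x_{k+1}),\\
&\text{policy improvement:} && \mu_{j+1}(x_k) = \arg\min_{u}\,\bigl[\, r(x_k, u) + V_{\mu_j}(x_{k+1}) \,\bigr],
\end{aligned}
$$

where $r$ is the stage cost. In the linear quadratic case the value of any stabilizing policy is quadratic, $V_\mu(x) = x^\top P x$, which is why the critic's weights coincide with the entries of the matrix $P$.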

ADP has two important roles for the control engineer, specifically as follows. Riccati equation design has shown itself to be the backbone of modern control systems theory for linear quadratic control, but solving the corresponding Riccati equation requires full knowledge of the system dynamics, and it is done a priori, offline. ADP, on the other hand, allows the Riccati equation to be solved online without (full) knowledge of the system dynamics. Arguably more important, ADP extends Riccati-like design methods to nonlinear systems by using neural networks, paradigms that are known to be universal function approximators.
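As a minimal sketch of how the critic/actor iteration recovers the Riccati solution in the linear quadratic case, the Python/NumPy snippet below runs model-based policy iteration (Hewer-style) on an arbitrary example system; the matrices A, B, Q, R and the initial gain are assumptions chosen for illustration, not taken from the editorial. An online ADP scheme would replace the model-based evaluation step with one driven by measured states and costs.

import numpy as np

# Example system x_{k+1} = A x_k + B u_k with stage cost x'Qx + u'Ru.
# All numerical values here are assumed for illustration only.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[0.5, 0.5]])   # initial stabilizing feedback u = -K x (assumed)
for _ in range(30):
    Acl = A - B @ K
    # Critic / policy evaluation: solve the Bellman (Lyapunov) equation
    #   P = Q + K'RK + Acl' P Acl  for the current policy, by fixed-point iteration.
    M = Q + K.T @ R @ K
    P = M.copy()
    for _ in range(1000):
        P = M + Acl.T @ P @ Acl
    # Actor / policy improvement: greedy gain with respect to the evaluated value.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

print("P (approaches the discrete Riccati solution):\n", P)
print("K (approaches the optimal LQ gain):\n", K)

At convergence P satisfies the discrete algebraic Riccati equation, which is the sense in which ADP "solves the Riccati equation" iteratively; the online methods surveyed in the special issue estimate the evaluation step from data rather than from A and B.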

Many of the practitioners in ADP over the years are represented in this special issue, which is broadly divided into three sections: Theoretical Foundations, Theory/Applications, and Applications. We are privileged to have a foreword written by Paul Werbos, the founder of ADP. A lead-in paper by George Lendaris sets into perspective historical, recent, and perhaps future developments in ADP.



https://m.sciencenet.cn/blog-585735-453278.html

