Frontiers in Adaptive Control Part 4

the observation space and the control set are all finite. If $X$ has $k$ elements, then any probability distribution on $X$ can be represented by $(p_1, \dots, p_k)$ such that $p_i \ge 0$ and $\sum_{i=1}^{k} p_i = 1$. In this case the value function (27) can be expressed as a convex piecewise linear function of $(p_1, \dots, p_k)$.

For more general state spaces $X$, Thrun (2000) has proposed a Monte Carlo procedure, called MC-POMDP, that uses particle filters to perform approximate value iteration. MC-POMDP uses a finite particle set to approximate a probability distribution $\pi$. Specifically, this iterative procedure updates the value function $V$ at $\pi$ by simulating, for each applicable control $u$, a sample of possible subsequent beliefs and then averaging over the simulated sample (28), so that $V(\pi)$ is updated by the maximum of these averages over $u$ in the iterative procedure.

The basic idea is taken from model-based reinforcement learning (Gordon, 1995; Kaelbling et al., 1996; Sutton & Barto, 1998), in which function approximations such as neural networks, decision trees and spline basis functions are used to represent the value function $V$ in MDPs. To extend the idea to POMDPs, the challenge lies in how to represent $V$, since it is a function of a probability distribution on the state space instead of the state itself. Thrun (2000) uses a nearest neighbor approximation to represent $V$ in (28). His MC-POMDP algorithm keeps a set (database) of reference beliefs $\pi_i$ and associated values $V_i$. When a new belief state $\pi$ not in the database is generated, its $V$ value is obtained by finding the $k$ nearest neighbors in the database and taking a weighted average of the corresponding $V_i$ values.
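The update and the nearest neighbor lookup described above can be sketched roughly as follows. This is a minimal illustration, not Thrun's implementation: the names `mc_backup`, `knn_value` and `sample_next`, the one-step reward, the discount factor `gamma` and the inverse-distance weights are all assumptions introduced here, since the text gives only the general shape of the procedure.

```python
import numpy as np

def knn_value(dists, values, k):
    """Weighted average of the values of the k nearest reference beliefs.

    dists[i] is the distance from the new belief to reference belief i and
    values[i] is its stored value V_i. Inverse-distance weights are an
    assumption; the text only says "weighted average".
    """
    dists = np.asarray(dists, dtype=float)
    values = np.asarray(values, dtype=float)
    nearest = np.argsort(dists)[:k]          # indices of the k smallest distances
    weights = 1.0 / (dists[nearest] + 1e-12) # small constant avoids division by zero
    return float(np.sum(weights * values[nearest]) / np.sum(weights))

def mc_backup(belief, controls, sample_next, value_fn, m, gamma, rng):
    """One Monte Carlo value update at `belief`.

    sample_next(belief, u, rng) is a hypothetical simulator returning
    (reward, next_belief); value_fn(next_belief) returns the current value
    estimate, e.g. via knn_value. For each control u, m successor beliefs
    are simulated and averaged; the maximum average over u is returned.
    """
    best = -np.inf
    for u in controls:
        total = 0.0
        for _ in range(m):
            reward, next_belief = sample_next(belief, u, rng)
            total += reward + gamma * value_fn(next_belief)
        best = max(best, total / m)
    return best
```

In a full procedure `value_fn` would compute distances from the simulated belief to every reference belief in the database and then call `knn_value`; the result of `mc_backup` would be stored as the new value for `belief`.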
To measure the distance of $\pi$ from $\pi_i$, he convolves each particle with a normal $N(0, v)$ distribution having a small variance $v$, so that $\pi$ and $\pi_i$ can be represented by Gaussian mixtures, and then uses the Kullback-Leibler divergence to measure the distance $d_i$ of $\pi$ from $\pi_i$. Denoting the $k$ nearest neighbors of $\pi$ by $\pi_1, \dots, \pi_k$, $V(\pi)$ in (28) is approximated by a weighted average of the corresponding values $V_1, \dots, V_k$.
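As a rough illustration of this distance, the sketch below smooths two particle sets with an $N(0, v)$ kernel and estimates the Kullback-Leibler divergence between the resulting Gaussian mixtures. The divergence between Gaussian mixtures has no closed form in general, so the sketch estimates it by sampling from the first mixture; that estimator, the direction of the divergence, the equal particle weights and the function names are assumptions made here, not details given in the text.

```python
import numpy as np

def log_mixture_density(x, particles, v):
    """Log density at points x of the equal-weight mixture of N(particle, v*I).

    x has shape (n, d) and particles has shape (m, d).
    """
    x = np.asarray(x, dtype=float)
    particles = np.asarray(particles, dtype=float)
    d = x.shape[1]
    # Squared distance from every point to every particle, shape (n, m).
    sq = ((x[:, None, :] - particles[None, :, :]) ** 2).sum(axis=2)
    log_kernel = -0.5 * sq / v - 0.5 * d * np.log(2.0 * np.pi * v)
    # Numerically stable log of the mean of exp(log_kernel) over particles.
    peak = log_kernel.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.exp(log_kernel - peak).mean(axis=1))

def kl_estimate(p_particles, q_particles, v, n_samples, rng):
    """Monte Carlo estimate of KL(p || q) for kernel-smoothed particle sets."""
    p = np.asarray(p_particles, dtype=float)
    q = np.asarray(q_particles, dtype=float)
    # Draw from the smoothed p: pick a particle, then add N(0, v) noise.
    idx = rng.integers(0, len(p), size=n_samples)
    x = p[idx] + np.sqrt(v) * rng.standard_normal((n_samples, p.shape[1]))
    return float(np.mean(log_mixture_density(x, p, v) - log_mixture_density(x, q, v)))
```

A quick sanity check is the single-particle case: with one particle at 0 and one at 3 and $v = 1$, the two mixtures are $N(0, 1)$ and $N(3, 1)$, whose divergence is $3^2 / 2 = 4.5$, and the estimate should land close to that value for a few thousand samples.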
