Robotics 2010: Current and Future Challenges, Part 4

Reinforcement learning approach to object-contact motion with estimation of low-dimensional submanifold and mode-boundary

The first method is applied to evaluate the effect of introducing the mapping to one-dimensional space. The second method is applied to check whether explicit approximation of the discontinuous reward function can accelerate learning.

Simulation Results

The obtained mapping is depicted on the left of Fig. 6. The bottom circle corresponds to the initial state with z, and each circle in the figure denotes a sample. The right of Fig. 6 shows the reward profiles obtained through trials. Performance is not always sufficiently good, even after many trials. This is caused by the ε-greedy policy and the nature of the problem: when the agent executes a random action under the ε-greedy policy, it can easily fail to maintain contact with the object, even after it has acquired a policy good enough not to fail.

Fig. 6. Obtained 1-D mapping and learning curve obtained by the proposed method

The left of Fig. 7 shows the state value function V(s). The result of exploration in the parameterized state space is reflected in the figure where the state value is non-zero. A positive state value means that it was possible to reach the desired configuration through trials. The right of Fig. 7 shows the learning result with Q-learning as a comparison. In the Q-learning case, the object did not reach the desired goal region within 3,000 trials. With four-dimensional model-based learning, it was possible to reach the goal region. Table 2 shows comparisons between the proposed method and the model-based learning method without lower-dimensional mapping.
The performance of the controllers obtained after 3,000 learning trials was evaluated without random exploration (that is, with ε = 0) on ten test sets. The average performance of the proposed method was higher. This is caused by the fact that the controller obtained by the ...
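
The role of random exploration in these results can be illustrated with a minimal sketch of ε-greedy action selection. It is not the authors' implementation: the function names, the action values, the exploration rate of 0.1, and the episode length of 50 steps are all hypothetical and chosen only for illustration. The sketch shows two things: setting ε to zero yields a purely greedy controller, as used for evaluation, and even a small ε makes at least one random action very likely over a long episode, which may explain why contact with the object can be lost after a good policy has been learned.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: uniformly random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest action value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def prob_any_exploration(epsilon, steps):
    """Probability that at least one of `steps` decisions is a random draw."""
    return 1.0 - (1.0 - epsilon) ** steps


# Hypothetical action values for a single state (not taken from the chapter).
q = [0.1, 0.7, 0.2]

# Evaluation mode: epsilon = 0 always returns the greedy action.
greedy_action = epsilon_greedy(q, epsilon=0.0)

# Learning mode: assumed epsilon = 0.1 over an assumed 50-step episode.
# The resulting probability is close to 1.
p_explore = prob_any_exploration(epsilon=0.1, steps=50)

print(greedy_action, round(p_explore, 3))
```

Because a single random action may be enough to break contact in this kind of task, evaluating with ε = 0 separates the quality of the learned policy from the cost of exploration.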
