5.2.1 Overview
Course subject(s)
Module 5. Introduction to Reinforcement Learning
In this subsection we define the agent’s behavior as a policy, learn how that policy is evaluated, and derive the Bellman equation for the value of the optimal policy. We then show how to turn this equation into the tabular Q-learning algorithm and how to ensure that the agent explores enough.
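In standard notation, the two equations this subsection builds toward can be sketched as follows (assuming a discount factor $\gamma \in [0, 1)$; symbols here follow common RL convention, not necessarily the course's exact notation):

$$V^\pi(s) \;=\; \mathbb{E}_\pi\!\left[\, r_{t+1} + \gamma\, V^\pi(s_{t+1}) \;\middle|\; s_t = s \,\right]$$

$$Q^*(s, a) \;=\; \mathbb{E}\!\left[\, r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \;\middle|\; s_t = s,\; a_t = a \,\right]$$

The first is the recursive Bellman equation for the value of a fixed policy $\pi$; the second is the Bellman optimality equation, whose fixed point $Q^*$ is what tabular Q-learning estimates from samples.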
After this subsection, you should be able to:
- define the value of a policy and derive the recursive Bellman equation.
- reproduce the Bellman optimality equation.
- implement the Q-learning algorithm in a tabular setting.
- explain why optimistic initialization ensures exploration.
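As a preview of the last two objectives, here is a minimal sketch (not the course's reference code) of tabular Q-learning with optimistic initialization. The environment is a made-up 5-state chain where moving right from the last state gives reward 1 and ends the episode; all names and parameter values are illustrative assumptions.

```python
import numpy as np

# Illustrative toy MDP: a 5-state chain. Action 0 moves left, action 1 moves
# right; taking action 1 in state 4 yields reward 1 and ends the episode.
n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.5  # discount factor and learning rate (assumed values)

# Optimistic initialization: every Q-value starts above the best achievable
# return (which is at most 1 here), so a purely greedy agent keeps trying
# under-explored actions until their estimates drop to realistic levels.
Q = np.full((n_states, n_actions), 2.0)

def step(s, a):
    """Deterministic chain dynamics for the toy MDP above."""
    if a == 1 and s == n_states - 1:
        return s, 1.0, True                       # goal reached, episode ends
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s_next, 0.0, False

for _ in range(200):                              # episodes
    s = 0
    for _ in range(100):                          # step cap per episode
        a = int(np.argmax(Q[s]))                  # greedy; optimism does the exploring
        s_next, r, done = step(s, a)
        # Tabular Q-learning update toward the sampled Bellman target
        target = r if done else r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
        if done:
            break
```

After training, the greedy policy moves right in every state and Q[4, 1] approaches the true terminal reward of 1, even though the agent never took a single random action: the inflated initial values alone were enough to drive exploration.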
AI Skills: Introduction to Unsupervised, Deep and Reinforcement Learning by TU Delft OpenCourseWare is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://online-learning.tudelft.nl/courses/ai-skills-introduction-to-unsupervised-deep-and-reinforcement-learning/