5.2.1 Overview

Course subject(s): Module 5. Introduction to Reinforcement Learning

In this subsection we define the agent’s behavior as a policy, learn how that policy is evaluated, and derive the Bellman equation for the value of the optimal policy. We then show how to turn this equation into the tabular Q-learning algorithm, and how to make sure the agent explores enough.
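
For reference, these quantities take the following standard forms (the notation follows common RL conventions and may differ slightly from the course material). The value of a policy π satisfies the recursive Bellman equation, the optimal value satisfies the optimal Bellman equation, and replacing the expectation over next states with observed samples gives the tabular Q-learning update:

$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{\pi}(s')\bigr]$$

$$V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{*}(s')\bigr]$$

$$Q(s, a) \leftarrow Q(s, a) + \alpha\,\bigl[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\bigr]$$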

After this subsection, you should be able to:

  • define the value of a policy and derive the recursive Bellman equation.
  • reproduce the optimal Bellman equation.
  • implement the Q-learning algorithm in a tabular setting.
  • explain why optimistic initialization ensures exploration (see the code sketch after this list).
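
To make the last two objectives concrete, here is a minimal Python sketch of tabular Q-learning with optimistic initialization. The toy chain environment, the initial Q-value of 2.0, and the hyperparameters are illustrative assumptions for this sketch, not material taken from the course.

    import numpy as np

    # Toy deterministic chain MDP (an assumption for this sketch, not from the course):
    # states 0..4, actions 0 = left, 1 = right; reaching state 4 yields reward 1 and ends the episode.
    N_STATES, N_ACTIONS, GOAL = 5, 2, 4

    def step(state, action):
        """Return (next_state, reward, done) for the toy chain environment."""
        next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
        return next_state, float(next_state == GOAL), next_state == GOAL

    # Optimistic initialization: every Q-value starts above the best achievable return,
    # so untried (state, action) pairs look attractive and the greedy agent keeps exploring them.
    Q = np.full((N_STATES, N_ACTIONS), 2.0)
    alpha, gamma = 0.1, 0.9  # learning rate and discount factor (illustrative values)

    for episode in range(500):
        state, done = 0, False
        while not done:
            action = int(np.argmax(Q[state]))                 # purely greedy; optimism drives exploration
            next_state, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])  # tabular Q-learning update
            state = next_state

    print(np.round(Q, 2))  # the "right" action should end up with the higher value in every state

With an ε-greedy policy, exploration would come from injected randomness; here the inflated initial Q-values play that role: each update pulls a visited pair’s estimate down toward a realistic value, so pairs that have not been tried yet keep looking comparatively attractive until they are visited.
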
AI Skills: Introduction to Unsupervised, Deep and Reinforcement Learning by TU Delft OpenCourseWare is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://online-learning.tudelft.nl/courses/ai-skills-introduction-to-unsupervised-deep-and-reinforcement-learning/