5.2.1 Overview

Course subject(s): Module 5. Introduction to Reinforcement Learning

In this subsection we define the agent’s behavior as a policy, learn how that policy is evaluated, and derive the Bellman equation for the value of the optimal policy. We then show how to turn this equation into the tabular Q-learning algorithm, and how to make sure the agent explores enough.
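
For reference, these quantities take the following standard forms (the notation follows common RL conventions and may differ slightly from the course material). The value of a policy π satisfies the recursive Bellman equation, the optimal value satisfies the optimal Bellman equation, and replacing the expectation over next states with observed samples gives the tabular Q-learning update:

$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{\pi}(s')\bigr]$$

$$V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V^{*}(s')\bigr]$$

$$Q(s, a) \leftarrow Q(s, a) + \alpha\,\bigl[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\bigr]$$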

After this subsection, you should be able to:

  • define the value of a policy and derive the recursive Bellman equation.
  • reproduce the optimal Bellman equation.
  • implement the Q-learning algorithm in a tabular setting.
  • explain why optimistic initialization ensures exploration (see the code sketch after this list).
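
To make the last two objectives concrete, here is a minimal Python sketch of tabular Q-learning with optimistic initialization. The toy chain environment, the initial Q-value of 2.0, and the hyperparameters are illustrative assumptions for this sketch, not material taken from the course.

    import numpy as np

    # Toy deterministic chain MDP (an assumption for this sketch, not from the course):
    # states 0..4, actions 0 = left, 1 = right; reaching state 4 yields reward 1 and ends the episode.
    N_STATES, N_ACTIONS, GOAL = 5, 2, 4

    def step(state, action):
        """Return (next_state, reward, done) for the toy chain environment."""
        next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
        return next_state, float(next_state == GOAL), next_state == GOAL

    # Optimistic initialization: every Q-value starts above the best achievable return,
    # so untried (state, action) pairs look attractive and the greedy agent keeps exploring them.
    Q = np.full((N_STATES, N_ACTIONS), 2.0)
    alpha, gamma = 0.1, 0.9  # learning rate and discount factor (illustrative values)

    for episode in range(500):
        state, done = 0, False
        while not done:
            action = int(np.argmax(Q[state]))                 # purely greedy; optimism drives exploration
            next_state, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])  # tabular Q-learning update
            state = next_state

    print(np.round(Q, 2))  # the "right" action should end up with the higher value in every state

With an ε-greedy policy, exploration would come from injected randomness; here the inflated initial Q-values play that role: each update pulls a visited pair’s estimate down toward a realistic value, so pairs that have not been tried yet keep looking comparatively attractive until they are visited.
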
AI Skills: Introduction to Unsupervised, Deep and Reinforcement Learning by TU Delft OpenCourseWare is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://online-learning.tudelft.nl/courses/ai-skills-introduction-to-unsupervised-deep-and-reinforcement-learning/