Reinforcement Learning The Game: Level 1

Welcome to Reinforcement Learning The Game: Level 1

Objective: Learn to navigate the treacherous world of Markov chains.

You are a novice agent tasked with optimizing the behavior of a system: a Markov chain with 5 states, A through E. Your goal is to find the policy that maximizes the total reward collected over an episode.

The system has a transition matrix:

A    B    C    D    E
0.9  0.1  0.5  0.3  0.2
0.5  0.8  0.2  0.4  0.6
0.7  0.3  0.9  0.8  0.1

Reward function: R(state) = state

You start at state A. The game is episodic, with each episode consisting of 10 steps. Your policy is to select the next state based on the transition matrix.
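
One way to picture an episode is the minimal Python sketch below. It is only an illustration under stated assumptions: the 5x5 matrix `P` is a hypothetical placeholder whose rows each sum to 1, not the matrix shown in this level; the mapping A=1 through E=5 is one possible reading of R(state) = state; and collecting the reward on arrival in each new state is likewise an assumption. The names `STATES`, `P`, `REWARD`, and `run_episode` are made up for this example.

```python
import random

STATES = ["A", "B", "C", "D", "E"]

# Hypothetical 5x5 row-stochastic matrix (NOT the one from the level text).
# Row i holds the probabilities of moving from STATES[i] to each state.
P = [
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.6],
]

# Assumption: R(state) = state is read as the state's 1-based index.
REWARD = {state: index + 1 for index, state in enumerate(STATES)}


def run_episode(start="A", steps=10, rng=None):
    """Simulate one episode and return (visited path, total reward)."""
    rng = rng or random.Random()
    state = start
    path = [state]
    total = 0
    for _ in range(steps):
        # Sample the next state from the current state's row of P.
        row = P[STATES.index(state)]
        state = rng.choices(STATES, weights=row, k=1)[0]
        path.append(state)
        # Assumption: the reward is collected on arrival in the new state.
        total += REWARD[state]
    return path, total
```

Calling `run_episode(rng=random.Random(0))` returns the visited path (11 states, starting at A) together with the total reward; passing a seeded generator keeps runs reproducible.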

What policy do you choose for state A?
