Objective: Learn to navigate the treacherous world of Markov chains.
You are a novice agent tasked with optimizing the behavior of a system modeled as a Markov chain with five states (A, B, C, D, E). Your goal is to find the policy that maximizes the expected total reward.
The system has a transition matrix:
| A | B | C | D | E |
|---|---|---|---|---|
| 0.9 | 0.1 | 0.5 | 0.3 | 0.2 |
| 0.5 | 0.8 | 0.2 | 0.4 | 0.6 |
| 0.7 | 0.3 | 0.9 | 0.8 | 0.1 |
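For a Markov chain over five states, the transition matrix should be 5 × 5, assuming the usual convention that rows are the current state and columns the next state. Every entry must be non-negative and every row must sum to 1. The minimal sketch below checks these properties. `is_valid_transition_matrix` is a hypothetical helper written for this example, not a library function.

```python
import math

# The three rows exactly as listed in the table above (columns A..E).
rows = [
    [0.9, 0.1, 0.5, 0.3, 0.2],
    [0.5, 0.8, 0.2, 0.4, 0.6],
    [0.7, 0.3, 0.9, 0.8, 0.1],
]


def is_valid_transition_matrix(matrix, n_states=5):
    """Return True if `matrix` is an n_states x n_states row-stochastic matrix."""
    # One row per current state, one column per next state.
    if len(matrix) != n_states or any(len(row) != n_states for row in matrix):
        return False
    # Probabilities must be non-negative and each row must sum to 1.
    return all(
        all(p >= 0 for p in row) and math.isclose(sum(row), 1.0)
        for row in matrix
    )


for i, row in enumerate(rows):
    print(f"row {i}: sum = {sum(row):.2f}")
print("valid:", is_valid_transition_matrix(rows))
```

Run against the listed rows, the check reports row sums of 2.00, 2.50 and 2.80 and returns `False`, because the table has 3 rows rather than 5 and none of its rows sums to 1.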
Reward function: R(state) = state
You start at state A. The game is episodic, with each episode consisting of 10 steps. Your policy is to select the next state based on the transition matrix.
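Given a valid transition matrix, you can compute the expected reward of a 10-step episode that starts at A. Propagate the state distribution one step at a time and add the expected reward at each step. This sketch makes three assumptions. First, rewards are A=1 through E=5, which is one possible reading of R(state) = state. Second, a made-up row-stochastic 5 × 5 matrix `P` stands in for the table's values. Third, the reward is collected for the state reached after each of the 10 transitions. All of these choices are hypothetical.

```python
# Hypothetical numeric rewards: A=1, B=2, C=3, D=4, E=5.
STATES = ["A", "B", "C", "D", "E"]
REWARD = {s: i + 1 for i, s in enumerate(STATES)}

# Hypothetical 5x5 row-stochastic matrix (not the values from the table).
P = [
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.6],
]


def expected_episode_reward(P, rewards, start, steps=10):
    """Expected sum of rewards over `steps` transitions starting from `start`."""
    n = len(P)
    # Start with all probability mass on the starting state.
    dist = [0.0] * n
    dist[start] = 1.0
    total = 0.0
    for _ in range(steps):
        # One transition: new_dist[j] = sum_i dist[i] * P[i][j]
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Add the expected reward of the state reached at this step.
        total += sum(dist[j] * rewards[j] for j in range(n))
    return total


rewards = [REWARD[s] for s in STATES]
print(round(expected_episode_reward(P, rewards, STATES.index("A")), 4))
```

Because the chain offers no choices, this number describes following the transition matrix as-is. It is a baseline to compare any alternative policy against.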
What policy do you choose for state A?
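One way to read the question is that the agent may pick its next state directly, among the states it can reach with non-zero probability. Under that assumption, you can find the optimal 10-step policy by finite-horizon backward induction: V_k(s) = max over reachable s' of [R(s') + V_{k-1}(s')], with V_0(s) = 0. The sketch below uses a hypothetical sparse matrix, of which only the zero/non-zero pattern matters, and the same hypothetical rewards A=1 through E=5.

```python
STATES = ["A", "B", "C", "D", "E"]
REWARD = [1, 2, 3, 4, 5]  # hypothetical numeric rewards for A..E

# Hypothetical 5x5 matrix; only which entries are non-zero is used below.
P = [
    [0.5, 0.5, 0.0, 0.0, 0.0],  # A -> A or B
    [0.0, 0.5, 0.5, 0.0, 0.0],  # B -> B or C
    [0.0, 0.0, 0.5, 0.5, 0.0],  # C -> C or D
    [0.0, 0.0, 0.0, 0.5, 0.5],  # D -> D or E
    [0.0, 0.0, 0.0, 0.0, 1.0],  # E -> E
]


def finite_horizon_policy(P, reward, steps):
    """Backward induction; returns (V_steps, policy) where policy[k][s] is
    the best next state at decision k (k = 0 is the first move)."""
    n = len(P)
    value = [0.0] * n  # V_0(s) = 0: no steps left
    policy = []
    for _ in range(steps):
        new_value, choice = [], []
        for s in range(n):
            # Assumption: the agent may move to any state with P[s][t] > 0.
            options = [t for t in range(n) if P[s][t] > 0]
            best = max(options, key=lambda t: reward[t] + value[t])
            choice.append(best)
            new_value.append(reward[best] + value[best])
        value = new_value
        policy.append(choice)
    # Decisions were computed from the last step backwards; flip the order.
    policy.reverse()
    return value, policy


value, policy = finite_horizon_policy(P, REWARD, 10)
print("first move from A:", STATES[policy[0][0]], "value:", value[0])
```

With this hypothetical matrix, the first choice from A is B, with a total of 44 (2 + 3 + 4 + 5, then 5 for each of the remaining six steps). The agent heads for E as fast as the allowed moves permit and then stays there.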