Description
This project explores three reinforcement learning algorithms by applying each of them to two different Markov decision processes (MDPs). The algorithms are value iteration, policy iteration, and Q-learning. The two toy MDPs are inspired by Pacman: a small 5×5 grid world and a large 20×20 grid world.
In each grid, Pacman (our learning agent) starts in the top left corner and tries to reach the goal while collecting as high a score as possible along the way. As in the real game, Pacman can earn points by eating pellets and fruit, but he must avoid hitting the ghost at all costs. The reward structure for each grid world is:
- Small pellets (S) = +1 point
- Medium fruit (M) = +2.5 points
- Large ghosts (L) = -50 points
- Reaching the goal = +100 points
- Every step = -5 points, to encourage Pacman to reach the goal quickly
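
The reward structure above can be sketched in code as follows. This is a minimal illustration, not the project's actual implementation. The names `CELL_REWARDS`, `STEP_PENALTY`, `step_reward`, and `path_return`, and the `"G"` symbol for the goal, are hypothetical. It also assumes that the step penalty is added to the reward of the cell Pacman moves into, which the list above does not state explicitly.

```python
# Reward values are taken from the list above; all names are illustrative.
CELL_REWARDS = {
    "S": 1.0,    # small pellet
    "M": 2.5,    # medium fruit
    "L": -50.0,  # large ghost
    "G": 100.0,  # goal (the "G" symbol is an assumption)
}
STEP_PENALTY = -5.0  # charged on every move


def step_reward(cell: str) -> float:
    """Reward for moving into `cell`.

    Assumption: the step penalty and the cell's own reward are additive.
    Cells with an unknown symbol (for example, empty cells) only cost the
    step penalty.
    """
    return STEP_PENALTY + CELL_REWARDS.get(cell, 0.0)


def path_return(cells) -> float:
    """Undiscounted sum of rewards along a sequence of visited cells."""
    return sum(step_reward(c) for c in cells)


if __name__ == "__main__":
    # A hypothetical three-move path: pellet, fruit, then the goal.
    print(path_return(["S", "M", "G"]))
```

Under this additive assumption, a move onto a pellet or a fruit still has a negative net reward, so collecting items only pays off when it does not lengthen the route to the goal by much.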





