Description
Code folder
The provided code is in the following Google Drive folder:
https://drive.google.com/drive/folders/1lm9-9in2OheyPowIWnxkCykXfp-u76Go?usp=sharing
Download all the files into the same directory, then run run.py to execute your code. To complete this homework, you will have to fill in the TODOs in ppo/ppo.py.
Environment
We will reuse the environment from Homework 2, so you will not need to install anything new. If you need directions for setting it up, see: https://docs.google.com/document/d/1p_mU1jZEQZk7gP_qgwVPtv6iae4bnjMa2FHkqV5t4K4/edit
Questions
- The code folder already contains code for running REINFORCE. Run it on the following environments: Pendulum-v0, BipedalWalker-v3, and LunarLanderContinuous-v2. It is okay if REINFORCE does not perform well in these environments. Train on each environment with three different seeds, and create three plots (one per environment) that show the average performance of REINFORCE over training. Why do you think REINFORCE struggles in these environments?
- Now, complete the PPO code in ppo/ppo.py, which contains several TODOs for you to fill in. Refer to the original PPO pseudocode if you need to. Once again, use the same three environments and three different seeds to plot your training rewards. Clearly show the comparison between REINFORCE and PPO in your plots. Your expected mean performance should be AT LEAST:
Pendulum: -400
BipedalWalker: 125
LunarLanderContinuous: 100
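
Both questions ask for performance averaged over three seeds. The sketch below shows one possible way to compute that average from per-seed reward curves using only the Python standard library. It is not part of the provided code: the function names average_over_seeds and meets_threshold, the truncation to the shortest run, and the 10-point window used to summarize final performance are all assumptions made for illustration, since the handout does not specify how mean performance is measured.

```python
from statistics import mean, pstdev


def average_over_seeds(curves):
    """Average per-seed reward curves point by point.

    `curves` is a list with one list of rewards per seed (hypothetical
    format). Runs of different lengths are truncated to the shortest one,
    which is an assumption of this sketch.
    Returns (mean_curve, std_curve).
    """
    if not curves:
        raise ValueError("need at least one reward curve")
    n = min(len(c) for c in curves)
    mean_curve = [mean([c[i] for c in curves]) for i in range(n)]
    # Population standard deviation across seeds, useful for a shaded band.
    std_curve = [pstdev([c[i] for c in curves]) for i in range(n)]
    return mean_curve, std_curve


def meets_threshold(mean_curve, threshold, window=10):
    """Check whether the mean of the last `window` points reaches `threshold`.

    The window size is an assumed convention, not one given in the handout.
    """
    tail = mean_curve[-window:]
    return mean(tail) >= threshold
```

The mean curve can then be plotted once per environment, with the REINFORCE and PPO curves on the same axes so the comparison is visible, and the standard deviation can be drawn as a shaded band around each curve.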




