Name: CSCI-GA3033-090 - Deep Reinforcement Learning Homework 3 - Policy Gradient Algorithms -Solved



Code folder

The provided code is in the following Google Drive folder: https://drive.google.com/drive/folders/1lm9-9in2OheyPowIWnxkCykXfp-u76Go?usp=sharing Download all the files into the same directory, then run run.py to execute your code. To complete this homework, you will have to fill in the TODOs in ppo/ppo.py.

We will reuse the environment from homework 2, so you will not need to install anything on top of it. If you need more directions for setting it up, see here: https://docs.google.com/document/d/1p_mU1jZEQZk7gP_qgwVPtv6iae4bnjMa2FHkqV5t4K4/edit

Questions

In the code folder, you will find ready-to-run code for REINFORCE. Run this code on the following environments: Pendulum-v0, BipedalWalker-v3, and LunarLanderContinuous-v2. It is okay if REINFORCE does not perform well in these environments. Train on each of the three environments with three different seeds and plot the reward over the course of training, then create three plots (one per environment) that show the average performance of REINFORCE across the seeds. Why do you think REINFORCE suffers in these environments?
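The per-environment averaging described above can be sketched as follows. This is only an illustration, not part of the provided code: the function names aggregate_seeds and plot_mean_curve are hypothetical, and it assumes that each seed logs the same number of reward values and that the x-axis counts episodes.

```python
from statistics import mean, stdev


def aggregate_seeds(curves):
    """Average per-seed reward curves point by point.

    curves: list of equal-length lists, one list of training rewards per seed.
    Returns (mean_curve, std_curve) as two lists of floats.
    """
    if len({len(c) for c in curves}) != 1:
        raise ValueError("all seeds must log the same number of points")
    # zip(*curves) groups the i-th reward of every seed together.
    mean_curve = [mean(step) for step in zip(*curves)]
    # Sample standard deviation needs at least two seeds.
    std_curve = [stdev(step) if len(step) > 1 else 0.0 for step in zip(*curves)]
    return mean_curve, std_curve


def plot_mean_curve(curves, title, path):
    """Save one plot: mean reward over seeds with a +/- 1 std band."""
    # Imported lazily so aggregate_seeds works without matplotlib installed.
    import matplotlib.pyplot as plt

    mean_curve, std_curve = aggregate_seeds(curves)
    steps = list(range(len(mean_curve)))
    lower = [m - s for m, s in zip(mean_curve, std_curve)]
    upper = [m + s for m, s in zip(mean_curve, std_curve)]

    fig, ax = plt.subplots()
    ax.plot(steps, mean_curve, label="mean over seeds")
    ax.fill_between(steps, lower, upper, alpha=0.3, label="+/- 1 std")
    ax.set_xlabel("episode")  # assumption: one logged reward per episode
    ax.set_ylabel("training reward")
    ax.set_title(title)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling plot_mean_curve once per environment, with the three per-seed reward lists for that environment, would give the three requested plots.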

Now, complete the PPO code found in ppo/ppo.py. You will find a few different TODOs there. Follow the original PPO pseudocode if you need to. Once again, use the previous three environments and three different seeds to plot your training rewards. Clearly show the comparison between REINFORCE and PPO in your plots. Your expected mean performance should be AT LEAST:
Pendulum: -400
BipedalWalker: 125
LunarLanderContinuous: 100
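
As a reminder of the core idea in the original PPO pseudocode, here is a minimal sketch of the clipped surrogate objective in plain Python. It is not the structure of ppo/ppo.py: the function names, the scalar per-sample formulation, and the default clip_eps of 0.2 are assumptions chosen for illustration, and a real implementation would normally operate on batched tensors and include value-function and optimizer steps.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective for a single (state, action) sample.

    logp_new:  log-probability of the action under the current policy
    logp_old:  log-probability under the policy that collected the data
    advantage: advantage estimate for this sample
    clip_eps:  clipping range (0.2 is an assumed default, not from the handout)
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_policy_loss(logps_new, logps_old, advantages, clip_eps=0.2):
    """Negative mean clipped surrogate over a batch (a loss to minimize)."""
    terms = [
        clipped_surrogate(ln, lo, adv, clip_eps)
        for ln, lo, adv in zip(logps_new, logps_old, advantages)
    ]
    return -sum(terms) / len(terms)
```

The clipping removes the incentive to move the policy far from the one that collected the data within a single update, which is one commonly cited reason PPO tends to train more stably than plain REINFORCE.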