Name: CSCI-GA3033-090 - Deep Reinforcement Learning Homework 4 - Exploration Algorithms -Solved

Description

Code folder

Environment

Since this homework runs in Google Colab, it does not require any separate environment setup.

Make a copy of the Colab notebook to your Drive, then go through the skeleton code in it. At the very end, you will find a function that is supposed to run the bandit algorithms and plot their cumulative regret over time. Complete this function, and verify that it works by testing it with the two given environments and the FullyRandom solver.

Note: For a full score on this problem, the following must be true: each solver must be drawn in a different color, and each environment (Bernoulli bandit and Gaussian bandit) must be shown on a separate plot. Label each of the two plots, and label each line in each plot with its associated algorithm. For formatting guidance, look at the given plot.
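As a rough illustration of the task described above, the following is a minimal sketch of a run-and-plot routine. It is not the Colab's skeleton: the class names `BernoulliBandit` and `GaussianBandit`, the methods `pull`, `select_arm`, and `update`, the `means` and `best_mean` attributes, and the arm parameters are all assumptions made for this example, and the `FullyRandom` class here is only a stand-in for the solver of the same name in the notebook. Regret is computed as expected (pseudo-)regret, which is also an assumption about what the notebook intends.

```python
import random


class BernoulliBandit:
    """Hypothetical stand-in: arm i pays 1 with probability means[i], else 0."""
    def __init__(self, means, seed=0):
        self.means = list(means)
        self.best_mean = max(self.means)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.means[arm] else 0.0


class GaussianBandit:
    """Hypothetical stand-in: arm i pays a Normal(means[i], sigma) reward."""
    def __init__(self, means, sigma=1.0, seed=0):
        self.means = list(means)
        self.sigma = sigma
        self.best_mean = max(self.means)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return self.rng.gauss(self.means[arm], self.sigma)


class FullyRandom:
    """Stand-in solver (assumed interface): picks an arm uniformly at random."""
    def __init__(self, n_arms, seed=0):
        self.n_arms = n_arms
        self.rng = random.Random(seed)

    def select_arm(self):
        return self.rng.randrange(self.n_arms)

    def update(self, arm, reward):
        pass  # a random solver learns nothing from the reward


def cumulative_regret(env, solver, n_steps):
    """Return the running sum of (best arm mean - chosen arm mean) per step."""
    regrets, total = [], 0.0
    for _ in range(n_steps):
        arm = solver.select_arm()
        solver.update(arm, env.pull(arm))
        total += env.best_mean - env.means[arm]
        regrets.append(total)
    return regrets


def plot_regret(results):
    """results maps env name -> {solver name -> regret list}; one plot per env."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    n = len(results)
    fig, axes = plt.subplots(1, n, figsize=(6 * n, 4), squeeze=False)
    for ax, (env_name, curves) in zip(axes[0], results.items()):
        for solver_name, regret in curves.items():
            # the default color cycle gives each solver its own color
            ax.plot(regret, label=solver_name)
        ax.set_title(env_name)
        ax.set_xlabel("Time step")
        ax.set_ylabel("Cumulative regret")
        ax.legend()
    fig.tight_layout()
    plt.show()


# Illustrative arm parameters (assumed, not taken from the homework).
envs = {
    "Bernoulli bandit": BernoulliBandit([0.2, 0.5, 0.8]),
    "Gaussian bandit": GaussianBandit([0.0, 0.5, 1.0]),
}
results = {
    name: {"FullyRandom": cumulative_regret(env, FullyRandom(3), 1000)}
    for name, env in envs.items()
}
```

Calling `plot_regret(results)` in a notebook cell would then draw one labeled subplot per environment with one labeled line per solver. Additional solvers could be added as further entries in each inner dictionary, provided they expose the same assumed `select_arm` and `update` methods.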