Description
Code folder
Find the folder with the provided code in the following google colab: https://colab.research.google.com/drive/19ht5cd7CoEkotj3bWnBaGKHaSdWhObH4?usp=sharing
Environment
Since this homework is in Google colab, it will not require any separate environment setup
Assignment
- Make a copy of the colab to your drive, and then go through the skeleton code in it. At the very end, you will find a function that is supposed to run the bandit algorithms and plot their cumulative regret over time. Complete this function, and verify that it works by testing it with the two given environments and the FullyRandom solver.Note: For a full score on this problem, the following must be true: each solver must be denoted by a different color, and each environment (Bernoulli bandit and Gaussian bandit) must be shown on a different plot. Make sure to label each of the two plots and each line in each plot with the associated algorithm as well. For formatting guidance, look at the given plot
- Once you have finished it, implement EpsilonGreedy, UCB, and Thompson Sampling solvers. Make sure when you run the colab notebook, it generates the two associated plots: one for the Bernoulli bandit with all the algorithms, and another for the Gaussian bandit with all the algorithms.Submit a link to this completed colab to the submission link above. Before submission, once again make sure that the following are in order:
a) plot titles,
b) axis labels,
c) line legends.Also, make sure the sharing settings are turned on so we can check your solution and run it.




