IE613-Assignment 2


Question 1 Suppose that X is σ-subgaussian and that X₁ and X₂ are independent and σ₁- and σ₂-subgaussian, respectively. Prove that:

1. E[X] = 0 and Var[X] ≤ σ².
2. cX is |c|σ-subgaussian for all c ∈ R.
3. X₁ + X₂ is √(σ₁² + σ₂²)-subgaussian.
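The sketch below is a numerical sanity check of these properties. It assumes the standard definition: X is σ-subgaussian if E[exp(λX)] ≤ exp(λ²σ²/2) for all λ ∈ R. It evaluates the moment generating function of small discrete distributions on a finite grid of λ values. The Rademacher variable (±1 with probability 1/2), the grid, and the helper names are hypothetical choices made for illustration. Passing a grid check is evidence, not a proof.

```python
import math

def mgf(dist, lam):
    """E[exp(lam * X)] for a discrete distribution given as {value: probability}."""
    return sum(p * math.exp(lam * x) for x, p in dist.items())

def looks_subgaussian(dist, sigma, lams, tol=1e-12):
    """Check E[exp(lam X)] <= exp(lam^2 sigma^2 / 2) at every lam in the grid."""
    return all(mgf(dist, lam) <= math.exp(lam**2 * sigma**2 / 2) * (1 + tol) for lam in lams)

def scale(dist, c):
    """Distribution of c * X."""
    return {c * x: p for x, p in dist.items()}

def add_independent(d1, d2):
    """Distribution of X1 + X2 for independent X1 ~ d1 and X2 ~ d2."""
    out = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

lams = [k / 100 for k in range(-500, 501)]   # grid of lambda values in [-5, 5]
rademacher = {-1.0: 0.5, 1.0: 0.5}           # hypothetical 1-subgaussian example

print(looks_subgaussian(rademacher, 1.0, lams))              # True: sigma = 1
print(looks_subgaussian(scale(rademacher, -3.0), 3.0, lams)) # True: |c| sigma with c = -3
summed = add_independent(rademacher, scale(rademacher, 2.0)) # sigma_1 = 1, sigma_2 = 2
print(looks_subgaussian(summed, math.sqrt(5.0), lams))       # True: sqrt(1 + 4)
print(looks_subgaussian(rademacher, 0.9, lams))              # False: sigma too small
```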

Question 2 Suppose that X is zero-mean and X ∈ [a,b] almost surely for constants a < b.

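1. Show that X is (b − a)/2-subgaussian.
2. Using the Cramér-Chernoff method, show that if X₁, X₂, …, X_n are independent and X_t ∈ [a_t, b_t] almost surely with a_t < b_t for all t, then, for all ε ≥ 0,
P(Σ_{t=1}^n (X_t − E[X_t]) ≥ ε) ≤ exp(−2ε² / Σ_{t=1}^n (b_t − a_t)²).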
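The Monte Carlo sketch below compares Hoeffding's bound exp(−2ε² / Σ_t (b_t − a_t)²) with the empirical tail frequency of a centred sum. The choices here are hypothetical: X_t ~ Uniform[0, 1] (so b_t − a_t = 1), n = 20, ε = 3, and the seed. The simulation only illustrates the inequality, which is typically loose for such variables. It does not prove it.

```python
import math
import random

def hoeffding_bound(eps, widths):
    """Right-hand side exp(-2 eps^2 / sum_t (b_t - a_t)^2)."""
    return math.exp(-2 * eps**2 / sum(w**2 for w in widths))

def empirical_tail(n, eps, trials, rng):
    """Fraction of trials with sum_t (X_t - E[X_t]) >= eps for X_t ~ Uniform[0, 1]."""
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() - 0.5 for _ in range(n))  # E[X_t] = 1/2
        if s >= eps:
            hits += 1
    return hits / trials

rng = random.Random(0)
n, eps = 20, 3.0
bound = hoeffding_bound(eps, [1.0] * n)   # every X_t lies in [0, 1]
freq = empirical_tail(n, eps, 20_000, rng)
print(f"empirical {freq:.4f} <= bound {bound:.4f}")
```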

Question 3 [Expectation of maximum] Let X₁, …, X_n be a sequence of σ-subgaussian random variables (possibly dependent) and Z = max_{t∈[n]} X_t. Prove that

1. E[Z] ≤ √(2σ² log n).
2. P(Z ≥ √(2σ² log(n/δ))) ≤ δ for any δ ∈ (0,1).
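The sketch below estimates both quantities by simulation. It relies on one standard fact: N(0, σ²) variables are σ-subgaussian. The example choices are hypothetical: n = 100 Gaussians, and two extreme dependence structures (fully independent, or all equal). For independent Gaussians the estimate should sit somewhat below √(2σ² log n). In the all-equal case it is close to 0. Neither case is a proof.

```python
import math
import random

def mean_max(n, sigma, trials, rng, dependent=False):
    """Monte Carlo estimate of E[max_t X_t] for n N(0, sigma^2) variables.

    dependent=False: X_1, ..., X_n independent.
    dependent=True:  X_1 = ... = X_n (an extreme dependent case)."""
    total = 0.0
    for _ in range(trials):
        if dependent:
            z = rng.gauss(0.0, sigma)
        else:
            z = max(rng.gauss(0.0, sigma) for _ in range(n))
        total += z
    return total / trials

def tail_freq(n, sigma, delta, trials, rng):
    """Fraction of trials with max_t X_t >= sqrt(2 sigma^2 log(n / delta)), independent X_t."""
    threshold = math.sqrt(2 * sigma**2 * math.log(n / delta))
    hits = sum(max(rng.gauss(0.0, sigma) for _ in range(n)) >= threshold for _ in range(trials))
    return hits / trials

rng = random.Random(1)
n, sigma = 100, 1.0
bound = math.sqrt(2 * sigma**2 * math.log(n))
est_indep = mean_max(n, sigma, 2000, rng)
est_dep = mean_max(n, sigma, 2000, rng, dependent=True)
freq = tail_freq(n, sigma, 0.1, 2000, rng)
print(est_indep, est_dep, bound, freq)
```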

Question 4 [Bernstein’s inequality] Let X₁, X₂, …, X_n be a sequence of independent random variables with X_t − E[X_t] ≤ b almost surely, and let S = Σ_{t=1}^n (X_t − E[X_t]) and v = Σ_{t=1}^n V[X_t].

1. Show that g(x) = (exp(x) − 1 − x)/x² = 1/2 + x/3! + x²/4! + ⋯ (with g(0) = 1/2) is increasing.
2. Let X be a random variable with E[X] = 0 and X ≤ b almost surely. Show that E[exp(X)] ≤ 1 + g(b)V[X].
3. Prove that (1 + α) log(1 + α) − α ≥ 3α²/(6 + 2α) for all α ≥ 0. Prove that this is the best possible approximation in the sense that the 2 in the denominator cannot be decreased.

Homework 2: March 19

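4. Let ε > 0 and α = bε/v, and prove that
P(S ≥ ε) ≤ exp(−(v/b²)((1 + α) log(1 + α) − α)) ≤ exp(−ε² / (2v(1 + bε/(3v)))).
5. Use the previous result to show that, for any δ ∈ (0,1),
P(S ≥ √(2v log(1/δ)) + (2b/3) log(1/δ)) ≤ δ.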
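The sketch below checks several Bernstein ingredients numerically on finite grids. It assumes g(x) = (exp(x) − 1 − x)/x² and the lower bound 3α²/(6 + 2α). It covers three things:

- g is increasing on a grid.
- (1 + α) log(1 + α) − α ≥ 3α²/(6 + 2α) on a grid.
- Replacing the 2 by a smaller constant breaks the inequality for small α. The value 1.9 is a hypothetical choice.

It then compares the Bernstein deviation √(2v log(1/δ)) + (2b/3) log(1/δ) with the Hoeffding deviation √(n log(1/δ)/2) for n variables in [0, 1]. The Bernoulli(p) example with small p is hypothetical. Grid checks are not proofs.

```python
import math

def g(x):
    """g(x) = (exp(x) - 1 - x) / x^2, extended continuously by g(0) = 1/2."""
    if x == 0.0:
        return 0.5
    return (math.expm1(x) - x) / x**2   # expm1 avoids cancellation near 0

def h(alpha):
    """(1 + alpha) log(1 + alpha) - alpha."""
    return (1 + alpha) * math.log1p(alpha) - alpha

def lower(alpha, c=2.0):
    """3 alpha^2 / (6 + c alpha); c = 2 is the constant in the exercise."""
    return 3 * alpha**2 / (6 + c * alpha)

xs = [k / 100 for k in range(-500, 501)]
g_increasing = all(g(a) < g(b) for a, b in zip(xs, xs[1:]))

alphas = [k / 100 for k in range(1, 5001)]                 # alpha in (0, 50]
part3_holds = all(h(a) >= lower(a) for a in alphas)
smaller_c_fails = any(h(a) < lower(a, c=1.9) for a in alphas)

def bernstein_width(v, b, delta):
    """sqrt(2 v log(1/delta)) + (2 b / 3) log(1/delta)."""
    L = math.log(1 / delta)
    return math.sqrt(2 * v * L) + 2 * b * L / 3

def hoeffding_width(n, delta):
    """Hoeffding deviation for n variables in [0, 1]: sqrt(n log(1/delta) / 2)."""
    return math.sqrt(n * math.log(1 / delta) / 2)

# Hypothetical example: n Bernoulli(p) variables, so v = n p (1 - p) and b = 1 is valid.
n, p, delta = 10_000, 0.01, 0.05
print(g_increasing, part3_holds, smaller_c_fails)   # True True True
print(bernstein_width(n * p * (1 - p), 1.0, delta), hoeffding_width(n, delta))
```

With small variance the Bernstein deviation is much smaller than the Hoeffding one, which ignores the variance.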

Question 5 Show that the bound

R_T ≤ min{T∆, ∆ + (4/∆)(1 + max{0, log(T∆²/4)})}

implies that the regret of an optimally tuned Explore-then-Commit (ETC) algorithm for subgaussian 2-armed bandits with means µ₁, µ₂ ∈ R and ∆ = |µ₁ − µ₂| satisfies R_T ≤ ∆ + C√T, where C > 0 is a universal constant.
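The sketch below is a quick numerical look, not a proof, at why such a bound scales like √T. It assumes the bound in question is the standard optimally tuned ETC bound R_T ≤ min{T∆, ∆ + (4/∆)(1 + max{0, log(T∆²/4)})}. The code sweeps ∆ over a logarithmic grid (an arbitrary choice) and records the worst value of (bound − ∆)/√T. If that worst value stays below a fixed number as T grows, the number is a candidate for C.

```python
import math

def etc_bound(T, gap):
    """min{T*gap, gap + (4/gap)(1 + max{0, log(T gap^2 / 4)})} (assumed form of the bound)."""
    tuned = gap + (4 / gap) * (1 + max(0.0, math.log(T * gap**2 / 4)))
    return min(T * gap, tuned)

def worst_ratio(T, gaps):
    """Largest value of (bound - gap) / sqrt(T) over the given gaps."""
    return max((etc_bound(T, gap) - gap) / math.sqrt(T) for gap in gaps)

gaps = [10 ** (-4 + 6 * k / 2000) for k in range(2001)]   # gaps from 1e-4 to 1e2
for T in (10**2, 10**4, 10**6):
    print(T, round(worst_ratio(T, gaps), 3))
```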

Question 6 Fix δ ∈ (0,1). Modify the ETC algorithm to depend on δ and prove a bound on the pseudo-regret R̄_T = Tµ* − Σ_{t=1}^T µ_{A_t} of the ETC algorithm that holds with probability 1 − δ, where A_t is the arm chosen in round t and µ* = max_i µ_i.

Hint: Choose m appropriately in the regret upper bound for the ETC algorithm proved in class.

Question 7 Fix δ ∈ (0,1). Prove a bound on the random regret R̂_T = Tµ* − Σ_{t=1}^T X_t of the ETC algorithm that holds with probability 1 − δ, where X_t is the reward received in round t. Compare this to the bound derived for the pseudo-regret in Question 6. What can you conclude?
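The sketch below makes the difference between the two regret notions concrete. It is a minimal ETC simulation that returns both the pseudo-regret Tµ* − Σ_t µ_{A_t} and the random regret Tµ* − Σ_t X_t. The hypothetical choices are:

- Gaussian rewards with unit variance (one 1-subgaussian instance).
- Round-robin exploration of m rounds per arm.
- Ties broken towards the lowest index.
- The specific means, T, m, and seed.

Here m is fixed. Choosing it as a function of δ is the point of the exercise. Repeating the run over many seeds gives an empirical view of the 1 − δ quantiles of both quantities.

```python
import random

def run_etc(means, T, m, rng):
    """Explore-then-commit on Gaussian arms with unit variance.

    Explores each of the k arms m times in round-robin order, then commits to the arm
    with the highest empirical mean (ties go to the lowest index).
    Returns (pseudo_regret, random_regret):
      pseudo_regret = T * mu_star - sum_t mu_{A_t}
      random_regret = T * mu_star - sum_t X_t"""
    if m < 1:
        raise ValueError("m must be at least 1")
    k = len(means)
    best = max(means)
    sums = [0.0] * k
    pulls = [0] * k
    pseudo, total_reward = 0.0, 0.0
    committed = None
    for t in range(T):
        if t < m * k:
            arm = t % k                        # exploration phase
        else:
            if committed is None:              # commit once, right after exploration
                committed = max(range(k), key=lambda i: sums[i] / pulls[i])
            arm = committed
        reward = rng.gauss(means[arm], 1.0)
        sums[arm] += reward
        pulls[arm] += 1
        pseudo += best - means[arm]            # depends on the (unknown) means
        total_reward += reward                 # depends on the observed rewards
    return pseudo, T * best - total_reward

rng = random.Random(0)
pseudo, rand = run_etc([0.5, 0.0], T=1000, m=20, rng=rng)
print(pseudo, rand)
```

In this two-arm example the pseudo-regret can only be 10.0 (commit to the better arm) or 490.0 (commit to the worse arm). The random regret additionally fluctuates with the reward noise.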

Question 8 Assume the rewards are 1-subgaussian and there are k ≥ 2 arms. The ε-greedy algorithm depends on a sequence of parameters ε₁, ε₂, … ∈ [0,1]. First it chooses each arm once, and subsequently it chooses A_t = argmax_i µ̂_i(t − 1) with probability 1 − ε_t and otherwise chooses an arm uniformly at random.

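1. Prove that if ε_t = ε > 0 for all t, then lim_{T→∞} R_T/T = (ε/k) Σ_{i=1}^k ∆_i.
2. Let ∆_min = min{∆_i : ∆_i > 0}, where ∆_i = µ* − µ_i and µ* = max_j µ_j, and let ε_t = min{1, Ck/(t∆_min²)}, where C > 0 is a sufficiently large universal constant. Prove that there exists a universal C′ > 0 such that
R_T ≤ C′ Σ_{i=1}^k (∆_i + (∆_i/∆_min²) log max{e, T∆_min²/k}).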
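The sketch below is a minimal ε-greedy simulation that takes the exploration schedule as a function eps(t). It returns the pseudo-regret so it can be compared with (ε/k) Σ_i ∆_i, the limit that the constant-ε case is expected to approach. The hypothetical choices are Gaussian rewards with unit variance, the means, T, ε = 0.3, and the seed. A single run only illustrates the trend.

```python
import random

def run_eps_greedy(means, T, eps, rng):
    """epsilon-greedy on Gaussian arms with unit variance.

    eps(t) returns the exploration probability epsilon_t for round t = 1, 2, ...
    Returns the pseudo-regret sum_t (mu_star - mu_{A_t})."""
    k = len(means)
    best = max(means)
    sums = [0.0] * k
    pulls = [0] * k
    pseudo = 0.0
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1                                  # choose each arm once
        elif rng.random() < eps(t):
            arm = rng.randrange(k)                       # explore: uniformly random arm
        else:
            arm = max(range(k), key=lambda i: sums[i] / pulls[i])   # exploit
        reward = rng.gauss(means[arm], 1.0)
        sums[arm] += reward
        pulls[arm] += 1
        pseudo += best - means[arm]
    return pseudo

means = [1.0, 0.5, 0.0]
T, eps = 20_000, 0.3
regret = run_eps_greedy(means, T, lambda t: eps, random.Random(0))
limit = eps / len(means) * sum(max(means) - mu for mu in means)   # (eps / k) * sum_i gap_i
print(regret / T, limit)
```

A decaying schedule can be passed the same way, for example `lambda t: min(1.0, C * k / (t * gap_min**2))`. Here `C`, `k` and `gap_min` are hypothetical names the caller must define.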

Question 9 Fix a 1-subgaussian k-armed bandit environment and a horizon T. Consider the version of UCB that works in phases of exponentially increasing length (1, 2, 4, …). In each phase, the algorithm plays the action that UCB would have chosen at the beginning of the phase.


1. State and prove a bound on the regret for this version of UCB.
2. How would the result change if the ℓth phase had a length of ⌈α^ℓ⌉ with α > 1?
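The sketch below is a minimal simulation of the phased variant, with phase ℓ = 0, 1, 2, … lasting ⌈α^ℓ⌉ rounds. Setting α = 2 gives lengths 1, 2, 4, …. The UCB version from class may differ, so the following are assumptions:

- The index µ̂_i + √(2 log(1/δ)/T_i), with δ = 1/T² and an infinite index for unpulled arms.
- Gaussian rewards with unit variance.
- The example means, T, and seed.

The function also returns the number of phases, which grows only logarithmically in T. That is why the variant recomputes the index far less often than standard UCB.

```python
import math
import random

def phased_ucb(means, T, alpha=2.0, rng=None):
    """Phased UCB on Gaussian arms with unit variance.

    Phase ell (ell = 0, 1, 2, ...) lasts ceil(alpha**ell) rounds (truncated at T); the arm
    maximising the (assumed) UCB index at the start of a phase is played for the whole phase.
    Returns (pseudo_regret, pulls, number_of_phases)."""
    if rng is None:
        rng = random.Random()
    k = len(means)
    best = max(means)
    log_term = 2 * math.log(T**2)          # 2 log(1/delta) with delta = 1/T^2 (assumed)
    sums = [0.0] * k
    pulls = [0] * k

    def index(i):
        if pulls[i] == 0:
            return math.inf                 # unpulled arms are tried first
        return sums[i] / pulls[i] + math.sqrt(log_term / pulls[i])

    pseudo, t, ell = 0.0, 0, 0
    while t < T:
        arm = max(range(k), key=index)      # UCB choice, frozen for this phase
        for _ in range(min(math.ceil(alpha**ell), T - t)):
            reward = rng.gauss(means[arm], 1.0)
            sums[arm] += reward
            pulls[arm] += 1
            pseudo += best - means[arm]
            t += 1
        ell += 1
    return pseudo, pulls, ell

pseudo, pulls, phases = phased_ucb([0.5, 0.0, -0.5], T=1000, rng=random.Random(0))
print(pseudo, pulls, phases)   # phases == 10 for alpha = 2 and T = 1000
```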

Related products

IE613-Assignment 1

IE613-Assignment 3
