[SOLVED] CSC421/2516 Homework 5


Submission: You must submit your solutions as a PDF through MarkUs. You can produce the file however you like (e.g. LaTeX, Microsoft Word, scanner) as long as it is readable.

Late Submission: MarkUs will remain open until 3 days after the deadline, after which no late submissions will be accepted. The late penalty is 10% per day, rounded up.

Weekly homeworks are individual work. See the Course Information handout[1] for detailed policies.

Due to the shortened time period, this assignment has only one question, worth 6 points. You get the remaining 4 points for free.

  1. Variational Free Energy [6pts] Here, your job is to derive some of the formulas relating to the variational free energy (VFE) which we maximize when we train a VAE. Recall that the VFE is defined as:

F(q) = E_q[log p(x|z)] − D_KL(q(z) ‖ p(z)),

and KL divergence is defined as

D_KL(q(z) ‖ p(z)) = E_q[log q(z) − log p(z)].
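As a quick sanity check on this definition, the expectation can be estimated by drawing samples from q. A minimal plain-Python sketch, using the arbitrary illustrative choices q = N(1, 0.5²) and p = N(0, 1):

```python
import math
import random

def log_normal_pdf(z, mu, sigma):
    # log N(z; mu, sigma^2) for a scalar z
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (z - mu) ** 2 / (2.0 * sigma ** 2)

# KL(q || p) = E_q[log q(z) - log p(z)], estimated by Monte Carlo with z ~ q.
mu, sigma = 1.0, 0.5          # q = N(1, 0.5^2); p = N(0, 1)
rng = random.Random(0)
n = 200_000
est = 0.0
for _ in range(n):
    z = rng.gauss(mu, sigma)  # sample from q
    est += log_normal_pdf(z, mu, sigma) - log_normal_pdf(z, 0.0, 1.0)
est /= n
print(est)  # stabilizes near the true KL as n grows
```

With enough samples the estimate converges to the exact KL divergence between the two Gaussians.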

We assume the prior p(z) is a standard Gaussian:

p(z) = N(z; 0, I) = ∏_{i=1}^{D} p_i(z_i) = ∏_{i=1}^{D} N(z_i; 0, 1).

And the variational approximation q(z) is a fully factorized (i.e. diagonal) Gaussian:

q(z) = N(z; µ, Σ) = ∏_{i=1}^{D} q_i(z_i) = ∏_{i=1}^{D} N(z_i; µ_i, σ_i).
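Because the covariance is diagonal, the joint log-density is simply the sum of the per-dimension univariate log-densities. A small Python check (the means and standard deviations below are arbitrary illustrative values):

```python
import math
import random

def log_normal_pdf(z, mu, sigma):
    # log N(z; mu, sigma^2) for a scalar z
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (z - mu) ** 2 / (2.0 * sigma ** 2)

# An arbitrary diagonal Gaussian in D = 3 dimensions.
mus = [0.3, -1.2, 0.8]
sigmas = [0.5, 1.5, 0.9]
rng = random.Random(0)
z = [rng.gauss(m, s) for m, s in zip(mus, sigmas)]

# Product of per-dimension densities <=> sum of per-dimension log-densities.
log_prod = sum(log_normal_pdf(zi, m, s) for zi, m, s in zip(z, mus, sigmas))

# Multivariate formula with Sigma = diag(sigma_i^2):
# log N(z; mu, Sigma) = -(D/2) log(2 pi) - (1/2) log|Sigma| - (1/2)(z-mu)^T Sigma^{-1} (z-mu)
D = len(mus)
log_det = sum(math.log(s ** 2) for s in sigmas)
quad = sum((zi - m) ** 2 / s ** 2 for zi, m, s in zip(z, mus, sigmas))
log_joint = -0.5 * D * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad

print(log_prod, log_joint)  # the two agree
```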

For reference, here are the formulas for the univariate and multivariate Gaussian densities:

N(z; µ, σ) = (1/√(2πσ²)) exp(−(z − µ)² / (2σ²)),

N(z; µ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp(−(1/2)(z − µ)ᵀ Σ⁻¹ (z − µ)).

  • [1pt] Show that

F(q) = log p(x) − D_KL(q(z) ‖ p(z|x)).

(Hint: expand out definitions and apply Bayes’ Rule.)

  • [1pt] Show that the KL term decomposes as a sum of KL terms for the individual dimensions. In particular,

D_KL(q(z) ‖ p(z)) = ∑_i D_KL(q_i(z_i) ‖ p_i(z_i)).


  • [2pts] Give an explicit formula for the KL divergence D_KL(q_i(z_i) ‖ p_i(z_i)). This should be a mathematical expression involving µ_i and σ_i. If you like, you may suppress the i subscripts in your solution.
  • [2pts] One way to do gradient descent on the KL term is to apply the formula from part (c). Another approach is to compute stochastic gradients using the reparameterization trick:

∇_θ D_KL(q_i(z_i) ‖ p_i(z_i)) ≈ ∇_θ [log q_i(z̃_i) − log p_i(z̃_i)],

where

z̃_i = µ_i + σ_i ε_i,

and

ε_i ∼ N(0, 1).

Show how to compute a stochastic estimate of ∇_θ D_KL(q_i(z_i) ‖ p_i(z_i)) by doing backprop on the above equations. You may find it helpful to draw the computation graph. If you like, you may suppress the i subscripts in your solution.
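To illustrate what such a stochastic estimator looks like in practice, here is a minimal plain-Python sketch. The particular values µ = 0.7, σ = 0.6 and the closed-form reference expressions are illustrative assumptions for the check, not part of the assignment: it averages many single-sample reparameterized gradients and compares the average against the exact gradients of the closed-form KL between N(µ, σ²) and N(0, 1).

```python
import math
import random

def kl_grad_analytic(mu, sigma):
    # Exact gradients of the closed-form KL( N(mu, sigma^2) || N(0, 1) ),
    # i.e. of 0.5 * (mu^2 + sigma^2 - 1 - log(sigma^2)):
    #   d/dmu = mu,   d/dsigma = sigma - 1/sigma
    return mu, sigma - 1.0 / sigma

def kl_grad_sample(mu, sigma, eps):
    # Reparameterize: z = mu + sigma * eps with eps ~ N(0, 1), then backprop
    # through the single-sample objective
    #   log q(z) - log p(z) = -log(sigma) - eps**2 / 2 + z**2 / 2.
    z = mu + sigma * eps
    d_mu = z                          # chain rule: (z**2/2)' = z, dz/dmu = 1
    d_sigma = -1.0 / sigma + z * eps  # -log(sigma) term, plus dz/dsigma = eps
    return d_mu, d_sigma

mu, sigma = 0.7, 0.6   # arbitrary variational parameters
rng = random.Random(0)
n = 200_000
g_mu = g_sigma = 0.0
for _ in range(n):
    dm, ds = kl_grad_sample(mu, sigma, rng.gauss(0.0, 1.0))
    g_mu += dm
    g_sigma += ds
g_mu /= n
g_sigma /= n

exact_mu, exact_sigma = kl_grad_analytic(mu, sigma)
print(g_mu, exact_mu)        # averaged stochastic gradient vs. exact
print(g_sigma, exact_sigma)
```

The per-sample gradients are noisy, but they are unbiased: averaged over many draws of ε they match the exact gradients, which is what makes this estimator usable inside stochastic gradient descent.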


[1] http://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/syllabus.pdf