STATS215 Assignment 2

Problem 1: Bernoulli GLMs as latent variable models. Consider a Bernoulli regression model,

w ∼ N (µ, Σ)

yn | xn, w ∼ Bern(f (wTxn)) for n = 1, . . . , N,

where w and xn are vectors in RD, yn ∈ {0, 1}, and f : R → [0, 1] is the mean function. In class we studied Newton’s method for finding the maximum a posteriori (MAP) estimate w* = arg maxw p(w | {xn, yn}Nn=1). Now we will consider methods for approximating the full posterior distribution.

(a) Rather than using the logistic function, let the mean function be the normal cumulative distribution function (CDF), or “probit” function,

f(u) = Pr(z ≤ u) where z ∼ N(0, 1)

     = ∫_{−∞}^{u} N(z; 0, 1) dz.

This is called the probit regression model. Show that the likelihood p(yn | xn, w) is a marginal of a joint distribution,

p(yn, zn | xn, w) = I[zn ≥ 0]^{I[yn=1]} I[zn < 0]^{I[yn=0]} N(zn | xnTw, 1).

«Your answer here.»
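As a quick numerical sanity check (not part of the assignment statement), marginalizing zn over [0, ∞) in the joint above should recover the probit likelihood Pr(yn = 1 | xn, w) = f(wTxn). The values of w and x below are illustrative choices:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not prescribed by the assignment).
w = np.array([0.5, -1.0])
x = np.array([1.0, 2.0])
u = w @ x

# Marginalizing z_n: draw z_n ~ N(w^T x_n, 1) and estimate Pr(z_n >= 0).
z = rng.normal(u, 1.0, size=1_000_000)
mc_estimate = float(np.mean(z >= 0.0))

# Probit mean function f(u) = Phi(u), written via the error function.
probit = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
print(mc_estimate, probit)  # the two numbers should agree to ~3 decimals
```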

(b) Derive the conditional distributions p(w | {xn, yn, zn}Nn=1) and p(zn | xn, yn, w).[1]

(c) Gibbs sampling is a Markov chain Monte Carlo (MCMC) method for approximate posterior inference. It works by repeatedly sampling from the conditional distribution of one variable, holding all others fixed. For the probit regression model, this means iteratively performing these two steps:

  1. Sample zn ∼ p(zn | xn, yn, w) for n = 1, . . . , N, holding w fixed;
  2. Sample w ∼ p(w | {xn, yn, zn}Nn=1), holding {zn}Nn=1 fixed.

Note the similarity to EM: rather than computing a posterior distribution over zn, we draw a sample from it; rather than setting w to maximize the ELBO, we draw a sample from its conditional distribution. It can be shown that this algorithm defines a Markov chain on the space of (w, {zn}Nn=1) whose stationary distribution is the posterior p(w, {zn}Nn=1 | {xn, yn}Nn=1). In other words, repeating these steps infinitely many times would yield samples of w and {zn}Nn=1 drawn from their posterior distribution.

Implement this Gibbs sampling algorithm and test it on a synthetic dataset with D = 2 dimensional covariates and N = 100 data points. Scatter plot your samples of w and, for comparison, plot the true value of w that generated the data. Do your samples look approximately Gaussian distributed? How does the posterior distribution change when you vary N?

«Your figures and captions here.»
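One possible starting point is sketched below. It assumes the standard conditionals for this model (a truncated normal for each zn and a Gaussian for w); the synthetic w_true, the prior, and the iteration counts are illustrative choices, not prescribed by the assignment:

```python
import numpy as np
from statistics import NormalDist

std_normal = NormalDist()
rng = np.random.default_rng(1)

# Synthetic dataset with D = 2 covariates and N = 100 points.
# w_true is an illustrative choice, not prescribed by the assignment.
D, N = 2, 100
w_true = np.array([1.0, -2.0])
X = rng.normal(size=(N, D))
y = (X @ w_true + rng.normal(size=N) >= 0).astype(int)  # probit draws

mu, Sigma = np.zeros(D), np.eye(D)               # prior w ~ N(mu, Sigma)
Sigma_inv = np.linalg.inv(Sigma)
post_cov = np.linalg.inv(Sigma_inv + X.T @ X)    # fixed across iterations

def sample_z(m, yn):
    """Truncated N(m, 1) via inverse CDF: z >= 0 if yn = 1, else z < 0."""
    cut = std_normal.cdf(-m)                     # Pr(z < 0)
    lo, hi = (cut, 1.0) if yn == 1 else (0.0, cut)
    u = min(max(rng.uniform(lo, hi), 1e-12), 1 - 1e-12)
    return m + std_normal.inv_cdf(u)

w = np.zeros(D)
samples = []
for _ in range(500):
    z = np.array([sample_z(X[n] @ w, y[n]) for n in range(N)])  # step 1
    post_mean = post_cov @ (Sigma_inv @ mu + X.T @ z)
    w = rng.multivariate_normal(post_mean, post_cov)            # step 2
    samples.append(w)
samples = np.array(samples[100:])                # discard burn-in
print(samples.mean(axis=0))                      # should lie near w_true
```

Scatter-plotting samples[:, 0] against samples[:, 1] (e.g. with matplotlib) and overlaying w_true gives the requested figure.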

(d) Bonus. There are also auxiliary variable methods for logistic regression, where f (u)= eu/(1 + eu).

Specifically, we have that,

e^{yn·wTxn} / (1 + e^{wTxn}) = (1/2) ∫ exp{(yn − 1/2) xnTw − (1/2) zn (xnTw)²} PG(zn; 1, 0) dzn,

where PG(z; b, c) is the density function of the Pólya-gamma (PG) distribution over z ∈ R+ with parameters b and c. The PG distribution has a number of nice properties: it is closed under exponential tilting so that,

e^{−c²z/2} PG(z; b, 0) ∝ PG(z; b, c),

and its expectation is available in closed form,

Ez∼PG(b,c)[z] = (b / 2c) tanh(c/2).

Use these properties to derive an EM algorithm for finding w* = arg maxw p({yn} | {xn}, w). How do the EM updates compare to Newton’s method?
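With these properties, the E-step only needs E[zn] for zn ∼ PG(1, wTxn), and the M-step reduces to a weighted least-squares solve. The sketch below assumes that reduction (and illustrative synthetic data); it is a possible shape of the answer, not a prescribed implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative synthetic logistic-regression data.
N, D = 500, 2
w_true = np.array([1.5, -1.0])
X = rng.normal(size=(N, D))
p = 1.0 / (1.0 + np.exp(-(X @ w_true)))
y = (rng.uniform(size=N) < p).astype(float)

w = np.zeros(D)
for _ in range(50):
    psi = X @ w
    # E-step: E[z_n] = (1 / 2 psi_n) tanh(psi_n / 2); the psi -> 0 limit is 1/4.
    with np.errstate(divide="ignore", invalid="ignore"):
        z_hat = np.where(np.abs(psi) < 1e-8, 0.25,
                         np.tanh(psi / 2.0) / (2.0 * psi))
    # M-step: maximize sum_n (y_n - 1/2) psi_n - (1/2) z_hat_n psi_n^2,
    # a weighted least-squares problem in w.
    w = np.linalg.solve(X.T @ (z_hat[:, None] * X), X.T @ (y - 0.5))
print(w)  # approaches the logistic-regression MLE, near w_true here
```

One way to frame the comparison asked for above: both this EM update and Newton’s method solve a reweighted least-squares system; they differ in where the per-observation weights come from.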

Problem 2: Spike sorting with mixture models

As discussed in class, “spike sorting” is ultimately a mixture modeling problem. Here we will study the problem in more detail. Let {yn}Nn=1 represent a collection of spikes. Each yn ∈ RD is a vector containing features of the n-th spike waveform. For example, the features may be projections of the spike waveform onto the top D principal components. We have the following general model,

zn | π ∼ π

yn | zn, θ ∼ p(yn | θzn).

The label zn ∈ {1, . . . , K} indicates which of the K neurons generated the n-th spike waveform. The probability vector π ∈ ∆K specifies a prior distribution on spike labels, and the parameters θ = {θk}Kk=1 determine the likelihood of the spike waveforms yn for each of the K neurons. The goal is to infer a posterior distribution p(zn | yn, π, θ) over labels for each observed spike, and to learn the parameters π* and θ* that maximize the likelihood of the data.

(a) Start with a Gaussian observation model,

yn | zn, θ ∼ N (yn | µzn, Σzn),

where θk =(µk, Σk) includes the mean and covariance for the k-th neuron.

Derive an EM algorithm to compute π*, θ* = arg max p({yn}Nn=1 | π, θ). Start by deriving the “responsibilities” wnk = p(zn = k | yn, π0, θ0) for fixed parameters π0 and θ0. Then use the responsibilities to compute the expected log joint probability,

L(π, θ) = Σ_{n=1}^{N} E_{p(zn | yn, π0, θ0)}[log p(yn, zn | π, θ)].

Finally, find closed-form expressions for π* and θ* that optimize L(π, θ).
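A compact sketch of the resulting EM loop is below, using the standard Gaussian-mixture closed-form updates; the synthetic two-cluster data and the initialization are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy spike features: two well-separated 2-D Gaussian clusters.
N, D, K = 400, 2, 2
true_means = np.array([[-3.0, 0.0], [3.0, 0.0]])
labels = rng.integers(0, K, size=N)
Y = true_means[labels] + rng.normal(size=(N, D))

# Illustrative initialization (not prescribed by the assignment).
pi = np.full(K, 1.0 / K)
mus = np.array([[-1.0, 0.0], [1.0, 0.0]])
Sigmas = np.stack([np.eye(D)] * K)

def log_gauss(Y, mu, Sigma):
    d = Y - mu
    Si = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (D * np.log(2 * np.pi) + logdet
                   + np.einsum("ni,ij,nj->n", d, Si, d))

for _ in range(50):
    # E-step: responsibilities w_nk proportional to pi_k N(y_n | mu_k, Sigma_k).
    log_r = np.log(pi) + np.stack(
        [log_gauss(Y, mus[k], Sigmas[k]) for k in range(K)], axis=1)
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: closed-form updates for pi, means, and covariances.
    Nk = r.sum(axis=0)
    pi = Nk / N
    mus = (r.T @ Y) / Nk[:, None]
    for k in range(K):
        d = Y - mus[k]
        Sigmas[k] = (r[:, k, None] * d).T @ d / Nk[k]

print(np.sort(mus[:, 0]))  # first coordinates of the means, near -3 and 3
```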

(b) The Gaussian model can be sensitive to outliers, which can cause spikes from one neuron to be split into two clusters. One way to side-step this issue is to replace the Gaussian with a heavier-tailed distribution like the multivariate Student’s t, which has probability density,

p(yn | θzn) = Γ[(α0 + D)/2] / (Γ(α0/2) (α0π)^{D/2} |Σzn|^{1/2}) · [1 + (1/α0)(yn − µzn)T Σzn⁻¹ (yn − µzn)]^{−(α0+D)/2}.

We will treat α0 as a fixed hyperparameter.

Like the negative binomial distribution studied in HW1, the multivariate Student’s t can also be represented as an infinite mixture,

p(yn | θzn) = ∫ p(yn, τn | θzn) dτn = ∫ N(yn; µzn, τn⁻¹ Σzn) Gamma(τn; α0/2, α0/2) dτn.

We will derive an EM algorithm to find π*, θ* in this model.

First, show that the posterior takes the form

p(τn, zn | yn, π, θ) = p(zn | yn, π, θ) p(τn | zn, yn, θ)

= ∏_{k=1}^{K} [wnk Gamma(τn | ank, bnk)]^{I[zn=k]},

and solve for the parameters wnk, ank, bnk in terms of yn, π, and θ.
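For reference, once wnk, ank, bnk have been derived, they can be evaluated numerically as below. The parameter values are illustrative, and the Gamma parameters used (ank = (α0 + D)/2, bnk = (α0 + δnk)/2, with δnk the Mahalanobis distance of yn from µk) are one common form of the answer for this scale mixture:

```python
import numpy as np
from math import lgamma

# Illustrative parameter values for a single spike y_n.
D, K, alpha0 = 2, 2, 4.0
y_n = np.array([0.5, -0.2])
pi = np.array([0.6, 0.4])
mus = np.array([[0.0, 0.0], [2.0, 2.0]])
Sigmas = np.stack([np.eye(D)] * K)

def log_t(y, mu, Sigma, alpha):
    """Log density of the multivariate Student's t from part (b)."""
    d = y - mu
    delta = d @ np.linalg.inv(Sigma) @ d        # Mahalanobis distance
    _, logdet = np.linalg.slogdet(Sigma)
    return (lgamma((alpha + D) / 2) - lgamma(alpha / 2)
            - 0.5 * (D * np.log(alpha * np.pi) + logdet)
            - 0.5 * (alpha + D) * np.log1p(delta / alpha))

# Responsibilities w_nk proportional to pi_k t(y_n | mu_k, Sigma_k, alpha0).
log_w = np.log(pi) + np.array(
    [log_t(y_n, mus[k], Sigmas[k], alpha0) for k in range(K)])
w_n = np.exp(log_w - log_w.max())
w_n /= w_n.sum()

# Gamma posterior over tau_n given z_n = k.
deltas = np.array([(y_n - mus[k]) @ np.linalg.inv(Sigmas[k]) @ (y_n - mus[k])
                   for k in range(K)])
a = np.full(K, (alpha0 + D) / 2)                # a_nk
b = (alpha0 + deltas) / 2                       # b_nk
print(w_n, a, b)
```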

(c) Now compute the expected log joint probability,

L(π, θ) = Σ_{n=1}^{N} E_{p(τn, zn | yn, π0, θ0)}[log p(yn, zn, τn | π, θ)],

using the fact that E[X]= a/b for X ∼ Gamma(a, b). You may omit terms that are constant with respect to π and θ.

(d) Finally, solve for π* and θ* that maximize the expected log joint probability. How does your answer compare to the solution you found in part (a)?
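A sketch of what the resulting M-step typically looks like, assuming the E-step has produced responsibilities r[n, k] and E[τn | zn = k] = ank/bnk (the inputs below are random placeholders, not a real E-step): relative to part (a), πk is updated exactly as before, while the mean and covariance updates acquire extra τ-weights.

```python
import numpy as np

rng = np.random.default_rng(4)

# Placeholder E-step outputs for N = 6 spikes and K = 2 components.
N, D, K = 6, 2, 2
Y = rng.normal(size=(N, D))
r = rng.dirichlet(np.ones(K), size=N)       # placeholder responsibilities
tau_hat = rng.gamma(3.0, 1.0, size=(N, K))  # placeholder E[tau_n | z_n = k]

pi = r.sum(axis=0) / N                      # same update as the Gaussian case
u = r * tau_hat                             # effective weights w_nk * E[tau]
mus = (u.T @ Y) / u.sum(axis=0)[:, None]    # tau-weighted means
Sigmas = np.empty((K, D, D))
for k in range(K):
    d = Y - mus[k]
    # Covariance: tau-weighted scatter, normalized by the responsibilities.
    Sigmas[k] = (u[:, k, None] * d).T @ d / r[:, k].sum()
print(pi, mus)
```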

[1] Observe that zn is conditionally independent of {xn′, yn′, zn′}n′≠n given w.