STATS215 Assignment 2

Problem 1: Bernoulli GLMs as latent variable models. Consider a Bernoulli regression model,

w ∼ N (µ, Σ)

yn | xn, w ∼ Bern(f (wTxn)) for n = 1, . . . , N,

where w and xn are vectors in RD, yn ∈ {0, 1}, and f : R → [0, 1] is the mean function. In class we studied Newton’s method for finding the maximum a posteriori (MAP) estimate w* = arg maxw p(w | {xn, yn}Nn=1). Now we will consider methods for approximating the full posterior distribution.

(a) Rather than using the logistic function, let the mean function be the normal cumulative distribution function (CDF), or “probit” function,

f(u) = Pr(z ≤ u) where z ∼ N(0, 1)

     = ∫_{−∞}^{u} N(z; 0, 1) dz.

This is called the probit regression model. Show that the likelihood p(yn | xn, w) is a marginal of a joint distribution,

p(yn, zn | xn, w) = I[zn ≥ 0]^{I[yn=1]} I[zn < 0]^{I[yn=0]} N(zn | xnTw, 1).

«Your answer here.»
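As a quick numerical sanity check (not part of the assignment statement), marginalizing zn over [0, ∞) in the joint above should recover the probit likelihood Pr(yn = 1 | xn, w) = f(wTxn). The values of w and x below are illustrative choices:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not prescribed by the assignment).
w = np.array([0.5, -1.0])
x = np.array([1.0, 2.0])
u = w @ x

# Marginalizing z_n: draw z_n ~ N(w^T x_n, 1) and estimate Pr(z_n >= 0).
z = rng.normal(u, 1.0, size=1_000_000)
mc_estimate = float(np.mean(z >= 0.0))

# Probit mean function f(u) = Phi(u), written via the error function.
probit = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
print(mc_estimate, probit)  # the two numbers should agree to ~3 decimals
```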

(b) Derive the conditional distributions p(w | {xn, yn, zn}Nn=1) and p(zn | xn, yn, w).[1]

(c) Gibbs sampling is a Markov chain Monte Carlo (MCMC) method for approximate posterior inference. It works by repeatedly sampling from the conditional distribution of one variable, holding all others fixed. For the probit regression model, this means iteratively performing these two steps:

  1. Sample zn ∼ p(zn | xn, yn, w) for n = 1, . . . , N, holding w fixed;
  2. Sample w ∼ p(w | {xn, yn, zn}Nn=1), holding {zn}Nn=1 fixed.

Note the similarity to EM: rather than computing a posterior distribution over zn, we draw a sample from it; rather than setting w to maximize the ELBO, we draw a sample from its conditional distribution. It can be shown that this algorithm defines a Markov chain on the space of (w, {zn}Nn=1) whose stationary distribution is the posterior p(w, {zn}Nn=1 | {xn, yn}Nn=1). In other words, repeating these steps infinitely many times would yield samples of w and {zn}Nn=1 drawn from their posterior distribution.

Implement this Gibbs sampling algorithm and test it on a synthetic dataset with D = 2 dimensional covariates and N = 100 data points. Scatter plot your samples of w and, for comparison, plot the true value of w that generated the data. Do your samples look approximately Gaussian distributed? How does the posterior distribution change when you vary N?

«Your figures and captions here.»
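One possible starting point is sketched below. It assumes the standard conditionals for this model (a truncated normal for each zn and a Gaussian for w); the synthetic w_true, the prior, and the iteration counts are illustrative choices, not prescribed by the assignment:

```python
import numpy as np
from statistics import NormalDist

std_normal = NormalDist()
rng = np.random.default_rng(1)

# Synthetic dataset with D = 2 covariates and N = 100 points.
# w_true is an illustrative choice, not prescribed by the assignment.
D, N = 2, 100
w_true = np.array([1.0, -2.0])
X = rng.normal(size=(N, D))
y = (X @ w_true + rng.normal(size=N) >= 0).astype(int)  # probit draws

mu, Sigma = np.zeros(D), np.eye(D)               # prior w ~ N(mu, Sigma)
Sigma_inv = np.linalg.inv(Sigma)
post_cov = np.linalg.inv(Sigma_inv + X.T @ X)    # fixed across iterations

def sample_z(m, yn):
    """Truncated N(m, 1) via inverse CDF: z >= 0 if yn = 1, else z < 0."""
    cut = std_normal.cdf(-m)                     # Pr(z < 0)
    lo, hi = (cut, 1.0) if yn == 1 else (0.0, cut)
    u = min(max(rng.uniform(lo, hi), 1e-12), 1 - 1e-12)
    return m + std_normal.inv_cdf(u)

w = np.zeros(D)
samples = []
for _ in range(500):
    z = np.array([sample_z(X[n] @ w, y[n]) for n in range(N)])  # step 1
    post_mean = post_cov @ (Sigma_inv @ mu + X.T @ z)
    w = rng.multivariate_normal(post_mean, post_cov)            # step 2
    samples.append(w)
samples = np.array(samples[100:])                # discard burn-in
print(samples.mean(axis=0))                      # should lie near w_true
```

Scatter-plotting samples[:, 0] against samples[:, 1] (e.g. with matplotlib) and overlaying w_true gives the requested figure.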

(d) Bonus. There are also auxiliary variable methods for logistic regression, where f (u)= eu/(1 + eu).

Specifically, we have that,

e^{yn·wTxn} / (1 + e^{wTxn}) = (1/2) ∫ exp{(yn − 1/2) xnTw − (1/2) zn (xnTw)²} PG(zn; 1, 0) dzn,

where PG(z; b, c) is the density function of the Pólya-gamma (PG) distribution over z ∈ R+ with parameters b and c. The PG distribution has a number of nice properties: it is closed under exponential tilting so that,

e^{−c²z/2} PG(z; b, 0) ∝ PG(z; b, c),

and its expectation is available in closed form,

Ez∼PG(b,c)[z] = (b / 2c) tanh(c/2).

Use these properties to derive an EM algorithm for finding w* = arg maxw p({yn} | {xn}, w). How do the EM updates compare to Newton’s method?
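With these properties, the E-step only needs E[zn] for zn ∼ PG(1, wTxn), and the M-step reduces to a weighted least-squares solve. The sketch below assumes that reduction (and illustrative synthetic data); it is a possible shape of the answer, not a prescribed implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative synthetic logistic-regression data.
N, D = 500, 2
w_true = np.array([1.5, -1.0])
X = rng.normal(size=(N, D))
p = 1.0 / (1.0 + np.exp(-(X @ w_true)))
y = (rng.uniform(size=N) < p).astype(float)

w = np.zeros(D)
for _ in range(50):
    psi = X @ w
    # E-step: E[z_n] = (1 / 2 psi_n) tanh(psi_n / 2); the psi -> 0 limit is 1/4.
    with np.errstate(divide="ignore", invalid="ignore"):
        z_hat = np.where(np.abs(psi) < 1e-8, 0.25,
                         np.tanh(psi / 2.0) / (2.0 * psi))
    # M-step: maximize sum_n (y_n - 1/2) psi_n - (1/2) z_hat_n psi_n^2,
    # a weighted least-squares problem in w.
    w = np.linalg.solve(X.T @ (z_hat[:, None] * X), X.T @ (y - 0.5))
print(w)  # approaches the logistic-regression MLE, near w_true here
```

One way to frame the comparison asked for above: both this EM update and Newton’s method solve a reweighted least-squares system; they differ in where the per-observation weights come from.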

Problem 2: Spike sorting with mixture models

As discussed in class, “spike sorting” is ultimately a mixture modeling problem. Here we will study the problem in more detail. Let {yn}Nn=1 represent a collection of spikes. Each yn ∈ RD is a vector containing features of the n-th spike waveform. For example, the features may be projections of the spike waveform onto the top D principal components. We have the following general model,

zn | π ∼ π

yn | zn, θ ∼ p(yn | θzn).

The label zn ∈ {1, . . . , K} indicates which of the K neurons generated the n-th spike waveform. The probability vector π ∈ ∆K specifies a prior distribution on spike labels, and the parameters θ = {θk}Kk=1 determine the likelihood of the spike waveforms yn for each of the K neurons. The goal is to infer a posterior distribution p(zn | yn, π, θ) over labels for each observed spike, and to learn the parameters π* and θ* that maximize the likelihood of the data.

(a) Start with a Gaussian observation model,

yn | zn, θ ∼ N (yn | µzn, Σzn),

where θk =(µk, Σk) includes the mean and covariance for the k-th neuron.

Derive an EM algorithm to compute π*, θ* = arg max p({yn}Nn=1 | π, θ). Start by deriving the “responsibilities” wnk = p(zn = k | yn, π0, θ0) for fixed parameters π0 and θ0. Then use the responsibilities to compute the expected log joint probability,

L(π, θ) = Σ_{n=1}^{N} E_{p(zn | yn, π0, θ0)}[log p(yn, zn | π, θ)].

Finally, find closed-form expressions for π* and θ* that optimize L(π, θ).
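A compact sketch of the resulting EM loop is below, using the standard Gaussian-mixture closed-form updates; the synthetic two-cluster data and the initialization are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy spike features: two well-separated 2-D Gaussian clusters.
N, D, K = 400, 2, 2
true_means = np.array([[-3.0, 0.0], [3.0, 0.0]])
labels = rng.integers(0, K, size=N)
Y = true_means[labels] + rng.normal(size=(N, D))

# Illustrative initialization (not prescribed by the assignment).
pi = np.full(K, 1.0 / K)
mus = np.array([[-1.0, 0.0], [1.0, 0.0]])
Sigmas = np.stack([np.eye(D)] * K)

def log_gauss(Y, mu, Sigma):
    d = Y - mu
    Si = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (D * np.log(2 * np.pi) + logdet
                   + np.einsum("ni,ij,nj->n", d, Si, d))

for _ in range(50):
    # E-step: responsibilities w_nk proportional to pi_k N(y_n | mu_k, Sigma_k).
    log_r = np.log(pi) + np.stack(
        [log_gauss(Y, mus[k], Sigmas[k]) for k in range(K)], axis=1)
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: closed-form updates for pi, means, and covariances.
    Nk = r.sum(axis=0)
    pi = Nk / N
    mus = (r.T @ Y) / Nk[:, None]
    for k in range(K):
        d = Y - mus[k]
        Sigmas[k] = (r[:, k, None] * d).T @ d / Nk[k]

print(np.sort(mus[:, 0]))  # first coordinates of the means, near -3 and 3
```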

(b) The Gaussian model can be sensitive to outliers, which can cause spikes from one neuron to be split into two clusters. One way to side-step this issue is to replace the Gaussian with a heavier-tailed distribution like the multivariate Student’s t, which has probability density,

p(yn | θzn) = Γ[(α0 + D)/2] / (Γ(α0/2) (α0π)^{D/2} |Σzn|^{1/2}) · [1 + (1/α0)(yn − µzn)T Σzn⁻¹ (yn − µzn)]^{−(α0+D)/2}.

We will treat α0 as a fixed hyperparameter.

Like the negative binomial distribution studied in HW1, the multivariate Student’s t can also be represented as an infinite mixture,

p(yn | θzn) = ∫ p(yn, τn | θzn) dτn = ∫ N(yn; µzn, τn⁻¹ Σzn) Gamma(τn; α0/2, α0/2) dτn.

We will derive an EM algorithm to find π*, θ* in this model.

First, show that the posterior takes the form

p(τn, zn | yn, π, θ) = p(zn | yn, π, θ) p(τn | zn, yn, θ)

= ∏_{k=1}^{K} [wnk Gamma(τn | ank, bnk)]^{I[zn=k]},

and solve for the parameters wnk, ank, bnk in terms of yn, π, and θ.
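For reference, once wnk, ank, bnk have been derived, they can be evaluated numerically as below. The parameter values are illustrative, and the Gamma parameters used (ank = (α0 + D)/2, bnk = (α0 + δnk)/2, with δnk the Mahalanobis distance of yn from µk) are one common form of the answer for this scale mixture:

```python
import numpy as np
from math import lgamma

# Illustrative parameter values for a single spike y_n.
D, K, alpha0 = 2, 2, 4.0
y_n = np.array([0.5, -0.2])
pi = np.array([0.6, 0.4])
mus = np.array([[0.0, 0.0], [2.0, 2.0]])
Sigmas = np.stack([np.eye(D)] * K)

def log_t(y, mu, Sigma, alpha):
    """Log density of the multivariate Student's t from part (b)."""
    d = y - mu
    delta = d @ np.linalg.inv(Sigma) @ d        # Mahalanobis distance
    _, logdet = np.linalg.slogdet(Sigma)
    return (lgamma((alpha + D) / 2) - lgamma(alpha / 2)
            - 0.5 * (D * np.log(alpha * np.pi) + logdet)
            - 0.5 * (alpha + D) * np.log1p(delta / alpha))

# Responsibilities w_nk proportional to pi_k t(y_n | mu_k, Sigma_k, alpha0).
log_w = np.log(pi) + np.array(
    [log_t(y_n, mus[k], Sigmas[k], alpha0) for k in range(K)])
w_n = np.exp(log_w - log_w.max())
w_n /= w_n.sum()

# Gamma posterior over tau_n given z_n = k.
deltas = np.array([(y_n - mus[k]) @ np.linalg.inv(Sigmas[k]) @ (y_n - mus[k])
                   for k in range(K)])
a = np.full(K, (alpha0 + D) / 2)                # a_nk
b = (alpha0 + deltas) / 2                       # b_nk
print(w_n, a, b)
```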

(c) Now compute the expected log joint probability,

L(π, θ) = Σ_{n=1}^{N} E_{p(τn, zn | yn, π0, θ0)}[log p(yn, zn, τn | π, θ)],

using the fact that E[X]= a/b for X ∼ Gamma(a, b). You may omit terms that are constant with respect to π and θ.

(d) Finally, solve for π* and θ* that maximize the expected log joint probability. How does your answer compare to the solution you found in part (a)?
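A sketch of what the resulting M-step typically looks like, assuming the E-step has produced responsibilities r[n, k] and E[τn | zn = k] = ank/bnk (the inputs below are random placeholders, not a real E-step): relative to part (a), πk is updated exactly as before, while the mean and covariance updates acquire extra τ-weights.

```python
import numpy as np

rng = np.random.default_rng(4)

# Placeholder E-step outputs for N = 6 spikes and K = 2 components.
N, D, K = 6, 2, 2
Y = rng.normal(size=(N, D))
r = rng.dirichlet(np.ones(K), size=N)       # placeholder responsibilities
tau_hat = rng.gamma(3.0, 1.0, size=(N, K))  # placeholder E[tau_n | z_n = k]

pi = r.sum(axis=0) / N                      # same update as the Gaussian case
u = r * tau_hat                             # effective weights w_nk * E[tau]
mus = (u.T @ Y) / u.sum(axis=0)[:, None]    # tau-weighted means
Sigmas = np.empty((K, D, D))
for k in range(K):
    d = Y - mus[k]
    # Covariance: tau-weighted scatter, normalized by the responsibilities.
    Sigmas[k] = (u[:, k, None] * d).T @ d / r[:, k].sum()
print(pi, mus)
```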

[1] Observe that zn is conditionally independent of {xn′, yn′, zn′}n′≠n given w.