Description
Problem 1.

Consider the training objective J = ||Xw - t||^2 subject to ||w||^2 <= C for some constant C. How would the hypothesis class capacity, overfitting/underfitting, and bias/variance vary according to C?
| | Larger C | Smaller C |
|---|---|---|
| Model capacity (large/small?) | _____ | _____ |
| Overfitting/underfitting? | __fitting | __fitting |
| Bias/variance (high/low?) | __ bias / __ variance | __ bias / __ variance |
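To build intuition before filling in the table, the constrained problem can be explored numerically. By Lagrangian duality, minimizing ||Xw - t||^2 subject to ||w||^2 <= C matches a ridge-penalized problem for some penalty strength lam >= 0, with larger lam corresponding to smaller C. The sketch below (the polynomial toy data and the particular lam values are illustrative choices, not part of the problem) prints the effective constraint radius ||w||^2 and the training error as lam varies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: degree-5 polynomial features fit to noisy samples of sin(3x).
x = rng.uniform(-1.0, 1.0, size=30)
X = np.vander(x, 6)
t = np.sin(3 * x) + 0.1 * rng.normal(size=30)

def ridge_fit(X, t, lam):
    """Closed-form minimizer of ||Xw - t||^2 + lam * ||w||^2.

    Each lam >= 0 corresponds to some constraint radius
    C = ||w(lam)||^2; larger lam yields a smaller C.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

for lam in (100.0, 1.0, 0.01):
    w = ridge_fit(X, t, lam)
    print(f"lam={lam:6.2f}  ||w||^2={np.sum(w**2):8.4f}  "
          f"train MSE={np.mean((X @ w - t)**2):.4f}")
```

Observing how the training error and ||w||^2 move together as the constraint loosens is exactly the capacity question the table asks about.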
Note: No proof is needed.

Problem 2.
Consider a one-dimensional linear regression model t^(m) ~ N(w x^(m), σ_ε^2) with a Gaussian prior w ~ N(0, σ^2). Show that the posterior of w is also a Gaussian distribution, i.e.,

w | x^(1), t^(1), ..., x^(M), t^(M) ~ N(μ_post, σ_post^2).

Give the formulas for μ_post and σ_post^2.

Hint: Work with p(w | D) ∝ p(w) p(D | w). You do not need to compute the normalizing term.

Note: If a prior has the same form (but typically with different parameters) as the posterior, it is known as a conjugate prior. The above conjugacy also applies to the multi-dimensional Gaussian, but the formulas for the mean vector and the covariance matrix are more complicated.
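The hint can be sanity-checked numerically without deriving the closed form: evaluate log p(w) + log p(D | w) on a grid and verify it is quadratic in w, since a density is Gaussian exactly when its log-density is quadratic. The data, noise scale, and grid below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: true slope 2.0; noise and prior scales chosen arbitrarily.
sigma_eps, sigma_prior = 0.5, 1.0
x = rng.uniform(-1.0, 1.0, size=20)
t = 2.0 * x + sigma_eps * rng.normal(size=20)

# Unnormalized log posterior on a grid: log p(w) + log p(D | w).
w_grid = np.linspace(-5.0, 5.0, 2001)
log_prior = -w_grid**2 / (2 * sigma_prior**2)
resid_sq = np.sum((t[None, :] - w_grid[:, None] * x[None, :])**2, axis=1)
log_post = log_prior - resid_sq / (2 * sigma_eps**2)

# If log_post is quadratic, a degree-2 polynomial fit is essentially exact.
coeffs = np.polyfit(w_grid, log_post, 2)
max_resid = np.abs(log_post - np.polyval(coeffs, w_grid)).max()
print("max deviation from a quadratic:", max_resid)

# Read off the Gaussian parameters from the quadratic a*w^2 + b*w + c:
# mean = -b / (2a), variance = -1 / (2a).
a, b, _ = coeffs
print("mu_post ~", -b / (2 * a), " sigma_post^2 ~", -1 / (2 * a))
```

The final two numbers give a concrete target to compare against once you derive μ_post and σ_post^2 analytically.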
Problem 3.
Give the prior distribution of w for linear regression, such that the maximum a posteriori estimation is equivalent to the ℓ1-penalized mean square loss.
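For reference, the analogous ℓ2 case can be checked numerically: with a Gaussian prior, the negative log posterior is a positive multiple of the ℓ2-penalized mean square loss, so the MAP estimate and the penalized minimizer coincide. The data, scales, and grid in this sketch are illustrative assumptions; the same recipe applies to Problem 3 once the right prior is identified:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data and scales (not part of the problem statement).
sigma_eps, sigma_prior = 0.3, 1.0
x = rng.uniform(-1.0, 1.0, size=15)
t = 1.5 * x + sigma_eps * rng.normal(size=15)

w_grid = np.linspace(-3.0, 3.0, 1201)
sq_err = np.sum((t[None, :] - w_grid[:, None] * x[None, :])**2, axis=1)

# Negative log posterior (up to an additive constant) with w ~ N(0, sigma_prior^2).
neg_log_post = sq_err / (2 * sigma_eps**2) + w_grid**2 / (2 * sigma_prior**2)

# l2-penalized square loss with lam = sigma_eps^2 / sigma_prior^2.
lam = sigma_eps**2 / sigma_prior**2
penalized = sq_err + lam * w_grid**2

# The two curves differ only by the positive factor 2 * sigma_eps^2,
# so the MAP estimate and the penalized minimizer are the same grid point.
print(w_grid[np.argmin(neg_log_post)], w_grid[np.argmin(penalized)])
```

The key observation is that adding a log-prior term to the log-likelihood is the same as adding a penalty to the loss; the shape of the penalty is dictated by the shape of the prior's log-density.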



