Description
Given the training set $\mathbf{x}$ and the corresponding label set $\mathbf{t}$, we want to predict the label $t$ of a new test point $x$. In other words, we wish to evaluate the predictive distribution $p(t \mid x, \mathbf{x}, \mathbf{t})$.
A linear regression function can be expressed as follows, where $\boldsymbol{\phi}(x)$ is a vector of basis functions:
$y(x, \mathbf{w}) = \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(x)$
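As a rough illustration of the formula above, the sketch below evaluates y(x, w) for a polynomial basis on scalar inputs. The function names, the choice of a polynomial basis, and the weight values are assumptions made for this example only; the assignment does not prescribe them.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Map scalar inputs x to the basis vector [1, x, x^2, ..., x^degree]."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, degree + 1, increasing=True)

def predict(x, w, degree):
    """Evaluate y(x, w) = w^T phi(x) for every input in x."""
    return polynomial_basis(x, degree) @ w

# Hypothetical weights, chosen so that y = 1 + 2 x^2.
w = np.array([1.0, 0.0, 2.0])
y = predict([0.0, 1.0, 2.0], w, degree=2)
print(y)
```

The model stays linear in w even though it is nonlinear in x, which is why the usual least-squares machinery still applies.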
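To predict $t$ for a new test point $x$ from the learned $\mathbf{w}$, we will:
- Multiply the likelihood of the new data, $p(t \mid x, \mathbf{w})$, by the posterior distribution of $\mathbf{w}$ given the training set and its label set, $p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})$.
- Integrate over $\mathbf{w}$ to obtain the predictive distribution:
$p(t \mid x, \mathbf{x}, \mathbf{t}) = \int p(t \mid x, \mathbf{w}) \, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}) \, d\mathbf{w}$.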
Now, please answer the following questions:
- Why do we need the basis function φ(x) for linear regression? What is the benefit of applying basis functions in linear regression?
- Prove that the predictive distribution just mentioned has the form
$p(t \mid x, \mathbf{x}, \mathbf{t}) = \mathcal{N}(t \mid m(x), s^2(x))$
where
$s^2(x) = \beta^{-1} + \boldsymbol{\phi}(x)^{\mathsf{T}} \mathbf{S} \boldsymbol{\phi}(x)$.
Here, the matrix S−1 is given by S
(Hint: $p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}) \, p(\mathbf{w})$, and you may use the formulas shown on page 93.)
- Could we use a linear regression function for classification? Why or why not? Explain.
1 Linear Regression
In this homework, you need to predict the chance of admission based on relevant student résumé data. Each of the following two approaches must be implemented:
- Maximum likelihood approach (ML)
- Maximum a posteriori approach (MAP)
The dataset provides a total of 500 students with 7 features. Can you use these features to predict the chance of admission to your own dream school?
One might consider the following steps to start the work:
- Download and inspect the dataset.
- Create a new Colab or Jupyter notebook file.
- Divide the dataset into training and validation sets.
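The last step above could look roughly like the following sketch. It uses randomly generated stand-in data with the same shape as the dataset (500 students, 7 features) rather than the real CSV files; the function name, the 80/20 ratio, and the seeds are all assumptions for illustration.

```python
import numpy as np

def train_valid_split(X, t, valid_ratio=0.2, seed=0):
    """Shuffle the rows, then hold out a fraction for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_valid = int(round(len(X) * valid_ratio))
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    return X[train_idx], t[train_idx], X[valid_idx], t[valid_idx]

# Synthetic stand-in for the real inputs and targets (not the actual data).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 7))
t = rng.uniform(size=500)

X_train, t_train, X_valid, t_valid = train_valid_split(X, t)
print(X_train.shape, X_valid.shape)
```

Shuffling before splitting guards against any ordering in the file; fixing the seed keeps the split reproducible between notebook runs.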
Dataset Description
- dataset X.csv contains 7 different résumé features that serve as the input:
GRE score, TOEFL score, University rating, SOP, LOR, CGPA, Research
- dataset T.csv contains the chance of admission, which is regarded as the target:
Chance of Admit
Specification
- For problems marked Code Result at the end, you must show your result in your .ipynb file or you will get no points.
- For problems marked Explain at the end, you must give a clear explanation or you will get low points.
- You are also encouraged to discuss the problems that are not marked Explain.
- Feature selection
In real-world applications, the dimension of data is usually more than one. In the training stage, please fit the data by applying a polynomial function of the form
$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{i=1}^{D} w_i x_i + \sum_{i=1}^{D} \sum_{j=1}^{D} w_{ij} x_i x_j \quad (M = 2)$
and minimizing the error function.
- In the feature selection stage, please apply polynomials of order M = 1 and M = 2 to the D = 7 dimensional input data. Please evaluate the corresponding RMS error on the training set and the validation set. Code Result
- How would you analyze the weights of the polynomial model with M = 1 and select the feature that contributes most? Code Result, Explain
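One possible way to set up the feature selection experiment is sketched below. It builds the M = 1 and M = 2 polynomial features, fits the weights by least squares, and reports the RMS error on both splits. The data are synthetic with made-up true weights, and every function name is an assumption; this is a starting point, not a reference solution.

```python
import numpy as np

def poly_features(X, M):
    """Columns: 1, x_i, and (for M = 2) every product x_i * x_j."""
    n, d = X.shape
    cols = [np.ones((n, 1)), X]
    if M == 2:
        # All D*D products, matching the double sum in the M = 2 model.
        cols.append((X[:, :, None] * X[:, None, :]).reshape(n, d * d))
    return np.hstack(cols)

def fit_least_squares(Phi, t):
    # lstsq tolerates the duplicated columns x_i x_j = x_j x_i.
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def rms_error(Phi, w, t):
    return float(np.sqrt(np.mean((Phi @ w - t) ** 2)))

# Synthetic stand-in data; the true weights are invented for this sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))
true_w = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 1.0, 0.05])
t = X @ true_w + 0.1 * rng.normal(size=500)
X_train, t_train, X_valid, t_valid = X[:400], t[:400], X[400:], t[400:]

results = {}
for M in (1, 2):
    w = fit_least_squares(poly_features(X_train, M), t_train)
    results[M] = (rms_error(poly_features(X_train, M), w, t_train),
                  rms_error(poly_features(X_valid, M), w, t_valid))
    print(f"M={M}: train RMS={results[M][0]:.4f}, valid RMS={results[M][1]:.4f}")

# Rank features by |w_i| for M = 1 (index 0 of w1 is the bias term).
w1 = fit_least_squares(poly_features(X_train, 1), t_train)
ranking = np.argsort(-np.abs(w1[1:]))
print("features ordered by |weight|:", ranking)
```

Comparing weight magnitudes is only meaningful when the features are on comparable scales. The synthetic inputs here already have unit variance; the real features would likely need standardizing first.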
- Maximum likelihood approach
- Which basis function will you use to further improve your regression model: polynomial, Gaussian, sigmoidal, or a hybrid? Explain
- Introduce the basis function you chose in (a) into the linear regression model and analyze the result you get. (Hint: You might want to discuss what happens when the model becomes too complex.) Code Result, Explain
$\boldsymbol{\phi}(\mathbf{x}) = [\phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \ldots, \phi_N(\mathbf{x}), \phi_{\text{bias}}(\mathbf{x})]$
- Apply N-fold cross-validation in your training stage to select at least one hyperparameter (order, parameter number, …) for the model, and discuss the result (underfitting, overfitting). Code Result, Explain
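The cross-validation step might be organized as in the sketch below. It assumes Gaussian basis functions whose centers are taken from the training points, with a bias column appended last as in the φ(x) layout above, and it selects the number of basis functions by mean validation RMS error. The width s, the fold count, the candidate values, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def gaussian_design(X, centers, s):
    """phi_j(x) = exp(-||x - mu_j||^2 / (2 s^2)), plus a final bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.exp(-d2 / (2 * s ** 2)), np.ones((len(X), 1))])

def kfold_rms(X, t, n_centers, s=2.0, k=5, seed=0):
    """Mean validation RMS error over k folds for a given number of centers."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    errors = []
    for i in range(k):
        valid = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Assumption: use the first n_centers (shuffled) training points as centers.
        centers = X[train][:n_centers]
        w, *_ = np.linalg.lstsq(gaussian_design(X[train], centers, s),
                                t[train], rcond=None)
        pred = gaussian_design(X[valid], centers, s) @ w
        errors.append(np.sqrt(np.mean((pred - t[valid]) ** 2)))
    return float(np.mean(errors))

# Synthetic stand-in data with an invented nonlinear target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 7))
t = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

scores = {n: kfold_rms(X, t, n) for n in (5, 20, 80)}
best = min(scores, key=scores.get)
print(scores, "selected:", best)
```

Centers are drawn only from the training part of each fold so that the validation points never influence the model being scored.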
- Maximum a posteriori approach
- What is the key difference between the maximum likelihood approach and the maximum a posteriori approach? Explain
- Use the maximum a posteriori approach to retest the model you designed in 2. You could choose a Gaussian distribution as the prior. Code Result
- Compare the results of the maximum likelihood approach and the maximum a posteriori approach. Are they consistent with your conclusion in (a)? Explain
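To make the comparison concrete, here is a minimal sketch of both estimators. It assumes a zero-mean isotropic Gaussian prior on w, in which case the MAP solution is regularized least squares, w = (λI + ΦᵀΦ)⁻¹Φᵀt. The symbol λ (the ratio of prior precision to noise precision), the function names, and the random design matrix are assumptions introduced for this example.

```python
import numpy as np

def fit_ml(Phi, t):
    """Maximum likelihood: ordinary least squares."""
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def fit_map(Phi, t, lam):
    """MAP with a zero-mean isotropic Gaussian prior (assumed for this sketch)."""
    A = lam * np.eye(Phi.shape[1]) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

# Random stand-in design matrix: few samples relative to the parameter count.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(30, 20))
t = rng.normal(size=30)

w_ml = fit_ml(Phi, t)
w_map = fit_map(Phi, t, lam=1.0)
print("||w_ml||  =", np.linalg.norm(w_ml))
print("||w_map|| =", np.linalg.norm(w_map))
```

The prior shrinks the weights toward zero, which tends to matter most when the model is flexible relative to the amount of data. Note that this simple version also penalizes the bias weight.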



