CSE574 Programming Assignment 1: Linear Models for Supervised Learning


Part I – Linear Regression

In this part you will implement the direct and gradient descent based learning methods for Linear Regression and compare the results on the provided “diabetes” dataset.

Problem 1: Linear Regression with Direct Minimization

Implement the ordinary least squares method to estimate the regression parameters by minimizing the squared loss:

$$J(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 \qquad (1)$$

In matrix-vector notation, the loss function can be written as:

$$J(\mathbf{w}) = \frac{1}{2}(\mathbf{y} - X\mathbf{w})^\top(\mathbf{y} - X\mathbf{w}) \qquad (2)$$

where X is the input data matrix, y is the target vector, and w is the weight vector for regression.

You need to implement the function learnOLERegression. Also implement the function testOLERegression to apply the learnt weights for prediction on both training and testing data and to calculate the root mean squared error (RMSE):

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2} \qquad (3)$$
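
A minimal NumPy sketch of the two functions might look as follows; the function names come from the handout, while the array shapes (X as an N×d matrix, y as an N×1 vector) are assumptions about the starter code:

import numpy as np

def learnOLERegression(X, y):
    # Direct minimization of Eq. (2): w = (X^T X)^{-1} X^T y.
    # Solving the normal equations is numerically preferable to
    # forming the matrix inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

def testOLERegression(w, Xtest, ytest):
    # Root mean squared error of the predictions Xw against y, Eq. (3).
    w = w.reshape(-1, 1)            # accept flat or column weight vectors
    residuals = ytest - Xtest @ w
    return np.sqrt(np.mean(residuals ** 2))

For the intercept case in Report 1, a column of ones would be prepended to X before calling learnOLERegression.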

REPORT 1.

Calculate and report the RMSE for training and test data for two cases: first, without using an intercept (or bias) term, and second with using an intercept. Which one is better?

Problem 2: Using Gradient Descent for Linear Regression Learning

As discussed in class, regression parameters can be calculated directly using analytical expressions (as in Problem 1). However, to avoid the computation of $(X^\top X)^{-1}$, another option is to use gradient descent to minimize the loss function. In this problem, you have to implement the gradient descent procedure for estimating the weights $\mathbf{w}$, where the gradient is given by:

$$\nabla J(\mathbf{w}) = X^\top X\mathbf{w} - X^\top\mathbf{y} \qquad (4)$$

You need to use the minimize function from the scipy library. You need to implement a function regressionObjVal to compute the squared error (see (2)) and a function regressionGradient to compute its gradient with respect to w. In the main script, this objective function and the gradient function will be used within the minimizer (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html for more details).
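
A sketch of the two functions, plus a hypothetical call to the minimizer, is below. scipy.optimize.minimize passes w as a flat 1-D array, hence the reshapes; the 'CG' method and the zero initialization are illustrative assumptions, not requirements from the handout:

import numpy as np
from scipy.optimize import minimize

def regressionObjVal(w, X, y):
    # Squared-error loss of Eq. (2); w arrives as a flat 1-D array.
    w = w.reshape(-1, 1)
    err = y - X @ w
    return 0.5 * (err.T @ err).item()

def regressionGradient(w, X, y):
    # Gradient of Eq. (4): X^T X w - X^T y, returned as a flat array.
    w = w.reshape(-1, 1)
    grad = X.T @ X @ w - X.T @ y
    return grad.flatten()

# Hypothetical usage with training arrays Xtrain (N x d), ytrain (N x 1):
# w0 = np.zeros(Xtrain.shape[1])
# res = minimize(regressionObjVal, w0, jac=regressionGradient,
#                args=(Xtrain, ytrain), method='CG')
# w_learned = res.x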

REPORT 2.

Using testOLERegression, calculate and report the RMSE for training and test data after gradient descent based learning. Compare with the RMSE after direct minimization. Which one is better?

Part II – Linear Classifiers

In this part you will implement three different linear classifiers using different optimization algorithms and compare the results on the provided data set. You will also have to draw the decision boundaries for the three classifiers and compare them. The three classifiers are:

  1. Perceptron
  2. Logistic Regression
  3. Linear Support Vector Machine (SVM)

For each classifier, the decision rule is the same, i.e., the predicted target, $y_i$, for a given input, $\mathbf{x}_i$, is given by:

$$y_i = \begin{cases} -1 & \text{if } \mathbf{w}^\top \mathbf{x}_i < 0 \\ +1 & \text{if } \mathbf{w}^\top \mathbf{x}_i \ge 0 \end{cases} \qquad (5)$$

where $\mathbf{w}$ is the weight vector representing the linear discriminating boundary. We will assume that we have included a constant term in $\mathbf{x}_i$ and a corresponding weight in $\mathbf{w}$. While all three classifiers have the same decision function¹, they differ in the loss function that is minimized and in the algorithm used to learn $\mathbf{w}$.

¹For logistic regression, a different formulation is typically presented, in which the decision rule is written as:

$$y_i = \begin{cases} -1 & \text{if } \theta_i < 0.5 \\ +1 & \text{if } \theta_i \ge 0.5 \end{cases} \qquad (6)$$

where

$$\theta_i = \frac{1}{1 + \exp\left(-\mathbf{w}^\top \mathbf{x}_i\right)} \qquad (7)$$

However, one can see that this is equivalent to checking whether $\mathbf{w}^\top \mathbf{x}_i < 0$ or not.

For this part, you will implement the training algorithms for the three different linear classifiers, learn a model for the sample training data and report the accuracy on the sample training and test data sets. The sample training and test data sets are included in the “sample.pickle” file.
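
A hypothetical way to load the data is shown below; the actual layout inside sample.pickle may differ from the (Xtrain, ytrain, Xtest, ytest) tuple assumed here, so check the provided main script:

import pickle

# Assumed layout: a tuple of NumPy arrays (Xtrain, ytrain, Xtest, ytest).
with open('sample.pickle', 'rb') as f:
    Xtrain, ytrain, Xtest, ytest = pickle.load(f)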

Problem 3: Using Gradient Descent for Perceptron Learning

For this problem, you will train a perceptron. The perceptron uses a squared loss function that is exactly the same as the one for linear regression (see (1)), i.e.,

$$J(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 \qquad (8)$$

which means that you can call the same functions, regressionObjVal and regressionGradient, implemented in Problem 2, to train the perceptron. Implement two functions:

  1. a testing function, predictLinearModel, that returns the predictions of a model on a test data set;
  2. an evaluation function, evaluateLinearModel, that computes the accuracy of the model on the test data by calculating the fraction of observations for which the predicted label is the same as the true label (see the sketch below).
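
A minimal sketch of both functions, assuming NumPy arrays and labels in {-1, +1}:

import numpy as np

def predictLinearModel(w, Xtest):
    # Decision rule of Eq. (5): +1 if w^T x >= 0, else -1.
    scores = Xtest @ w.reshape(-1, 1)
    return np.where(scores >= 0, 1.0, -1.0)

def evaluateLinearModel(w, Xtest, ytest):
    # Accuracy: fraction of predictions that match the true labels.
    ypred = predictLinearModel(w, Xtest)
    return np.mean(ypred == ytest)

# Hypothetical perceptron training for Report 3, reusing the Problem 2
# functions (the method and initialization are assumptions):
# from scipy.optimize import minimize
# w0 = np.zeros(Xtrain.shape[1])
# res = minimize(regressionObjVal, w0, jac=regressionGradient,
#                args=(Xtrain, ytrain), method='CG')
# print(evaluateLinearModel(res.x, Xtrain, ytrain))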

REPORT 3.

Train the perceptron model by calling the scipy.optimize.minimize method and use the evaluateLinearModel to calculate and report the accuracy for the training and test data.

Problem 4: Using Newton’s Method for Logistic Regression Learning

For this problem, you will train a logistic regression model, whose loss function (also known as the logistic-loss or log-loss) is given by:

$$J(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\log\left(1 + \exp\left(-y_i \mathbf{w}^\top \mathbf{x}_i\right)\right) \qquad (9)$$

The gradient for this loss function, as derived in class, is given by:

$$\nabla J(\mathbf{w}) = -\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{1 + \exp\left(y_i \mathbf{w}^\top \mathbf{x}_i\right)}\,\mathbf{x}_i \qquad (10)$$

The Hessian for the loss function is given by:

$$H(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\frac{\exp\left(y_i \mathbf{w}^\top \mathbf{x}_i\right)}{\left(1 + \exp\left(y_i \mathbf{w}^\top \mathbf{x}_i\right)\right)^2}\,\mathbf{x}_i\mathbf{x}_i^\top \qquad (11)$$

Newton's Method. The update rule is given by:

$$\mathbf{w}^{(t)} = \mathbf{w}^{(t-1)} - \eta\,H^{-1}\left(\mathbf{w}^{(t-1)}\right)\nabla J\left(\mathbf{w}^{(t-1)}\right)$$

However, for this assignment we will be using the scipy.optimize.minimize function again, with method = 'Newton-CG', for training with Newton's method. This requires you to implement the following three functions:

  1. logisticObjVal – compute the logistic loss for the given data set (See (9)).
  2. logisticGradient – compute the gradient vector of logistic loss for the given data set (See (10)).
  3. logisticHessian – compute the Hessian matrix of logistic loss for the given data set (See (11)).
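
A sketch of the three functions and a hypothetical Newton-CG call, under the same conventions as before (X is n×d, y is n×1 with labels in {-1, +1}); the zero initialization is an assumption:

import numpy as np
from scipy.optimize import minimize

def logisticObjVal(w, X, y):
    # Log-loss of Eq. (9): mean of log(1 + exp(-y_i w^T x_i)).
    w = w.reshape(-1, 1)
    margins = y * (X @ w)
    return float(np.mean(np.log(1.0 + np.exp(-margins))))

def logisticGradient(w, X, y):
    # Gradient of Eq. (10): -(1/n) sum_i y_i x_i / (1 + exp(y_i w^T x_i)).
    w = w.reshape(-1, 1)
    margins = y * (X @ w)
    coef = -y / (1.0 + np.exp(margins))        # n x 1 per-sample weights
    return (X.T @ coef).flatten() / X.shape[0]

def logisticHessian(w, X, y):
    # Hessian of Eq. (11): (1/n) sum_i s_i x_i x_i^T with
    # s_i = exp(y_i w^T x_i) / (1 + exp(y_i w^T x_i))^2.
    w = w.reshape(-1, 1)
    margins = y * (X @ w)
    s = np.exp(margins) / (1.0 + np.exp(margins)) ** 2
    return (X.T * s.flatten()) @ X / X.shape[0]

# Hypothetical usage:
# w0 = np.zeros(Xtrain.shape[1])
# res = minimize(logisticObjVal, w0, jac=logisticGradient,
#                hess=logisticHessian, args=(Xtrain, ytrain),
#                method='Newton-CG')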

REPORT 4.

Train the logistic regression model by calling the scipy.optimize.minimize method, and use the evaluateLinearModel to calculate and report the accuracy for the training and test data.

Problem 5: Using Stochastic Gradient Descent Method for Training Linear Support Vector Machine

While we will study the quadratic optimization formulation for SVMs in class, we can also train the SVM directly using the hinge loss, given by:

$$J(\mathbf{w}) = \sum_{i=1}^{n}\max\left(0,\,1 - y_i \mathbf{w}^\top \mathbf{x}_i\right) \qquad (12)$$

Clearly, the above function is not as easily differentiable as the squared-loss and logistic-loss functions above. However, we can devise a simple Stochastic Gradient Descent (SGD) based method for learning $\mathbf{w}$. Note that, for a single observation, the loss is given by:

$$J_i(\mathbf{w}) = \max\left(0,\,1 - y_i \mathbf{w}^\top \mathbf{x}_i\right) \qquad (13)$$

$$J_i(\mathbf{w}) = \begin{cases} 0 & \text{if } y_i \mathbf{w}^\top \mathbf{x}_i \ge 1 \\ 1 - y_i \mathbf{w}^\top \mathbf{x}_i & \text{otherwise} \end{cases} \qquad (14)$$

Thus, the gradient of $J_i(\mathbf{w})$ can be written as:

$$\nabla J_i(\mathbf{w}) = \begin{cases} \mathbf{0} & \text{if } y_i \mathbf{w}^\top \mathbf{x}_i \ge 1 \\ -y_i \mathbf{x}_i & \text{otherwise} \end{cases} \qquad (15)$$

The training can be done using the following algorithm:

1: $\mathbf{w} \leftarrow [0, 0, \ldots, 0]^\top$
2: for $t = 1, 2, \ldots, T$ do
3:     $i \leftarrow \text{RandomSample}(1 \ldots n)$
4:     if $y_i \mathbf{w}^\top \mathbf{x}_i < 1$ then
5:         $\mathbf{w} \leftarrow \mathbf{w} + \eta\,y_i \mathbf{x}_i$
6:     end if
7: end for

You have to implement a function trainSGDSVM that learns the weights, $\mathbf{w}$, using the above algorithm.
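
A minimal sketch of trainSGDSVM, directly following the pseudocode above; X as an n×d array and y as an n×1 array with labels in {-1, +1} are assumed conventions, and the default learning rate is only a placeholder:

import numpy as np

def trainSGDSVM(X, y, T, eta=0.01):
    # Stochastic (sub-)gradient descent on the hinge loss of Eq. (12).
    n, d = X.shape
    w = np.zeros((d, 1))                  # step 1: w <- [0, ..., 0]^T
    for t in range(T):                    # step 2: T iterations
        i = np.random.randint(n)          # step 3: i <- RandomSample(1...n)
        xi = X[i].reshape(-1, 1)
        yi = float(y[i, 0])
        if yi * (xi.T @ w).item() < 1:    # step 4: margin violated?
            w = w + eta * yi * xi         # step 5: w <- w + eta * y_i * x_i
    return w

# Per Report 5, a typical call might be:
# w = trainSGDSVM(Xtrain, ytrain, T=200, eta=0.01)
# print(evaluateLinearModel(w, Xtest, ytest))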

REPORT 5.

Train the SVM model by calling the trainSGDSVM method for 200 iterations (set learning rate parameter η to 0.01). Use the evaluateLinearModel to calculate and report the accuracy for the training and test data.

Problem 6: Comparing Linear Classifiers

Using the results for Problems 3–5, provide a comparison of the three different linear models (Perceptron, Logistic Regression, and Support Vector Machine) on the provided data set.

REPORT 6.

  1. Using the results on the test data, determine which classifier is the most accurate.
  2. Plot the decision boundaries learnt by each classifier using the provided plotDecisionBoundary function, which takes the learnt weight vector $\mathbf{w}$ as one of its parameters. Study the three boundaries and provide your insights.