[SOLVED] Machine Learning Homework 3


Consider the problem of learning a regression model from five univariate observations, (0.8), (1), (1.2), (1.4), (1.6), with targets (24, 20, 10, 13, 12).

1) [5v] Consider the basis function φⱼ(x) = xʲ for performing a 3rd-order polynomial regression,

ẑ(x, 𝐰) = ∑ⱼ wⱼ φⱼ(x) = w₀ + w₁x + w₂x² + w₃x³.

Learn the Ridge regression (ℓ2 regularization) on the transformed data space using the closed-form solution with λ = 2.

Hint: use numpy matrix operations (e.g., linalg.pinv for the inverse) to validate your calculations.
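A minimal numpy sketch of the closed-form solution, under one modeling assumption: λ penalizes all four coefficients, including the bias w₀ (the penalty is applied to the full identity matrix).

```python
import numpy as np

# Observations and targets from the problem statement
x = np.array([0.8, 1.0, 1.2, 1.4, 1.6])
y = np.array([24.0, 20.0, 10.0, 13.0, 12.0])

# Design matrix for the basis phi_j(x) = x**j, j = 0..3 (columns 1, x, x^2, x^3)
Phi = np.vander(x, N=4, increasing=True)

# Closed-form Ridge solution: w = (Phi^T Phi + lambda * I)^(-1) Phi^T y
lam = 2.0
w = np.linalg.pinv(Phi.T @ Phi + lam * np.eye(4)) @ Phi.T @ y
print(w)  # learnt coefficients w0..w3
```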

2) [1v] Compute the training RMSE for the learnt regression model.
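Continuing the sketch above, the training RMSE follows directly from the fitted coefficients:

```python
# Training RMSE of the Ridge model (reuses Phi, w, and y from the sketch above)
z_hat = Phi @ w
rmse = np.sqrt(np.mean((y - z_hat) ** 2))
print(rmse)
```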

3) [6v] Consider a multi-layer perceptron characterized by one hidden layer with 2 nodes. Using the activation function f(x) = e^(0.1x) on all units, all weights initialized as 1 (including biases), and the half squared error loss, perform one batch gradient descent update (with learning rate η = 0.1) for the first three observations (0.8), (1), and (1.2).
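A minimal numpy sketch of this update, under the following assumptions: a 1-2-1 architecture, the exponential activation applied to both the hidden and the output units (so f'(a) = 0.1·f(a)), and batch gradients summed (not averaged) over the three observations.

```python
import numpy as np

# First three observations and their targets
X = np.array([[0.8], [1.0], [1.2]])   # shape (3, 1)
t = np.array([[24.0], [20.0], [10.0]])

f = lambda a: np.exp(0.1 * a)          # activation; note f'(a) = 0.1 * f(a)

# All weights and biases initialized to 1, as stated
W1, b1 = np.ones((1, 2)), np.ones((1, 2))   # input -> 2 hidden units
W2, b2 = np.ones((2, 1)), np.ones((1, 1))   # hidden -> output
eta = 0.1

# Forward pass over the batch of 3
a1 = X @ W1 + b1          # hidden pre-activations, shape (3, 2)
h = f(a1)
a2 = h @ W2 + b2          # output pre-activation, shape (3, 1)
y = f(a2)

# Backward pass for the half squared error E = 0.5 * sum((y - t)**2)
d2 = (y - t) * 0.1 * y            # dE/da2
d1 = (d2 @ W2.T) * 0.1 * h        # dE/da1

# One batch gradient-descent update (gradients summed over the batch)
W2 -= eta * h.T @ d2
b2 -= eta * d2.sum(axis=0, keepdims=True)
W1 -= eta * X.T @ d1
b1 -= eta * d1.sum(axis=0, keepdims=True)
print(W1, b1, W2, b2)
```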

1. Programming and critical analysis

Consider the following three regressors applied to the kin8nm.arff data (available at the webpage):

− linear regression with a Ridge regularization term of 0.1;

− two MLPs, MLP1 and MLP2, each with two hidden layers of size 10, the hyperbolic tangent as the activation function of all nodes, a maximum of 500 iterations, and a fixed seed (random_state=0). MLP1 should be parameterized with early stopping, while MLP2 should not consider early stopping. The remaining parameters (e.g., loss function, batch size, regularization term, solver) should be set to their defaults.

Using a 70-30 training-test split with a fixed seed (random_state=0):
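A minimal sklearn sketch of this setup; the one assumption here is that the target is the last column of the ARFF file.

```python
from scipy.io import arff
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

# Load the ARFF file; the target is assumed to be the last column
data, _meta = arff.loadarff("kin8nm.arff")
df = pd.DataFrame(data)
X, y = df.iloc[:, :-1], df.iloc[:, -1]

# 70-30 split with the fixed seed from the statement
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

ridge = Ridge(alpha=0.1)
mlp1 = MLPRegressor(hidden_layer_sizes=(10, 10), activation="tanh",
                    max_iter=500, random_state=0, early_stopping=True)
mlp2 = MLPRegressor(hidden_layer_sizes=(10, 10), activation="tanh",
                    max_iter=500, random_state=0, early_stopping=False)
```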

4) [4v] Compute the MAE of the three regressors: linear regression, MLP1, and MLP2.
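A short continuation of the setup sketch above, fitting each regressor and reporting its test MAE:

```python
from sklearn.metrics import mean_absolute_error

for name, model in [("Ridge", ridge), ("MLP1", mlp1), ("MLP2", mlp2)]:
    model.fit(X_train, y_train)
    print(name, "MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```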

5) [1.5v] Plot the residuals (in absolute value) using two visualizations: boxplots and histograms.

Hint: consider using the boxplot and hist functions from matplotlib.pyplot to this end.
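A minimal matplotlib sketch, reusing the fitted models from the previous sketches:

```python
import numpy as np
import matplotlib.pyplot as plt

# Absolute residuals on the test set for each fitted regressor
names = ["Ridge", "MLP1", "MLP2"]
residuals = [np.abs(y_test - m.predict(X_test)) for m in (ridge, mlp1, mlp2)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.boxplot(residuals, labels=names)   # one box per regressor
ax2.hist(residuals, label=names)       # side-by-side histogram bars
ax2.legend()
plt.show()
```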

6) [1v] How many iterations were required for MLP1 and MLP2 to converge?
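After fitting, sklearn's MLPRegressor exposes the number of completed iterations through its n_iter_ attribute; a quick check, reusing the fitted models from above:

```python
# Iterations actually performed before each MLP stopped training
print("MLP1:", mlp1.n_iter_)
print("MLP2:", mlp2.n_iter_)
```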

7) [1.5v] What might be causing the unexpected difference in the number of iterations? Hypothesize one reason underlying the observed performance differences between the two MLPs.