CS7641 Homework 3: Image compression with SVD and PCA

1 Image compression with SVD **[P]** **[W]**

Load images data and plot

In [1]:
# imports used throughout the notebook
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris, load_wine, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import accuracy_score, mean_squared_error
In [2]:
# HELPER CELL, DO NOT MODIFY
# load image and scale pixel values to [0, 1]
image = plt.imread("hw3_image_2.jpg")/255.
#plot image
fig = plt.figure(figsize=(10,10))
plt.imshow(image)
Out[2]:
<matplotlib.image.AxesImage at 0x7fd7f813eef0>
In [3]:
# HELPER CELL, DO NOT MODIFY
def rgb2gray(rgb):   
    return np.dot(rgb[...,:3], [0.299, 0.587, 0.114])

fig = plt.figure(figsize=(10, 10))
# plot several images
plt.imshow(rgb2gray(image), cmap=plt.cm.bone)
Out[3]:
<matplotlib.image.AxesImage at 0x7fd7f8c6b0f0>

1.1 Image compression **[P]**

SVD is a dimensionality reduction technique that allows us to compress images by throwing away the least important information.

Larger singular values capture more of the variance, and hence more of the information, in the matrix. To perform image compression, apply SVD to each channel matrix and discard the smallest singular values. The information lost this way is negligible, and the difference between the original and compressed images is hard to spot. For example, the proportion of variance captured by the first component is

$$\frac{\sigma_1}{\sum_{i=1}^{n} \sigma_i}$$

where $\sigma_i$ is the $i$-th singular value. You need to finish the following functions to perform SVD and then reconstruct the image from its components.

Hint 1: http://timbaumann.info/svd-image-compression-demo/ is a useful article on image compression.
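As a warm-up, here is a minimal sketch (on a hypothetical random 6x5 matrix, not assignment data) of how `np.linalg.svd` and rank-k truncation fit together; the `ImgCompression` class below generalizes this to per-channel color images.

```python
import numpy as np

# Rank-k reconstruction with numpy's SVD on a hypothetical 6x5 matrix.
rng = np.random.RandomState(0)
X = rng.rand(6, 5)

U, S, Vt = np.linalg.svd(X, full_matrices=True)  # U: 6x6, S: (5,), Vt: 5x5

k = 2
X_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]  # best rank-k approximation

# Reconstruction error shrinks as k grows (Eckart-Young theorem);
# keeping all singular values recovers X up to floating-point error.
err_2 = np.linalg.norm(X - X_k)
X_full = U[:, :5] @ np.diag(S) @ Vt
err_full = np.linalg.norm(X - X_full)
print(err_2, err_full)
```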

In [4]:
class ImgCompression(object):
    def __init__(self):
        pass

    def svd(self, X): # [5pts]
        """
        Do SVD. You could use numpy SVD.
        Your function should be able to handle black and white
        images (N*D arrays) as well as color images (N*D*3 arrays)
        In image compression, we assume that each column of the image is a feature. The image is the matrix X.
        Args:
            X: N * D array corresponding to an image (N * D * 3 if color image)
        Return:
            U: N * N for black and white images / N * N * 3 for color images
            S: min(N, D) * 1 for black and white images / min(N, D) * 3 for color images
            V: D * D for black and white images / D * D * 3 for color images
        """
        # BnW image
        if len(X.shape) == 2:
            U, S, V = np.linalg.svd(X)
        else:
            # colour image
            N, D = X.shape[0], X.shape[1]
            min_ND = min(N, D)

            # channel0
            U0, S0, V0 = np.linalg.svd(X[:, :, 0])
            S0 = S0.reshape((min_ND, 1))

            # channel1
            U1, S1, V1 = np.linalg.svd(X[:, :, 1])
            S1 = S1.reshape((min_ND, 1))

            # channel2
            U2, S2, V2 = np.linalg.svd(X[:, :, 2])
            S2 = S2.reshape((min_ND, 1))

            #combining
            U = np.array([U0, U1, U2])
            U = U.transpose(1, 2, 0)

            S = np.concatenate((S0, S1, S2), axis=1)

            V = np.array([V0, V1, V2])
            V = V.transpose(1, 2, 0)
        return U, S, V
        
#         raise NotImplementedError


    def rebuild_svd(self, U, S, V, k): # [5pts]
        """
        Rebuild the image from the first k SVD components.
        Args:
            U: N*N (*3 for color images)
            S: min(N, D)*1 (*3 for color images)
            V: D*D (*3 for color images)
            k: int corresponding to number of components
        Return:
            Xrebuild: N*D array of reconstructed image (N*D*3 if color image)

        Hint: numpy.matmul may be helpful for reconstructing color images
        """
        if len(U.shape) == 2:
            # BnW image
            # term1
            t1 = np.matmul(U[:, :k], np.diag(S)[:k, :k])
            # term2
            Xrebuild = np.matmul(t1, V[:k, :])
        else:
            # colour image
            N = U.shape[0]
            D = V.shape[0]

            # channel 0
            U0 = U[:, :, 0]
            S0 = S[:, 0]
            V0 = V[:, :, 0]
            # term1
            t1 = np.matmul(U0[:, :k], np.diag(S0)[:k, :k])
            # term2
            Xrebuild0 = np.matmul(t1, V0[:k, :])

            # channel 1
            U1 = U[:, :, 1]
            S1 = S[:, 1]
            V1 = V[:, :, 1]
            # term1
            t1 = np.matmul(U1[:, :k], np.diag(S1)[:k, :k])
            # term2
            Xrebuild1 = np.matmul(t1, V1[:k, :])

            # channel 2
            U2 = U[:, :, 2]
            S2 = S[:, 2]
            V2 = V[:, :, 2]
            # term1
            t1 = np.matmul(U2[:, :k], np.diag(S2)[:k, :k])
            # term2
            Xrebuild2 = np.matmul(t1, V2[:k, :])

            # combining
            Xrebuild = np.array([Xrebuild0, Xrebuild1, Xrebuild2])
            Xrebuild = Xrebuild.transpose(1, 2, 0)

        return Xrebuild   

#         raise NotImplementedError

    def compression_ratio(self, X, k): # [5pts]
        """
        Compute compression of an image: (num stored values in compressed)/(num stored values in original)
        Args:
            X: N * D array corresponding to an image (N * D * 3 if color image)
            k: int corresponding to number of components
        Return:
            compression_ratio: float of proportion of storage used by compressed image
        """
        if len(X.shape) == 2:
            # BnW image
            num_orig = X.shape[0] * X.shape[1]
            num_compress = k * (1 + X.shape[0] + X.shape[1])
        else:
            # colour image
            num_orig = X.shape[0] * X.shape[1] * X.shape[2]
            num_compress = k * (1 + X.shape[0] + X.shape[1]) * X.shape[2]
            
        compression_ratio = num_compress * 1.0 / num_orig
        return compression_ratio
        
#         raise NotImplementedError

    def recovered_variance_proportion(self, S, k): # [5pts]
        """
        Compute the proportion of the variance in the original matrix recovered by a rank-k approximation

        Args:
           S: min(N, D)*1 (*3 for color images) of singular values for the image
           k: int, rank of approximation
        Return:
           recovered_var: float (array of 3 floats for color image) corresponding to proportion of recovered variance
        """
        S2 = S ** 2
        if len(S.shape) == 1:
            # BnW image
            recovered_var = np.sum(S2[:k]) / np.sum(S2)
        else:
            # colour image
            # channel0
            recovered_var0 = np.sum(S2[:k, 0]) / np.sum(S2[:, 0])
            # channel1
            recovered_var1 = np.sum(S2[:k, 1]) / np.sum(S2[:, 1])
            # channel2
            recovered_var2 = np.sum(S2[:k, 2]) / np.sum(S2[:, 2])
            recovered_var = [recovered_var0, recovered_var1, recovered_var2]

        return recovered_var        
        
#         raise NotImplementedError
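To make the `compression_ratio` bookkeeping concrete: a rank-k factorization of an N x D channel stores k columns of U (N·k values), k singular values, and k rows of V (k·D values), i.e. k(N + D + 1) values versus N·D for the raw channel. A quick check with hypothetical sizes (not the assignment image):

```python
import numpy as np

# Storage count for a hypothetical 512x512 grayscale image at rank k = 20.
N, D, k = 512, 512, 20
stored_compressed = k * (N + D + 1)   # truncated U, S, and V
stored_original = N * D               # raw pixel values
ratio = stored_compressed / stored_original
print(ratio)  # 20 * 1025 / 262144, about 0.078
```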

1.2 Black and white [5 pts] **[W]**

Use your implementation to generate a set of images compressed to different degrees. Include the images in your non-programming submission to the assignment.

In [5]:
# HELPER CELL, DO NOT MODIFY
imcompression = ImgCompression()
bw_image = rgb2gray(image)
U, S, V = imcompression.svd(bw_image)
component_num = [1,2,5,10,20,40,80,160,256]

fig = plt.figure(figsize=(18, 18))

# plot several images
i=0
for k in component_num:
    img_rebuild = imcompression.rebuild_svd(U, S, V, k)
    c = np.around(imcompression.compression_ratio(bw_image, k), 4)
    r = np.around(imcompression.recovered_variance_proportion(S, k), 3)
    ax = fig.add_subplot(3, 3, i + 1, xticks=[], yticks=[])
    ax.imshow(img_rebuild, cmap=plt.cm.bone)
    ax.set_title(f"{k} Components")
    ax.set_xlabel(f"Compression: {c},\nRecovered Variance: {r}")
    i = i+1

1.3 Color image **[W]**

Use your implementation to generate a set of images compressed to different degrees. Include the images in your non-programming submission to the assignment.

Note: You might get the warning "Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers)." This warning is acceptable, since some pixel values may go above 1.0 during reconstruction. You should see an image similar to the original even with such clipping.
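If you prefer to silence the warning, one option (not required by the assignment) is to clip the rebuilt image yourself before plotting; a minimal sketch with hypothetical pixel values:

```python
import numpy as np

# A rebuilt float image can stray slightly outside [0, 1]; clipping it
# before imshow silences the warning without visibly changing the result.
img_rebuild = np.array([[1.03, 0.5], [-0.02, 0.97]])  # hypothetical pixels
img_clipped = np.clip(img_rebuild, 0.0, 1.0)
print(img_clipped)  # values now lie in [0, 1]
```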

In [6]:
# HELPER CELL, DO NOT MODIFY
imcompression = ImgCompression()
U, S, V = imcompression.svd(image)

# component_num = [1,2,5,10,20,40,80,160,256]
component_num = [1,2,5,10,20,40,80,160,256]

fig = plt.figure(figsize=(18, 18))

# plot several images
i=0
for k in component_num:
    img_rebuild = imcompression.rebuild_svd(U, S, V, k)
    c = np.around(imcompression.compression_ratio(image, k), 4)
    r = np.around(imcompression.recovered_variance_proportion(S, k), 3)
    ax = fig.add_subplot(3, 3, i + 1, xticks=[], yticks=[])
    ax.imshow(img_rebuild)
    ax.set_title(f"{k} Components")
    ax.set_xlabel(f"Compression: {np.around(c,4)},\nRecovered Variance:  R: {r[0]}  G: {r[1]}  B: {r[2]}")
    i = i+1
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

2 Understanding PCA **[P]** | **[W]**

2.1 Implementation  **[P]**

Principal Component Analysis (PCA) is another dimensionality reduction technique; it reduces dimensions by discarding the components with the smallest variance (eigenvalues) and their corresponding vectors. With PCA, we first center the data by subtracting the mean. Each singular value tells us how much of the variance of a matrix (e.g. an image) is captured by the corresponding component. In this problem, we will investigate how PCA can be used to improve features for regression and classification tasks and how the data itself affects the behavior of PCA.

Implement PCA in the below cell.

Assume a dataset is composed of N datapoints, each of which has D features with D < N. The dimension of our data would be D. It is possible, however, that many of these dimensions contain redundant information. Each feature explains part of the variance in our dataset. Some features may explain more variance than others.
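To see what a redundant dimension looks like, consider a hypothetical dataset whose third feature duplicates the first; after centering, one singular value vanishes, so one principal component carries no variance:

```python
import numpy as np

# Hypothetical data with an exactly redundant feature: the third column
# is a copy of the first, so the centered matrix has rank 2 and its
# smallest singular value is (numerically) zero.
rng = np.random.RandomState(0)
A = rng.rand(50, 2)
X = np.column_stack([A, A[:, 0]])  # third column duplicates the first

Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)
print(S)  # last singular value is numerically zero
```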

In the following cell complete the PCA class by completing functions fit, transform and transform_rv.

In [7]:
class PCA(object):

    def __init__(self):
        self.U = None
        self.S = None
        self.V = None

    def fit(self, X): # 5 points
        """
        Decompose dataset into principal components.
        You may use your SVD function from the previous part in your implementation or numpy.linalg.svd function.

        Don't return anything. You can directly set self.U, self.S and self.V declared in __init__ with
        corresponding values from PCA.

        Args:
            X: N*D array corresponding to a dataset
        Return:
            None
        """
        mu = np.mean(X, axis = 0, keepdims = True)
        X = X - mu
        U, S, V = np.linalg.svd(X, full_matrices = False)
        self.U = U
        self.S = S
        self.V = V
        
#         raise NotImplementedError

    def transform(self, data, K=2): # 2 pts
        """
        Transform data to reduce the number of features such that final data has given number of columns

        Utilize self.U, self.S and self.V that were set in fit() method.

        Args:
            data: N*D array corresponding to a dataset
            K: Int value for number of columns to be kept
        Return:
            X_new: N*K array corresponding to data obtained by applying PCA on data
        """
        self.fit(data)
        # project the centered data onto the first K principal directions
        data_centered = data - np.mean(data, axis=0, keepdims=True)
        X_new = np.dot(data_centered, self.V.T[:, :K])
        return X_new
        
#         raise NotImplementedError

    def transform_rv(self, data, retained_variance=0.99): # 3 pts
        """
        Transform data to reduce the number of features such that a given variance is retained

        Utilize self.U, self.S and self.V that were set in fit() method.

        Args:
            data: N*D array corresponding to a dataset
            retained_variance: Float value for amount of variance to be retained
        Return:
            X_new: N*K array corresponding to data obtained by applying PCA on data
        """
        self.fit(data)
        var = np.cumsum(self.S ** 2)
        var = var / var[-1]

        # find the smallest number of components whose cumulative variance
        # exceeds the threshold; note S has min(N, D) entries, not N
        for i in range(len(var)):
            if var[i] > retained_variance:
                break

        data_centered = data - np.mean(data, axis=0, keepdims=True)
        X_new = np.dot(data_centered, self.V.T[:, :i + 1])
        return X_new

#         raise NotImplementedError

    def get_V(self):
        """ Getter function for value of V """
        
        return self.V
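A quick sanity check of the centered-SVD view of PCA used in fit: projecting the centered data onto the top-K right singular vectors reproduces U[:, :K] * S[:K] exactly. A minimal sketch on a hypothetical 8x4 dataset:

```python
import numpy as np

# Hypothetical dataset; center it, decompose, and project.
rng = np.random.RandomState(1)
X = rng.rand(8, 4)

Xc = X - X.mean(axis=0, keepdims=True)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

K = 2
X_new = Xc @ Vt.T[:, :K]  # projection onto the first K components

# Since Xc = U diag(S) Vt and Vt is orthogonal, Xc @ Vt.T == U diag(S).
assert np.allclose(X_new, U[:, :K] * S[:K])
```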

2.2 Visualize [5 pts] **[W]**

PCA is used to transform multivariate data tables into smaller sets so as to observe the hidden trends and variations in the data. Here you will visualize two datasets (iris and wine) using PCA. Use the above implementation of PCA and reduce the datasets such that they contain only two features. Make 2-D scatter plots of the data points using these features. Make sure to differentiate the data points according to their true labels. The datasets have already been loaded for you. In addition, return the retained variance obtained from the reduced features.

In [8]:
def visualize(X,y): # 5 pts
    """
    Args:
        X: NxD numpy array, where N is the number of instances and D is the dimensionality of each instance
        y: numpy array (N,), the true labels
  
    Return:
        None
    """
    N = X.shape[0]
    
    #do dimensionality reduction
    obj = PCA()
    x_new = obj.transform(X, 2)
    
    #add y values as column to xnew    
    y_new = y.reshape((N, 1))
    x = np.hstack((x_new, y_new))
    
    #make scatter plot
    plot = plt.scatter(x[:, 0], x[:, 1], c = x[:, 2], marker = 'x')
    
#     raise NotImplementedError
In [9]:
# HELPER CELL, DO NOT MODIFY
#Use PCA for visualization of iris and wine data
iris_data = load_iris(return_X_y=True)

X = iris_data[0]
y = iris_data[1]

visualize(X, y)
In [10]:
# HELPER CELL, DO NOT MODIFY
wine_data = load_wine(return_X_y=True)

X = wine_data[0]
y = wine_data[1]

visualize(X, y)

Now you will use PCA on a real-world dataset. We will use your implementation of PCA to reduce the dataset with 99% retained variance and obtain the reduced features. On the reduced dataset, we will apply logistic regression (classification) and ridge regression (regression) and compare results between the PCA and non-PCA datasets. Run the following cells to see how PCA works on regression and classification tasks.

In [11]:
# HELPER CELL, DO NOT MODIFY
#load the dataset 
iris = load_iris()

X = iris.data
y = iris.target

print("data shape before PCA ",X.shape)

pca = PCA()
pca.fit(X)

X_pca = pca.transform_rv(X)

print("data shape with PCA ",X_pca.shape)
data shape before PCA  (150, 4)
data shape with PCA  (150, 3)
In [12]:
# HELPER CELL, DO NOT MODIFY
# Train, test splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, 
                                                    stratify=y, 
                                                    random_state=42)

# Use logistic regression to predict classes for test set
clf = LogisticRegression()
clf.fit(X_train, y_train)
preds = clf.predict_proba(X_test)
print('Accuracy: {:.5f}'.format(accuracy_score(y_test, 
                                                preds.argmax(axis=1))))
Accuracy: 0.91111
In [13]:
# HELPER CELL, DO NOT MODIFY
# Train, test splits
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=.3, 
                                                    stratify=y, 
                                                    random_state=42)

# Use logistic regression to predict classes for test set
clf = LogisticRegression()
clf.fit(X_train, y_train)
preds = clf.predict_proba(X_test)
print('Accuracy: {:.5f}'.format(accuracy_score(y_test, 
                                                preds.argmax(axis=1))))
Accuracy: 0.91111
In [14]:
# HELPER CELL, DO NOT MODIFY
def apply_regression(X_train, y_train, X_test):
    ridge = Ridge()
    weight = ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    
    return y_pred
In [15]:
# HELPER CELL, DO NOT MODIFY
#load the dataset 
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

print(X.shape, y.shape)

pca = PCA()
pca.fit(X)

X_pca = pca.transform_rv(X)
print("data shape with PCA ",X_pca.shape)
(442, 10) (442,)
data shape with PCA  (442, 8)
In [16]:
# HELPER CELL, DO NOT MODIFY
# Train, test splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=42)

#Ridge regression without PCA
y_pred = apply_regression(X_train, y_train, X_test)

# calculate RMSE 
rmse_score = np.sqrt(mean_squared_error(y_pred, y_test))
print("rmse score without PCA",rmse_score)
rmse score without PCA 55.79391924562032
In [17]:
# HELPER CELL, DO NOT MODIFY
#Ridge regression with PCA
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=.3, random_state=42)

#use Ridge Regression for getting predicted labels
y_pred = apply_regression(X_train,y_train,X_test)

#calculate RMSE 
rmse_score = np.sqrt(mean_squared_error(y_pred, y_test))
print("rmse score with PCA",rmse_score)
rmse score with PCA 55.78489213087043

For both tasks above, PCA achieves essentially the same performance with fewer features: the classification accuracy is unchanged and the regression RMSE improves marginally.

Feel free to add other datasets in cell below and play around with what kind of improvement you get with using PCA. There are no points for playing around with other datasets.

In [18]:
######## YOUR CODE BELOW ########

#################################

3 Polynomial regression and regularization [55 pts + 20 pts bonus for CS 4641] **[W]** | **[P]**

3.1 Regression and regularization implementations [30 pts + 20 pts bonus for CS 4641] **[P]**

We have three ways to fit linear and ridge regression models: 1) the closed-form solution; 2) gradient descent (GD); 3) stochastic gradient descent (SGD). Undergraduate students are required to implement the closed-form solutions for linear regression and for ridge regression; the other four methods are bonus parts. Graduate students are required to implement all of them. We use the term weight in the following code; weights and parameters (θ) have the same meaning here. The lecture slides used parameters (θ).
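Before diving into the class, here is a quick check, on a hypothetical toy problem, that the closed-form solution θ = (XᵀX)⁻¹Xᵀy agrees with numpy's least-squares solver:

```python
import numpy as np

# Hypothetical toy regression: bias column plus two random features,
# labels generated from known weights plus a little noise.
rng = np.random.RandomState(0)
X = np.column_stack([np.ones(20), rng.rand(20, 2)])
y = X @ np.array([[1.0], [2.0], [-0.5]]) + 0.01 * rng.randn(20, 1)

# Closed-form normal-equation solution (pinv for numerical safety).
theta_closed = np.linalg.pinv(X.T @ X) @ X.T @ y
# Reference solution from numpy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta_closed, theta_lstsq))  # True
```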

In [19]:
class Regression(object):
    
    def __init__(self):
        pass
    
    def rmse(self, pred, label): # [5pts]
        '''
        This is the root mean square error.
        Args:
            pred: numpy array of length N * 1, the prediction of labels
            label: numpy array of length N * 1, the ground truth of labels
        Return:
            a float value
        '''
        rmse = np.sqrt(np.mean((pred - label) ** 2))
        return rmse
        
#         raise NotImplementedError
    
    def construct_polynomial_feats(self, x, degree): # [5pts]
        """
        Args:
            x: numpy array of length N, the 1-D observations
            degree: the max polynomial degree
        Return:
            feat: numpy array of shape Nx(degree+1), remember to include 
            the bias term. feat is in the format of:
            [[1.0, x1, x1^2, x1^3, ....,],
             [1.0, x2, x2^2, x2^3, ....,],
             ......
            ]
        """
        N = x.shape[0]
        feat = np.ones((N))
        for i in range(1, degree + 1):
            x_term = np.power(x, i)
            temp = np.column_stack((feat, x_term))
            feat = temp
        return feat
        
#         raise NotImplementedError


    def predict(self, xtest, weight): # [5pts]
        """
        Args:
            xtest: NxD numpy array, where N is number 
                   of instances and D is the dimensionality of each 
                   instance
            weight: Dx1 numpy array, the weights of linear regression model
        Return:
            prediction: Nx1 numpy array, the predicted labels
        """
        prediction = np.dot(xtest, weight)
        return prediction
        
#         raise NotImplementedError
        
    # =================
    # LINEAR REGRESSION
    # Hints: in the fit function, use the closed-form solution of linear regression to get the weights.
    # For the inverse, you can use numpy's linear algebra functions.
    # For predict, use a linear combination of the data points and their weights (y = theta0*1 + theta1*X1 + ...)

    def linear_fit_closed(self, xtrain, ytrain): # [5pts]
        """
        Args:
            xtrain: N x D numpy array, where N is number of instances and D is the dimensionality of each instance
            ytrain: N x 1 numpy array, the true labels
        Return:
            weight: Dx1 numpy array, the weights of linear regression model
        """
        # term1
        t1 = np.linalg.pinv(np.dot(xtrain.T, xtrain))
        # term2
        t2 = np.dot(t1, xtrain.T)
        weight = np.dot(t2, ytrain)
        return weight
        
#         raise NotImplementedError

    def linear_fit_GD(self, xtrain, ytrain, epochs=5, learning_rate=0.001): # [5pts]
        """
        Args:
            xtrain: NxD numpy array, where N is number 
                    of instances and D is the dimensionality of each 
                    instance
            ytrain: Nx1 numpy array, the true labels
        Return:
            weight: Dx1 numpy array, the weights of linear regression model
        """
        N, D = xtrain.shape[0], xtrain.shape[1]
        weight = np.zeros((D, 1))
        for i in range(epochs):
            temp = weight + learning_rate * (np.dot(xtrain.T, (ytrain - np.dot(xtrain, weight)))) / N
            weight = temp
        return weight
        
#         raise NotImplementedError

    def linear_fit_SGD(self, xtrain, ytrain, epochs=100, learning_rate=0.001): # [5pts]
        """
        Args:
            xtrain: NxD numpy array, where N is number 
                    of instances and D is the dimensionality of each 
                    instance
            ytrain: Nx1 numpy array, the true labels
        Return:
            weight: Dx1 numpy array, the weights of linear regression model
        """
        N, D = xtrain.shape[0], xtrain.shape[1]
        weight = np.zeros((D, 1))
        for i in range(epochs):
            t1 = ytrain - np.dot(xtrain, weight)
            idx = i % N
            t2 = np.dot(xtrain[idx, :].T.reshape((D, 1)), t1[idx].reshape((1, 1)))
            temp = weight + learning_rate * t2
            weight = temp
        return weight
        
#         raise NotImplementedError
        
    # =================
    # RIDGE REGRESSION
        
    def ridge_fit_closed(self, xtrain, ytrain, c_lambda): # [5pts]
        """
        Args:
            xtrain: N x D numpy array, where N is number of instances and D is the dimensionality of each instance
            ytrain: N x 1 numpy array, the true labels
            c_lambda: floating number
        Return:
            weight: Dx1 numpy array, the weights of ridge regression model
        """
        N, D = xtrain.shape[0], xtrain.shape[1]
        ide = np.identity(D)
        ide[0][0] = 0.0  # do not regularize the bias term
        # term1
        t1 = np.linalg.pinv(np.dot(xtrain.T, xtrain) + c_lambda * ide)
        # term2
        t2 = np.dot(t1, xtrain.T)
        weight = np.dot(t2, ytrain)
        return weight
        
#         raise NotImplementedError

        
    def ridge_fit_GD(self, xtrain, ytrain, c_lambda, epochs=500, learning_rate=1e-7): # [5pts]
        """
        Args:
            xtrain: NxD numpy array, where N is number 
                    of instances and D is the dimensionality of each 
                    instance
            ytrain: Nx1 numpy array, the true labels
            c_lambda: floating number
        Return:
            weight: Dx1 numpy array, the weights of linear regression model
        """
        N, D = xtrain.shape[0], xtrain.shape[1]
        weight = np.zeros((D, 1))
        for i in range(epochs):
            # gradient of the unregularized loss
            t1 = np.dot(xtrain.T, (ytrain - np.dot(xtrain, weight)))
            # the regularizer shrinks the weights, hence the minus sign
            weight = weight + learning_rate * (t1 - c_lambda * weight) / N
        return weight
        
#         raise NotImplementedError

    def ridge_fit_SGD(self, xtrain, ytrain, c_lambda, epochs=100, learning_rate=0.001): # [5pts]
        """
        Args:
            xtrain: NxD numpy array, where N is number 
                    of instances and D is the dimensionality of each 
                    instance
            ytrain: Nx1 numpy array, the true labels
        Return:
            weight: Dx1 numpy array, the weights of linear regression model
        """
        N, D = xtrain.shape[0], xtrain.shape[1]
        weight = np.zeros((D, 1))
        for i in range(epochs):
            t1 = ytrain - np.dot(xtrain, weight)
            idx = i % N  # cycle through the datapoints one at a time
            t2 = np.dot(xtrain[idx, :].T.reshape((D, 1)), t1[idx].reshape((1, 1)))
            # the regularizer shrinks the weights, hence the minus sign
            weight = weight + learning_rate * (t2 - c_lambda * weight)
        return weight
        
#         raise NotImplementedError

    def ridge_cross_validation(self, X, y, kfold=10, c_lambda=100): # [5 pts]
        """
        Args: 
            X : NxD numpy array, where N is the number of instances and D is the dimensionality of each instance
            y : Nx1 numpy array, true labels
            kfold: Number of folds you should take while implementing cross validation.
            c_lambda: Value of regularization constant
        Returns:
            meanErrors: Float average rmse error
        Hint: np.concatenate might be helpful.
        Look at section 3.5 to see how this function is used.
        For cross validation, use the 10-fold method on the training data only (you already have train_indices to get the training data).
        Split the training data into 10 folds: in each round, use one fold (10 percent) for validation and the remaining nine folds (90 percent) for training.
        """
        N, D = X.shape[0], X.shape[1]
        size_of_fold = int(N / kfold)
        meanErrors = 0.0
        for i in range(kfold):
            # train model on every fold except ith
            xtrain1 = X[:i * size_of_fold, :]
            xtrain2 = X[(i + 1) * size_of_fold:, :]
            xtrain = np.concatenate((xtrain1, xtrain2))
            ytrain1 = y[:i * size_of_fold]
            ytrain2 = y[(i + 1) * size_of_fold:]
            ytrain = np.concatenate((ytrain1, ytrain2))
            weight = self.ridge_fit_closed(xtrain, ytrain, c_lambda)

            # compute error on ith fold
            xtest = X[i * size_of_fold:(i + 1) * size_of_fold, :]
            ytest = y[i * size_of_fold:(i + 1) * size_of_fold]
            pred = self.predict(xtest, weight)
            error = self.rmse(pred, ytest)

            # add to error
            meanErrors += error

        meanErrors /= kfold
        return meanErrors
            
        
#         raise NotImplementedError

3.2 About RMSE [3 pts] **[W]**

Do you know whether this RMSE is good or not? If you don’t know, we could normalize our labels between 0 and 1. After normalization, what does it mean when RMSE = 1?

Hint: think of the way that you can enforce your RMSE = 1. Note that you can not change the actual labels to make RMSE = 1.

Although we desire a small RMSE value for a good regression model, RMSE by itself is not the best indicator, because its scale depends on the labels. However, if we normalize the data and labels, the RMSE value falls between 0 and 1, which makes it easy to judge whether we have a good model. In the normalized case, RMSE = 1 means the prediction error is as large as the entire label range, which is a very high error, so the model is not good.
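A tiny numeric illustration (with hypothetical labels): once labels are normalized to [0, 1], predicting the opposite end of the range for every label gives exactly RMSE = 1, the worst case:

```python
import numpy as np

# Hypothetical worst-case model on normalized labels: every label is 1.0
# and the model predicts 0.0, so every error spans the full label range.
label = np.ones(5)
pred = np.zeros(5)
rmse = np.sqrt(np.mean((pred - label) ** 2))
print(rmse)  # 1.0
```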

3.3 Testing: general functions and linear regression [5 pts] **[W]**

Let’s first construct a dataset for polynomial regression.

In this case, we construct the polynomial features up to degree 5. Each data sample consists of two features $[a, b]$. We compute the polynomial features of both $a$ and $b$ in order to yield the vectors $[1, a, a^2, a^3, \dots, a^{degree}]$ and $[1, b, b^2, b^3, \dots, b^{degree}]$. We train our model with the cartesian product of these polynomial features. The cartesian product generates a new feature vector consisting of all polynomial combinations of the features with degree less than or equal to the specified degree.

For example, if degree = 2, we will have the polynomial features $[1, a, a^2]$ and $[1, b, b^2]$ for the datapoint $[a, b]$. The cartesian product of these two vectors is $[1, a, b, ab, a^2, b^2]$. We do not generate $a^3$ or $b^3$, since their degree is greater than 2 (the specified degree).
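The outer-product step used by the helper cell below can be previewed on a single hypothetical datapoint with a = 2, b = 3 and degree = 2; the full outer product contains every monomial $a^i b^j$ with $i, j \le 2$ (the helper cell handles the highest pure powers separately):

```python
import numpy as np

# Outer product of the per-feature polynomial vectors for a = 2, b = 3.
a, b = 2.0, 3.0
pa = np.array([1.0, a, a**2])  # [1, 2, 4]
pb = np.array([1.0, b, b**2])  # [1, 3, 9]

# Rows scale pa by powers of b, matching x3 = [[m*n for m in x1] for n in x2]
# in the helper cell; flattening gives all monomials a^i * b^j, i, j <= 2.
cross = np.outer(pb, pa).ravel()
print(cross)  # [1, 2, 4, 3, 6, 12, 9, 18, 36]
```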

In [20]:
#helper, do not need to change
POLY_DEGREE = 5
NUM_OBS = 1000

rng = np.random.RandomState(seed=4)

true_weight = -rng.rand((POLY_DEGREE)**2+2, 1)
true_weight[2:, :] = 0
x_all1 = np.linspace(-5, 5, NUM_OBS)
x_all2 = np.linspace(-3, 3, NUM_OBS)
x_all = np.stack((x_all1,x_all2), axis=1)

reg = Regression()
x_all_feat1 = reg.construct_polynomial_feats(x_all[:,0], POLY_DEGREE)
x_all_feat2 = reg.construct_polynomial_feats(x_all[:,1], POLY_DEGREE)

x_cart_flat = []
for i in range(x_all_feat1.shape[0]):
    x1 = x_all_feat1[i]
    x2 = x_all_feat2[i]
    x1_end = x1[-1]
    x2_end = x2[-1]
    x1 = x1[:-1]
    x2 = x2[:-1]
    x3 = np.asarray([[m*n for m in x1] for n in x2])

    x3_flat = np.reshape(x3, (x3.shape[0]**2))
    x3_flat = list(x3_flat)
    x3_flat.append(x1_end)
    x3_flat.append(x2_end)
    x3_flat = np.asarray(x3_flat)
    x_cart_flat.append(x3_flat)
    
x_cart_flat = np.asarray(x_cart_flat)
x_all_feat = np.copy(x_cart_flat)

y_all = np.dot(x_cart_flat, true_weight) + rng.randn(x_all_feat.shape[0], 1) # in the second term, we add noise to data
print(x_all_feat.shape, y_all.shape)

# Note that here we try to produce y_all as our training data
#plot_curve(x_all, y_all) # Data with noise that we are going to predict
#plot_curve(x_all, np.dot(x_cart_flat, true_weight), curve_type='-', color='r', lw=4) # the groundtruth information

indices = rng.permutation(NUM_OBS)