inf264 - Homework 1 - k-NN for a classification problem on the Iris dataset

Iris is a small dataset consisting of 150 vectors describing iris flowers, split into three different classes representing three species of the iris family. Each vector comes with a label (the name of the species) and a set of four features which are measurements of different parts of the flower.

Left: The three species in the Iris dataset

Right: The four features in the Iris dataset (petal and sepal width and length)

Those measurements tend to differ between species, so it is possible to train and evaluate a classifier on this dataset whose task is to predict the species of an iris flower from its four features. In this exercise we will use a k-NN classifier.

Iris Dataset:
- Load the Iris dataset directly from sklearn. You can alternatively download the dataset here: https://archive.ics.uci.edu/ml/datasets/iris
- Store the first 2 features (sepal length and sepal width) in a matrix X and the labels in a vector Y.
- Split the dataset into 3 datasets: a training set, a validation set and a testing set, i.e. split X and Y into X_train, X_val, X_test and Y_train, Y_val, Y_test. You can for instance use a train/validation/test ratio of 0.7/0.15/0.15.
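
The loading and splitting steps above can be sketched as follows. This is one possible approach, not the only one: it assumes scikit-learn's `load_iris` and two successive calls to `train_test_split`, and the seed `random_state=0` and the use of `stratify` are illustrative choices that the assignment does not prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load Iris and keep only the first two features (sepal length, sepal width).
iris = load_iris()
X = iris.data[:, :2]
Y = iris.target

# First hold out 30% of the data, then split that part in half to obtain
# validation and testing sets (roughly a 0.7/0.15/0.15 ratio overall).
# random_state=0 is an arbitrary seed chosen for reproducibility (assumption).
X_train, X_rest, Y_train, Y_rest = train_test_split(
    X, Y, test_size=0.3, random_state=0, stratify=Y
)
X_val, X_test, Y_val, Y_test = train_test_split(
    X_rest, Y_rest, test_size=0.5, random_state=0, stratify=Y_rest
)

print(X_train.shape, X_val.shape, X_test.shape)
```

Stratifying keeps the three species in similar proportions in each subset, which matters for a dataset this small.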

Perform a k-NN classification of your dataset for each k in {1, 5, 10, 20, 30}:
- Plot both training and validation Iris datapoints with respect to the two selected features. Since there are three classes, you will need three different colors.
- Create an instance of the KNeighborsClassifier class.
- Train your instance of k-NN on your training dataset.
- Plot the decision boundaries as decided by the trained k-NN.
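
A minimal sketch of the training and decision-boundary steps above, under a few assumptions: the helper `decision_grid`, its `step` and `margin` parameters, and the seed are hypothetical names and values introduced here for illustration. The idea is to predict a class for every point of a regular grid covering the two features, so that the regions can then be coloured.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, Y = iris.data[:, :2], iris.target
X_train, X_rest, Y_train, Y_rest = train_test_split(
    X, Y, test_size=0.3, random_state=0, stratify=Y
)


def decision_grid(model, X, step=0.05, margin=0.5):
    """Predict the class at every point of a regular grid covering X."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(
        np.arange(x_min, x_max, step), np.arange(y_min, y_max, step)
    )
    # Stack the grid into an (n_points, 2) array, predict, and reshape back.
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz


grids = {}
for k in (1, 5, 10, 20, 30):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, Y_train)
    grids[k] = decision_grid(knn, X_train)
```

With matplotlib, each grid could then be drawn with `plt.contourf(xx, yy, zz, alpha=0.3)`, followed by `plt.scatter` calls for the training and validation points coloured by class.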

Compute the model accuracy on the training and validation datasets.
Which model (i.e. which k) would you select? Compute the accuracy of the selected model on the testing dataset.

Interpretation:

- Plot a curve representing the training accuracy as a function of k, and another for the validation accuracy.
- From your observations, for which values of k does k-NN overfit?
- For k = 1, the k-NN training accuracy should be equal to 1 (100% correct predictions). Explain why this is not the case here.
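
The accuracy computations and the interpretation questions above can be explored with a sketch like the following. It assumes the same split as suggested earlier (seed and stratification are illustrative), selects k by highest validation accuracy, and introduces a hypothetical helper `count_conflicting_points`. That helper checks one candidate explanation for the k = 1 question: with only two features, different flowers may share identical measurements while carrying different labels, in which case no classifier can get all of them right.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, Y = iris.data[:, :2], iris.target
X_train, X_rest, Y_train, Y_rest = train_test_split(
    X, Y, test_size=0.3, random_state=0, stratify=Y
)
X_val, X_test, Y_val, Y_test = train_test_split(
    X_rest, Y_rest, test_size=0.5, random_state=0, stratify=Y_rest
)

ks = [1, 5, 10, 20, 30]
train_acc, val_acc, models = [], [], {}
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, Y_train)
    models[k] = knn
    train_acc.append(accuracy_score(Y_train, knn.predict(X_train)))
    val_acc.append(accuracy_score(Y_val, knn.predict(X_val)))

# Select the k with the highest validation accuracy (first one on ties),
# and only then evaluate that single model on the testing set.
best_k = ks[int(np.argmax(val_acc))]
test_acc = accuracy_score(Y_test, models[best_k].predict(X_test))


def count_conflicting_points(X, Y):
    """Count points whose features coincide with a differently labelled point."""
    labels_at = {}
    for point, label in zip(map(tuple, X), Y):
        labels_at.setdefault(point, set()).add(int(label))
    return sum(1 for point in map(tuple, X) if len(labels_at[point]) > 1)


n_conflicts = count_conflicting_points(X_train, Y_train)

for k, tr, va in zip(ks, train_acc, val_acc):
    print(f"k={k:2d}  train={tr:.3f}  val={va:.3f}")
print("selected k:", best_k, " test accuracy:", round(test_acc, 3))
print("training points with conflicting duplicates:", n_conflicts)
```

Plotting `train_acc` and `val_acc` against `ks` gives the two curves asked for. A large gap between training and validation accuracy at small k is the usual sign of overfitting.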
