Machine Learning and Data Mining
K-Means Clustering
Problem
The Iris dataset contains 150 data samples from three Iris categories, labeled by the outcome values 0, 1, and 2. Each data sample has four attributes: sepal length, sepal width, petal length, and petal width.
Here is the code snippet to load the dataset. Note: since K-means clustering is unsupervised learning, we don’t need to split the data into a training set and a test set.
from sklearn import datasets

iris = datasets.load_iris()
print(list(iris.keys()))
print(iris.feature_names)

X = iris.data    # each row is a sample
y = iris.target  # target labels
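Implement the K-means clustering algorithm to group the samples into K = 3 clusters. Initialize the cluster centers with the first 3 data samples. The objective function to minimize is defined as

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{kn} \, \lVert \mathbf{m}_k - \mathbf{x}_n \rVert_2^2,$$

where $\mathbf{x}_n$ is the $n$-th data sample, $\mathbf{m}_k$ is the center of cluster $k$, and $r_{kn} = 1$ if $\mathbf{x}_n$ is assigned to cluster $k$ and $r_{kn} = 0$ otherwise. Each iteration includes an assignment step and a cluster-center update step.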
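As a minimal sketch of how the objective can be evaluated, the NumPy function below computes $J$ from a data matrix, a set of centers, and a vector of cluster assignments. The name `kmeans_objective` and its arguments are illustrative choices, not part of the assignment. It builds the indicators $r_{kn}$ explicitly so that the code mirrors the double sum term by term, and it is checked on a tiny hand-computable example.

```python
import numpy as np

def kmeans_objective(X, centers, labels):
    """Hypothetical helper: J = sum_n sum_k r_kn * ||m_k - x_n||_2^2.

    X:       (N, D) data matrix, one sample per row
    centers: (K, D) cluster centers m_k
    labels:  (N,) index of the cluster each sample is assigned to
    """
    K = centers.shape[0]
    # One-hot indicators r[k, n] = 1 if sample n is assigned to cluster k, else 0
    r = (labels[None, :] == np.arange(K)[:, None]).astype(float)
    # Squared Euclidean distances d[k, n] = ||m_k - x_n||_2^2, shape (K, N)
    d = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Keep only the distance of each sample to its own center, then sum
    return float((r * d).sum())

# Tiny worked example (made-up 1-D data): every point is 1 away from its center
X_demo = np.array([[0.0], [2.0], [10.0], [12.0]])
centers_demo = np.array([[1.0], [11.0]])
labels_demo = np.array([0, 0, 1, 1])
print(kmeans_objective(X_demo, centers_demo, labels_demo))  # 4.0
```

Swapping the assignments to `[1, 1, 0, 0]` makes every point pay the distance to the far center, which raises $J$ to 404.0; the assignment step of K-means exists precisely to pick, for each sample, the center that makes its term smallest.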
Calculate the objective function value $J$ after the assignment step in each iteration. Exit the iterations if the following criterion is met: $J(\text{Iter}-1) - J(\text{Iter}) < \varepsilon$, where $\varepsilon = 10^{-5}$ and Iter is the iteration number. Plot the objective function value $J$ versus the iteration number Iter. Comment on the result.
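The full procedure can be sketched in NumPy as follows, assuming the data are stored in an $N \times D$ array such as `X` from the loading code. The function name `kmeans`, the `max_iter` safeguard, and the rule of keeping a center unchanged when its cluster becomes empty are assumptions added for illustration; the assignment itself only fixes the initialization, the two steps per iteration, and the stopping rule. $J$ is recorded right after each assignment step, and the loop stops once the decrease $J(\text{Iter}-1) - J(\text{Iter})$ falls below $\varepsilon = 10^{-5}$.

```python
import numpy as np

def kmeans(X, K=3, eps=1e-5, max_iter=100):
    """Hypothetical sketch of K-means with the first K samples as initial centers.

    Returns (labels, centers, J_history), where J_history[i] is the objective J
    computed right after the assignment step of iteration i + 1.
    max_iter is an added safeguard, not part of the original specification.
    """
    X = np.asarray(X, dtype=float)
    centers = X[:K].copy()  # initialize the centers with the first K samples
    J_history = []
    for _ in range(max_iter):
        # Assignment step: squared distance of every sample to every center, shape (N, K)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Objective after the assignment step: each sample's squared distance to its center
        J = float(d[np.arange(len(X)), labels].sum())
        J_history.append(J)
        # Update step: move each center to the mean of the samples assigned to it
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # assumption: keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
        # Stopping rule J(Iter-1) - J(Iter) < eps, checked from the second iteration on
        if len(J_history) > 1 and J_history[-2] - J_history[-1] < eps:
            break
    return labels, centers, J_history

# Tiny demo on made-up, well-separated 2-D points (not the Iris data)
X_demo = np.array([[0, 0], [10, 10], [20, 0], [0, 1], [10, 11], [20, 1]], dtype=float)
labels, centers, J_history = kmeans(X_demo, K=3)
print(J_history)  # [3.0, 1.5, 1.5]
```

On the Iris data the call would be `labels, centers, J_history = kmeans(X, K=3)`, and the curve can be drawn with, for example, `plt.plot(range(1, len(J_history) + 1), J_history, marker="o")` after `import matplotlib.pyplot as plt`. Because the assignment step minimizes $J$ over the $r_{kn}$ with the centers fixed, and the update step minimizes $J$ over the centers with the assignments fixed, the recorded values should never increase; checking this is a useful first step when commenting on the plot.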



