CIND110 Assignment 3

1.  Association rules:

One of the major techniques in data mining is the discovery of association rules. These rules correlate the presence of one set of items with the presence of another set of items (or, more generally, with a range of values for another set of variables). The database in this context is regarded as a collection of transactions, each involving a set of items, as shown below.

Trans ID    Items Purchased
2001        Meat, Potato, Onion
2002        Meat, Noodle
2003        Noodle, Spinach
2004        Meat, Potato, Onion
2005        Onion, Potato, Noodle
2006        Eggs, Spinach
2007        Eggs, Noodle
2008        Meat, Potato, Salt, Onion
2009        Salt, Spinach
2010        Meat, Potato

1.1  Apply the Apriori algorithm to this dataset.

Note that the set of items is {Meat, Potato, Onion, Noodle, Spinach, Eggs, Salt}. Use 0.3 as the minimum support value.
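As a rough illustration of the level-wise search that Apriori performs, the following Python sketch counts support for candidate itemsets, keeps those at or above the minimum support, and joins the survivors into larger candidates. The function name `apriori` and the data layout (a list of sets) are assumptions made for this example, not part of the assignment.

```python
from itertools import combinations

# Transactions 2001-2010 from the table above, one set of items each.
transactions = [
    {"Meat", "Potato", "Onion"},
    {"Meat", "Noodle"},
    {"Noodle", "Spinach"},
    {"Meat", "Potato", "Onion"},
    {"Onion", "Potato", "Noodle"},
    {"Eggs", "Spinach"},
    {"Eggs", "Noodle"},
    {"Meat", "Potato", "Salt", "Onion"},
    {"Salt", "Spinach"},
    {"Meat", "Potato"},
]


def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset (hypothetical helper)."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1 candidates: every single item.
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the minimum support.
        current = {c: support(c) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


frequent = apriori(transactions, 0.3)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.3, an itemset must appear in at least 3 of the 10 transactions, so Eggs and Salt (2 transactions each) are dropped at the first level.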

1.2  Show the rules that have a confidence of 0.8 or greater for an itemset containing three items.
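The confidence of a rule X → Y is support(X ∪ Y) / support(X). A minimal sketch of the rule-generation step follows, assuming a brute-force scan over all three-item combinations (reasonable for seven items) and the 0.3 minimum support from 1.1. The helper name `count` and the tuple layout of `rules` are choices made for this example.

```python
from itertools import combinations

# Transactions 2001-2010 from the table above.
transactions = [
    {"Meat", "Potato", "Onion"},
    {"Meat", "Noodle"},
    {"Noodle", "Spinach"},
    {"Meat", "Potato", "Onion"},
    {"Onion", "Potato", "Noodle"},
    {"Eggs", "Spinach"},
    {"Eggs", "Noodle"},
    {"Meat", "Potato", "Salt", "Onion"},
    {"Salt", "Spinach"},
    {"Meat", "Potato"},
]
MIN_SUPPORT = 0.3
MIN_CONFIDENCE = 0.8


def count(itemset):
    # Number of transactions containing every item of the itemset.
    return sum(set(itemset) <= t for t in transactions)


items = sorted(set().union(*transactions))
rules = []
for triple in combinations(items, 3):
    if count(triple) / len(transactions) < MIN_SUPPORT:
        continue  # not a frequent 3-itemset
    # Try every non-empty proper subset of the triple as the left-hand side.
    for size in (1, 2):
        for lhs in combinations(triple, size):
            rhs = tuple(i for i in triple if i not in lhs)
            confidence = count(triple) / count(lhs)
            if confidence >= MIN_CONFIDENCE:
                rules.append((lhs, rhs, confidence))

for lhs, rhs, confidence in rules:
    print(f"{set(lhs)} -> {set(rhs)}  confidence = {confidence:.2f}")
```

On this dataset the only frequent three-item set is {Meat, Onion, Potato}, and the only rule that clears the 0.8 threshold is {Meat, Onion} → {Potato}, with confidence 3/3 = 1.0.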

2.  Classification:

Classification is the process of learning a model that describes different classes of data, where the classes are predetermined. Consider the following set of data records:

ID     Age      City    Gender    Education      Profile
101    20-30    NY      F         College        Employed
102    31-40    NY      F         College        Employed
103    51-60    NY      F         College        Unemployed
104    20-30    LA      M         High School    Unemployed
105    41-50    NY      F         College        Employed
106    41-50    NY      F         Graduate       Employed
107    20-30    LA      M         College        Employed
108    20-30    NY      F         High School    Unemployed
109    20-30    NY      F         College        Employed
110    51-60    SF      M         College        Unemployed

Assuming that the class attribute is Profile, apply a classification algorithm to this dataset.
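The assignment does not prescribe a particular algorithm. One common choice is decision-tree induction, where the first step is to pick the attribute with the highest information gain as the root. The sketch below covers only that first step, assuming an ID3-style entropy criterion. The names `entropy`, `information_gain`, and `ATTRIBUTES` are introduced for this example.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Age", "City", "Gender", "Education"]

# Records 101-110 from the table above; the last field is the class (Profile).
records = [
    ("20-30", "NY", "F", "College", "Employed"),
    ("31-40", "NY", "F", "College", "Employed"),
    ("51-60", "NY", "F", "College", "Unemployed"),
    ("20-30", "LA", "M", "High School", "Unemployed"),
    ("41-50", "NY", "F", "College", "Employed"),
    ("41-50", "NY", "F", "Graduate", "Employed"),
    ("20-30", "LA", "M", "College", "Employed"),
    ("20-30", "NY", "F", "High School", "Unemployed"),
    ("20-30", "NY", "F", "College", "Employed"),
    ("51-60", "SF", "M", "College", "Unemployed"),
]


def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, index):
    # Entropy of the whole set minus the weighted entropy after splitting
    # on the attribute stored at position `index`.
    base = entropy([r[-1] for r in rows])
    remainder = 0.0
    for value in {r[index] for r in rows}:
        subset = [r[-1] for r in rows if r[index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


gains = {name: information_gain(records, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Root attribute:", best)
```

Under this criterion Age gives the largest gain (about 0.49), because three of its four value ranges contain records of a single class. A full tree would repeat the same computation on the remaining mixed subset (Age 20-30).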

3.  Clustering: Consider the following set of two-dimensional records:

RID    Age    Years of Service
101    30     5
102    50     25
103    50     15
104    25     5
105    30     10
106    55     25

3.1  Use the K-means algorithm to cluster this dataset. Use K = 2 and assume that the records with RIDs 103 and 104 are the initial cluster centroids.
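The K-means loop alternates between assigning each record to its nearest centroid and recomputing each centroid as the mean of its cluster, until the centroids stop changing. A minimal sketch follows, assuming Euclidean distance on the raw (unscaled) values. The function name `kmeans` and the `max_iter` cap are choices made for this example, and the sketch does not handle an empty cluster (none arises on this data).

```python
import math

# Records from the table above: RID -> (Age, Years of Service).
records = {
    101: (30, 5),
    102: (50, 25),
    103: (50, 15),
    104: (25, 5),
    105: (30, 10),
    106: (55, 25),
}


def kmeans(points, centroids, max_iter=100):
    """Cluster `points` around the given initial centroids (hypothetical helper)."""
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for rid, p in points.items():
            nearest = min(
                range(len(centroids)),
                key=lambda j: math.dist(p, centroids[j]),
            )
            clusters[nearest].append(rid)
        # Update step: each centroid becomes the mean of its cluster members.
        new_centroids = [
            tuple(sum(points[r][d] for r in c) / len(c) for d in range(2))
            for c in clusters
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return clusters, centroids


# Initial centroids are records 103 and 104, as the question specifies.
clusters, centroids = kmeans(records, [records[103], records[104]])
print(clusters)
print(centroids)
```

Starting from 103 and 104, the first pass groups {102, 103, 106} and {101, 104, 105}. The recomputed centroids, roughly (51.67, 21.67) and (28.33, 6.67), leave those assignments unchanged, so the loop stops on the second pass.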

3.2  What is the difference between describing discovered knowledge using clustering and describing it using classification?