[SOLVED] CS412-Homework 2

20.99 $

Category:

Description

5/5 - (2 votes)

Problem 1.

Suppose the base cuboid of a data cube contains two cells:

(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10) : 1,(a1,b2,a3,b4,b5,b6,b7,b8,a9,b10) : 1 where ai 6= bi for any i.

  • How many cuboids are there in this data cube?
  • ) How many (nonempty) closed cells are there in this data cube?
  • How many (nonempty) aggregate cells are there in this data cube?
  • ) How many (nonempty) aggregate closed cells are there in this data cube?
  • If we set minimum support = 2, how many (nonempty) aggregate cells arethere in the corresponding iceberg cube?

Problem 2. )

We have the following data cube measures. Which of them are algebraic measures? Explain each of your selection and non-selection.

  • ) Standard deviation;
  • () Average of Min and Max;
  • ) Sum of the largest 50 values;
  • ) Sum of the largest values, where n is the number of data points in the current cuboid and bxc denotes the greatest integer less than or equal to x; (e) (5 points) Mode, if the data is guaranteed to be binary.

Problem 3.)

A database has 5 transactions listed in Table 1. Let min sup = 0.6 and min conf = 0.7.

Table 1:

trans id item bought
001 {H, A, D, B, C}
002 {D, A, E, F}
003 {C, D, B, E}
004 {B, A, C, H, D}
005 {B, G, C}
  • List the frequent k-itemset for the largest k;
  • (Show an itemset S, where (1) ∀S0 S (S0 6= ∅), S0 is frequent, and (2) S is not frequent;
  • (List all the closed patterns;
  • List all the max-patterns;
  • List all the strong association rules (with support and confidence) with thefollowing shape: x ∈{001,002,…,005}, buys(x,item1) ∧ buys(x,item2) ⇒ buys(x,item3). [s,c].
  • ) Now we want to mine frequent patterns using FP-Growth. We first find singleitem frequent patterns and sort them in frequency descending order. To break ties, we assume the order is B C D A. Construct the FP-tree;
  • ) Show A’s conditional (i.e., projected) database.

Problem 4. )

Suppose we are interested in only the frequent patterns satisfying certain constraints. For example, in Table 1, each item has its price. The price information is listed in Table 2. We still have min sup = 0.6.

Table 2:

item A B C D E F G H
price 10 20 40 30 90 90 30 50
  • Find all the frequent patterns S in Table 1 satisfying sum(price) ≥ 45 (i.e.,

the sum of the price of all the items in S is no less than 45);

  • Is the constraint sum(price) ≥ 45 monotonic or anti-monotonic? How about sum(S.price) ≤ 45? Can you find an efficient method to mine frequent patterns with

sum(S.price) ≤ 45?

  • Let us focus on two other constraints avg(price) ≥ 30 (i.e., the average price of all the items in S is no less than 30) and avg(S.price) ≤ 30. Are they convertible? If so, how to convert them to anti-monotonic cases?