Description
Question 1
- Read the data into a data frame. (3pts)
- Display all records sorted by the PM2.5 values. (3pts)
- Calculate and show how many devices are in each city. (7pts)
- Display the average PM2.5 value (over the whole time span) of each device, sorted by that average. From the result you can identify which device locations have the best and worst PM2.5 readings, and spot some unreasonable measurements. (7pts)
- Display the average PM2.5 value (over the whole time span) of each city, sorted by that average. From the result you can identify which cities have the best and worst air quality. (10pts)
- Calculate the average PM2.5 of each day within each city. Sort the results by average PM2.5 in ascending order within each city, so that the rows for the same city stay together (the order among cities does not matter). From the result you can identify which day has the best and worst air quality in each city. (10pts)
- Calculate the average PM2.5 value (over all devices in Taiwan) for each hour on Friday, and do the same for Saturday. Present the results as a table with three columns named “hour”, “Friday_PM2.5” and “Saturday_PM2.5”, sorted by “hour” in ascending order. From the table you can observe how the change in air quality over the day differs between Friday and Saturday. (10pts)
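The grouping and sorting tasks in Question 1 can be sketched with pandas as follows. This is only a minimal sketch on made-up data, not a reference solution: the column names `device_id`, `city`, `time` and `PM2.5`, as well as the dates and readings, are assumptions, and the real dataset may use a different schema.

```python
import pandas as pd

# Toy data. The column names and values are assumptions for illustration,
# not the schema or contents of the actual dataset.
df = pd.DataFrame({
    "device_id": ["a", "a", "b", "b", "c", "c"],
    "city": ["Taipei", "Taipei", "Taipei", "Taipei", "Tainan", "Tainan"],
    "time": pd.to_datetime([
        "2020-07-24 08:00", "2020-07-25 08:00",   # a Friday and a Saturday
        "2020-07-24 09:00", "2020-07-25 09:00",
        "2020-07-24 08:00", "2020-07-25 08:00",
    ]),
    "PM2.5": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Number of distinct devices in each city.
devices_per_city = df.groupby("city")["device_id"].nunique()

# Average PM2.5 per city over the whole time span, sorted ascending.
city_mean = df.groupby("city")["PM2.5"].mean().sort_values()

# Average PM2.5 per day within each city, sorted within each city.
df["date"] = df["time"].dt.date
daily = (
    df.groupby(["city", "date"])["PM2.5"].mean()
    .reset_index()
    .sort_values(["city", "PM2.5"])
)

# Hourly averages on Friday and Saturday, one column per weekday.
df["hour"] = df["time"].dt.hour
df["weekday"] = df["time"].dt.day_name()
hourly = (
    df[df["weekday"].isin(["Friday", "Saturday"])]
    .pivot_table(index="hour", columns="weekday", values="PM2.5", aggfunc="mean")
    .rename(columns={"Friday": "Friday_PM2.5", "Saturday": "Saturday_PM2.5"})
    .sort_index()
    .reset_index()
)
hourly.columns.name = None  # drop the leftover "weekday" label on the columns

print(devices_per_city)
print(city_mean)
print(daily)
print(hourly)
```

Sorting by `["city", "PM2.5"]` is one way to keep the rows of each city together while ordering the daily averages inside each city; a pivot table is one of several ways to place the Friday and Saturday averages side by side.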
Question 2
- Is PM2.5 related to PM1.0? Choose and create a visualization to support your answer. (15pts)
- Among Taipei, Pingtung, Nantou and Taichung, which city's PM2.5 pattern over the whole day on July 24 differs most from that of the other three cities? Choose and create a visualization to support your answer. (15pts)
- (Assume that air quality is evaluated by PM2.5 only.) Implement the following steps to decide which city, Taipei or Tainan, has better air quality. (20pts)
  - Remove all records whose PM2.5 is 0 (a simple data cleaning step to remove impossible values).
  - Collect all records from Taipei and draw a histogram showing how many hourly PM2.5 records fall into each bin. Use 100 bins, a minimum value of 0 and a maximum value of 100.
  - Repeat the above step to draw a histogram for Tainan.
  - Compare the two histograms to answer which city has better air quality over the time span of the dataset, and explain your answer.
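The cleaning and binning steps above can be sketched as follows. This is a minimal sketch on made-up readings, not a reference solution: the column names `city` and `PM2.5` and the helper `pm25_histogram` are assumptions introduced for illustration. It computes the bin counts with NumPy rather than drawing them, so it runs without a plotting backend.

```python
import numpy as np
import pandas as pd

# Toy data. Column names and readings are assumptions for illustration only.
df = pd.DataFrame({
    "city": ["Taipei", "Taipei", "Taipei", "Tainan", "Tainan", "Tainan"],
    "PM2.5": [0.0, 5.5, 12.0, 0.0, 30.2, 99.9],
})

# Step 1: drop records whose PM2.5 is exactly 0.
cleaned = df[df["PM2.5"] != 0]

def pm25_histogram(frame, city):
    """Hypothetical helper: count one city's PM2.5 records in 100 bins over [0, 100]."""
    values = frame.loc[frame["city"] == city, "PM2.5"]
    counts, edges = np.histogram(values, bins=100, range=(0, 100))
    return counts, edges

# Steps 2 and 3: the same binning for both cities, so the counts are comparable.
taipei_counts, edges = pm25_histogram(cleaned, "Taipei")
tainan_counts, _ = pm25_histogram(cleaned, "Tainan")

print(len(cleaned), len(taipei_counts), edges[0], edges[-1])
```

To draw the histograms themselves, the same settings can be passed to matplotlib, for example `plt.hist(values, bins=100, range=(0, 100))`. Using identical bins and range for both cities makes the two plots directly comparable.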


