DATA607 Assignment 2

Choose six recent popular movies. Ask at least five people that you know (friends, family, classmates, imaginary friends if necessary) to rate each of these movies that they have seen on a scale of 1 to 5. Take the results (observations) and store them in a SQL database of your choosing. Load the information from the SQL database into an R dataframe.
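
The store-then-load round trip described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard-library `sqlite3` module with an in-memory database so that it runs without a server, whereas the assignment itself asks for R. The table name `ratings`, the column names, and the placeholder raters and movies are all hypothetical.

```python
import sqlite3

# Hypothetical placeholder data: (rater, movie, rating).
# None marks a movie the rater has not seen.
SURVEY = [
    ("rater_1", "movie_a", 4),
    ("rater_1", "movie_b", None),
    ("rater_2", "movie_a", 5),
    ("rater_2", "movie_b", 3),
]


def build_ratings_db(rows):
    """Create an in-memory SQLite database holding one flat ratings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE ratings (
            rater  TEXT NOT NULL,
            movie  TEXT NOT NULL,
            rating INTEGER CHECK (rating BETWEEN 1 AND 5),  -- NULL = not seen
            PRIMARY KEY (rater, movie)
        )
        """
    )
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def load_ratings(conn):
    """Read the table back as a list of dicts, one per observation."""
    conn.row_factory = sqlite3.Row
    cur = conn.execute(
        "SELECT rater, movie, rating FROM ratings ORDER BY rater, movie"
    )
    return [dict(row) for row in cur]
```

In R, the corresponding load step is commonly done with the DBI package, for example `dbConnect()` followed by `dbGetQuery()`, which returns a data frame directly.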

This is by design a very open-ended assignment. In general, there’s no need here to ask “Can I…?” questions about your proposed approach. A variety of reasonable approaches are acceptable. You could, for example, access the SQL data directly from R, or you could create an intermediate .CSV file. I should be able to generate the SQL table(s) and data from your provided code. If you use a graphical user interface to create and populate tables, it should have a mechanism to generate the corresponding SQL code.

This assignment does not need to be 100% reproducible. You can (and should) blank out your SQL password if your solution requires it; otherwise, full credit requires that your code is “reproducible,” with the assumption that I have the same database server and R software.
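
One way to keep a password out of posted code is to read it from an environment variable at run time. The sketch below is a minimal illustration in Python, not a prescribed method; the variable name `MOVIES_DB_PASSWORD` and the helper `get_db_password` are hypothetical.

```python
import os


def get_db_password(var_name="MOVIES_DB_PASSWORD", env=None):
    """Return the database password stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    `env` defaults to the process environment and can be overridden for testing.
    """
    env = os.environ if env is None else env
    password = env.get(var_name)
    if not password:
        # Fail loudly instead of falling back to a hard-coded secret.
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return password
```

In R, the analogous lookup is `Sys.getenv()`. Anyone running the code sets the variable on their own machine, so the secret never appears in the repository.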

Handling missing data is a foundational skill when working with SQL or R. To receive full credit, you should demonstrate a reasonable approach for handling missing data. After all, how likely is it that all five of your friends have seen all six movies?
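
One reasonable approach, among several, is to store an unseen movie as `NULL` and let SQL aggregates skip it. The sketch below assumes a hypothetical flat `ratings` table with placeholder raters and movies, and uses Python's standard-library `sqlite3` module so that it runs without a database server. It reports, per movie, how many ratings exist, how many are missing, and the mean of the ratings that exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (rater TEXT, movie TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        # Hypothetical placeholder data; None is stored as SQL NULL.
        ("rater_1", "movie_a", 4),
        ("rater_2", "movie_a", None),  # has not seen movie_a
        ("rater_3", "movie_a", 2),
        ("rater_1", "movie_b", None),
        ("rater_2", "movie_b", None),
        ("rater_3", "movie_b", None),
    ],
)


def movie_summary(conn):
    """Per movie: count of actual ratings, count missing, mean of actual ratings."""
    query = """
        SELECT movie,
               COUNT(rating)            AS n_rated,    -- COUNT(col) skips NULLs
               COUNT(*) - COUNT(rating) AS n_missing,
               AVG(rating)              AS mean_rating -- AVG skips NULLs too
        FROM ratings
        GROUP BY movie
        ORDER BY movie
    """
    return conn.execute(query).fetchall()
```

A movie that nobody has seen gets a `NULL` mean here rather than 0, which avoids treating "not seen" as a low score. In R, such values typically arrive as `NA`, and functions such as `mean()` accept `na.rm = TRUE` to skip them.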

You’re encouraged to optionally find other ways to make your solution better. For example, consider incorporating one or more of the following suggestions into your solution:

  • Use survey software to gather the information.
  • Are you able to use a password without having to share the password with people who are viewing your code? There are a lot of interesting approaches that you can uncover with a little bit of research.
  • While it’s acceptable to create a single SQL table, can you create a normalized set of tables that corresponds to the relationship between your movie viewing friends and the movies being rated?
  • Is there any benefit in standardizing ratings? How might you approach this?
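
Two of the suggestions above, normalized tables and standardized ratings, can be sketched together. This is one possible design under stated assumptions, not the expected answer: the table names (`raters`, `movies`, `ratings`), the placeholder data, and the choice of a per-rater z-score using the population standard deviation are all hypothetical. It uses Python's standard-library `sqlite3` and `statistics` modules for illustration.

```python
import sqlite3
from statistics import mean, pstdev

# One table per entity, plus a junction table for the many-to-many relationship.
SCHEMA = """
CREATE TABLE raters (rater_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
CREATE TABLE ratings (
    rater_id INTEGER NOT NULL REFERENCES raters(rater_id),
    movie_id INTEGER NOT NULL REFERENCES movies(movie_id),
    rating   INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    PRIMARY KEY (rater_id, movie_id)
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO raters VALUES (?, ?)", [(1, "rater_1"), (2, "rater_2")])
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?)",
        [(1, "movie_a"), (2, "movie_b"), (3, "movie_c")],
    )
    # An unseen movie is simply an absent row, so no NULL ratings are stored.
    conn.executemany(
        "INSERT INTO ratings VALUES (?, ?, ?)",
        [(1, 1, 5), (1, 2, 3), (1, 3, 4), (2, 1, 3), (2, 2, 1)],
    )
    conn.commit()
    return conn


def standardized_ratings(conn):
    """Return (name, title, z) rows, with each rating centered and scaled per rater."""
    rows = conn.execute(
        """
        SELECT r.name, m.title, x.rating
        FROM ratings AS x
        JOIN raters AS r ON r.rater_id = x.rater_id
        JOIN movies AS m ON m.movie_id = x.movie_id
        ORDER BY r.name, m.title
        """
    ).fetchall()
    by_rater = {}
    for name, _title, rating in rows:
        by_rater.setdefault(name, []).append(rating)
    stats = {name: (mean(vals), pstdev(vals)) for name, vals in by_rater.items()}
    result = []
    for name, title, rating in rows:
        mu, sigma = stats[name]
        # A rater who gave every movie the same score has sigma == 0; report 0.0.
        z = (rating - mu) / sigma if sigma > 0 else 0.0
        result.append((name, title, z))
    return result
```

Standardizing per rater puts a generous rater and a harsh rater on a common scale, so a score of 3 from someone who rarely goes above 3 is not read the same way as a 3 from someone who usually gives 5. Whether that is a benefit depends on the question being asked of the data.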

You should post any code (e.g. SQL and R Markdown) in a GitHub repository, and provide a link in your assignment submission. For this assignment, you are not required to post your code to rpubs.com.

You may work in a small group on this assignment. If you work in a group, each group member should indicate who they worked with, and all group members should individually submit their week 2 assignment.

Please start early, and do work that you would want to include in a “presentations portfolio” that you might share in a job interview with a potential employer! You are encouraged to share thoughts and to ask and answer clarifying questions in this week’s “R and SQL” forum.

(Optional) Reading related to this assignment

  • James Le, “The 4 Recommendation Engines That Can Predict Your Movie Tastes”, May 1, 2018. https://towardsdatascience.com/the-4-recommendation-engines-that-can-predict-your-movie-tastes-109dc4e10c52 This is a nice backgrounder on movie recommendation engines. We’ll learn more about recommender systems later in the course.

  • Steve Blank, “The Customer Development Process. 2 Minutes to See Why”, Jul 29, 2014. https://www.youtube.com/watch?v=xr2zFXblSRM&t=27s. In this YouTube video (under 3 minutes), “lean startup” founder Steve Blank talks about the importance of getting out of the building to talk to customers. I’d encourage you to adopt this “builder mentality” in your own data science work whenever it’s practical, by collecting data yourself, whether it’s related to a “business experiment” or a “scientific experiment.”