[SOLVED] Cloud Computing - Assignment

Problem Scenario 1

The task at hand is the development of a collaborative platform for archival and sharing of class notes and student coursework. The platform should allow handling of:

Teacher accounts;

Groups of two/three students;

A repository of documents;

Appropriate measures against data loss;

Support for versioning of the stored documents;
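The versioning requirement can be illustrated with a minimal sketch. It is an in-memory stand-in, not a design prescribed by the assignment: the class name `VersionedStore` and its methods are hypothetical, and a real deployment would persist the data on a durable AWS storage service. The idea shown is that an upload never overwrites an earlier version, which also helps against accidental data loss.

```python
import hashlib
from collections import defaultdict


class VersionedStore:
    """Hypothetical in-memory stand-in for a versioned document repository."""

    def __init__(self):
        # document name -> list of (sha256 digest, content), oldest first
        self._versions = defaultdict(list)

    def put(self, name: str, data: bytes) -> int:
        """Append a new version unless the content equals the latest one.

        Returns the 1-based number of the latest version.
        """
        digest = hashlib.sha256(data).hexdigest()
        history = self._versions[name]
        if not history or history[-1][0] != digest:
            history.append((digest, data))
        return len(history)

    def get(self, name: str, version: int = 0) -> bytes:
        """Return a 1-based version, or the latest one when version is 0."""
        history = self._versions.get(name)
        if not history:
            raise KeyError(name)
        return history[version - 1][1] if version > 0 else history[-1][1]
```

Earlier versions stay retrievable by number, so a mistaken upload can be rolled back by reading the previous version.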

The documents may be stored in source format (LaTeX, docx) or in PDF. Optionally, for submissions in source format, you can implement a solution that automatically generates the corresponding PDF file.
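One possible shape for the optional PDF generation step is sketched below. It only builds the command line that a conversion job could run. It assumes `pdflatex` and LibreOffice (`soffice`) are installed on the server, which the assignment does not state, and the function name `pdf_command` and the `pdf` output directory are hypothetical.

```python
from pathlib import Path
from typing import List


def pdf_command(source: str, out_dir: str = "pdf") -> List[str]:
    """Return the external command that would convert `source` to PDF.

    An empty list means the file is already a PDF and needs no conversion.
    """
    suffix = Path(source).suffix.lower()
    if suffix == ".tex":
        # Assumes a TeX distribution providing pdflatex is installed.
        return ["pdflatex", "-interaction=nonstopmode",
                f"-output-directory={out_dir}", source]
    if suffix == ".docx":
        # Assumes LibreOffice is installed and `soffice` is on the PATH.
        return ["soffice", "--headless", "--convert-to", "pdf",
                "--outdir", out_dir, source]
    if suffix == ".pdf":
        return []
    raise ValueError(f"unsupported document type: {suffix!r}")
```

The returned list can be passed to `subprocess.run(cmd, check=True)` when it is non-empty. Keeping command construction separate from execution makes the dispatch logic easy to test without the converters installed.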

The collaboration site shall be implemented on the Amazon AWS platform.

We suggest using the Drupal software, www.drupal.org, to deploy the website. A tutorial on installing Drupal is available at https://www.tecmint.com/install-drupal-in-centos-rhel-fedora/. Two important additional installation instructions:

At the very start, edit the file /etc/selinux/config so that SELINUX=disabled.

After Drupal has started up and you have edited /etc/httpd/conf/httpd.conf, run the command systemctl restart httpd.

Using Drupal is not mandatory: you may implement the system using any tools you feel comfortable with.

 

Problem Scenario 2

The task at hand is to use the Spark infrastructure on AWS to solve at least one of the queries of the DEBS 2015 Grand Challenge, https://debs.org/grand-challenges/2015/. The dataset records a year's worth of taxi trips for the city of New York. The website contains a link (Google Drive) to a subset of the data for testing purposes. Learn how to easily download Google Drive files at https://stackoverflow.com/a/39225039, following these instructions:

pip3 install gdown

gdown https://drive.google.com/uc?id=0B0TBL8JNn3JgTGNJTEJaQmFMbk0

sudo yum install gzip.x86_64   # if gunzip is not installed

gunzip sorted_data.csv.gz
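Before loading the data into Spark, it can help to look at a few rows of the downloaded file. The sketch below reads a gzip-compressed CSV directly with the Python standard library, so it also works before running gunzip. The function name `peek_rows` is hypothetical, and no column layout is assumed: the field definitions are on the challenge website.

```python
import csv
import gzip
from itertools import islice


def peek_rows(path, n=5):
    """Return the first n rows of a gzip-compressed CSV file as lists of strings."""
    # Text mode ("rt") decompresses on the fly; newline="" is what csv expects.
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(islice(csv.reader(handle), n))
```

Only the first `n` rows are decompressed, so the call stays fast even on a large archive.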

The challenge proposes two queries:

Query 1 – Frequent Routes: Find the top 10 most frequent routes during a window covering the last 30 minutes;

Query 2 – Profitable areas: Find the top 10 most profitable areas during a window covering the last 30 minutes;

The specification “over the last 30 minutes” means that the solution is intended to be applied to a data stream, so that the system can in principle be used for real-time monitoring of the network. For the submission requirements of this scenario, please see the previous “You will have to turn in” section. For both queries, the result is assumed to be calculated in a streaming fashion, i.e. (1) solutions must not make use of any precalculated information, such as indices, and (2) result streams must be updated continuously. You are free to choose either one of the queries.
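The streaming requirement for Query 1 can be illustrated with a small sliding-window counter. This is plain Python rather than Spark code, meant only to show the windowing logic. It assumes trips arrive in timestamp order and that each route is already reduced to a hashable key such as a (start cell, end cell) pair. The class name `SlidingTopRoutes`, the key format, and the handling of the window boundary are hypothetical choices, not part of the challenge specification.

```python
from collections import Counter, deque
from datetime import datetime, timedelta


class SlidingTopRoutes:
    """Keep route counts over the trailing window of a time-ordered stream."""

    def __init__(self, window=timedelta(minutes=30), k=10):
        self.window = window
        self.k = k
        self.events = deque()    # (timestamp, route), oldest first
        self.counts = Counter()  # route -> trips currently inside the window

    def add(self, timestamp, route):
        """Ingest one trip and return the current top-k (route, count) pairs."""
        self.events.append((timestamp, route))
        self.counts[route] += 1
        # Evict trips at or before the cutoff (boundary choice is an assumption).
        cutoff = timestamp - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return self.counts.most_common(self.k)


if __name__ == "__main__":
    t0 = datetime(2013, 1, 1, 0, 0)
    top = SlidingTopRoutes()
    top.add(t0, ("A", "B"))
    print(top.add(t0 + timedelta(minutes=10), ("A", "B")))
```

Each incoming trip updates the result immediately and nothing is precomputed, which matches the two streaming conditions above. In Spark the same idea would typically be expressed as a time-window aggregation over the trip timestamp.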