Project 5 focuses on using Apache Spark for large-scale data analysis. For this assignment, we will use relatively small datasets and we won't run anything in distributed mode; however, Spark makes it easy to run the same programs on much larger datasets.
...
The following is then executed repeatedly until currentMatching does not change:

...
* Now we are left with a set of user-product relationships that we can add to currentMatching and iterate
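
The per-iteration steps are mostly elided above, but the overall fixed-point loop might look roughly like the sketch below. This is only an illustration: the toy candidate pairs, the starting matching, and the `pick_new_matches` helper are hypothetical stand-ins, not the assignment's actual logic.

```python
from pyspark import SparkContext

sc = SparkContext("local", "matching-sketch")

def pick_new_matches(current_matching, candidates):
    # Hypothetical stand-in for the per-iteration step: return the candidate
    # (user, product) pairs whose user and product are both still unmatched.
    matched_users = set(current_matching.keys().collect())
    matched_products = set(current_matching.values().collect())
    return candidates.filter(
        lambda up: up[0] not in matched_users and up[1] not in matched_products)

# Toy (user, product) candidate pairs; the real assignment derives these from data.
candidates = sc.parallelize([(1, "a"), (2, "a"), (3, "b")])
currentMatching = sc.parallelize([(1, "a")])

# Repeat until currentMatching stops changing.
while True:
    new_matches = pick_new_matches(currentMatching, candidates)
    if new_matches.isEmpty():
        break
    currentMatching = currentMatching.union(new_matches).distinct()

print(sorted(currentMatching.collect()))
```

The only structural point the sketch is meant to show is the termination test: each pass either adds new pairs to currentMatching or leaves it unchanged, and the loop stops in the latter case.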
### Sample results.txt File
You can use spark-submit to run the `assignment.py` file, but it may be easier to develop interactively in pyspark (copying the commands over). We will also post IPython instructions shortly.
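
For orientation, the practical difference between the two is just who creates the SparkContext: the pyspark shell already defines one for you as `sc`, while a standalone script run through spark-submit builds its own. A minimal sketch, with an arbitrary app name and a toy RDD that are not part of the assignment scaffolding:

```python
from pyspark import SparkContext

# In the pyspark shell an `sc` is already defined, so this line is only needed
# in a standalone script that you run through spark-submit (where the master,
# e.g. local or a cluster URL, can be supplied with --master).
sc = SparkContext(appName="project5-sketch")

# Tiny smoke test, unrelated to the assignment data.
words = sc.parallelize(["just", "a", "smoke", "test"])
print(words.count())

sc.stop()
```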
**results.txt** shows the results of running assignment.py on our code using: `$SPARKHOME/bin/spark-submit assignment.py`
### Submission
Submit the `functions.py` file [here](https://myelms.umd.edu/courses/1227917/assignments/4530807).