diff --git a/README.md b/README.md
index e367f8d9a0b74001aee6853828190e5598d12f8e..4fe6540397f47bfa5fced601c66eff644b98cec4 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Project 5
+# Project 5/optional
 **(Due Dec 10, midnight, *no late submissions*)**
 
 Project 5 focuses on using Apache Spark for doing large-scale data analysis tasks. For this assignment, we will use relatively small datasets and we won't run anything in distributed mode; however Spark can be easily used to run the same programs on much larger datasets.
@@ -138,11 +138,9 @@ The following is then executed repeatadly till currentMatching does not change.
 * Now we are left with a set of user-product relationships that we can add to currentMatching and iterate
 
 ### Sample results.txt File
-You can use spark-submit to run the `assignment.py` file, but it would be easier to develop with pyspark (by copying the commands over). We will also shortly post iPython instructions.
-
 **results.txt** shows the results of running assignment.py on our code using: `$SPARKHOME/bin/spark-submit assignment.py`
 
 ### Submission
-Submit the `functions.py` file.
+Submit the `functions.py` file [here](https://myelms.umd.edu/courses/1227917/assignments/4530807).
 
 