add note to task5

Commit aa4e725d authored 5 years ago by Zhichao
--- a/README.md
+++ b/README.md
@@ -114,7 +114,7 @@ We will use json.loads to parse the JSONs (this is already done). Make sure to l
 - **Task 4 (4pt)**: This function operates on the logsRDD. It takes as input a list of logs and returns an RDD where the key is the hosts and the value is the latest dates and time (no time zone) in the log when the hosts are visited.
 The format of the log entries should be self-explanatory, but here are more details if you need: [NASA Logs](http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html).
 
-- **Task 5 (4pt)**: Complete a function to group all ratings of products and calculate the degree distribution of product nodes in the Amazon graph. In other words, calculate the degree of each product node (i.e., number of ratings each product has gotten). Use a groupByKey to find the list of ratings each product has got and reduceByKey (or aggregateByKey) to find the degree of each product rating. The output should be a RDD where the key is the product, and the values are the degree and a list of all ratings the product has gotten. Make sure to make all the ratings a list and join the two RDDs.
+- **Task 5 (4pt)**: Complete a function to group all ratings(or users) of products and calculate the degree distribution of product nodes in the Amazon graph. In other words, calculate the degree of each product node (i.e., number of ratings each product has gotten). Use a groupByKey to find the list of ratings(or users) each product has got and reduceByKey (or aggregateByKey) to find the degree of each product rating. The output should be a RDD where the key is the product, and the values are the degree and a list of all ratings(or users) the product has gotten. Make sure to make all the ratings a list and join the two RDDs.

 Note: In this question, both list of ratings and users can be accepted as the values in the answer RDD(the result.txt is taking users as the value). We highly recommend you to implement the solution with list of users, since the input of Task5 function is a tuple of (user, product), which means you don't need to make any change to assignment.py
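The Task 5 dataflow has three steps: group the users per product, count each product's degree, and join the two results. The sketch below walks through those steps without a Spark cluster. It is a plain-Python stand-in, not a reference solution. The helper names (`group_users`, `count_degree`, `inner_join`, `product_degrees`) are hypothetical and are not part of `assignment.py`. Each docstring names the PySpark RDD chain it imitates. The sketch assumes the input is a list of `(user, product)` tuples, as the note above describes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # assumed input shape: (user, product)


def group_users(pairs: List[Pair]) -> Dict[str, List[str]]:
    """Imitates pairs.map(lambda up: (up[1], up[0])).groupByKey().mapValues(list)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for user, product in pairs:
        grouped[product].append(user)  # collect every user who rated this product
    return dict(grouped)


def count_degree(pairs: List[Pair]) -> Dict[str, int]:
    """Imitates pairs.map(lambda up: (up[1], 1)).reduceByKey(lambda a, b: a + b)."""
    degree: Dict[str, int] = defaultdict(int)
    for _user, product in pairs:
        degree[product] += 1  # one edge per (user, product) rating
    return dict(degree)


def inner_join(left: Dict[str, int], right: Dict[str, List[str]]) -> Dict[str, Tuple[int, List[str]]]:
    """Imitates left.join(right): keep keys present in both; value is (left, right)."""
    return {k: (left[k], right[k]) for k in left if k in right}


def product_degrees(pairs: List[Pair]) -> Dict[str, Tuple[int, List[str]]]:
    """Key = product, value = (degree, list of users), as the Task 5 text describes."""
    return inner_join(count_degree(pairs), group_users(pairs))


if __name__ == "__main__":
    ratings = [("u1", "p1"), ("u2", "p1"), ("u1", "p2")]
    print(product_degrees(ratings))
```

Two differences from real Spark are worth noting. In Spark, `groupByKey` does not guarantee the order of users within each list. Also, `join` returns an RDD of `(product, (degree, users))` pairs rather than a dict. The sketch preserves input order only because it runs sequentially on one machine.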