From b3dd94a110539d6d62863b4a11297d05442c70e9 Mon Sep 17 00:00:00 2001
From: Zhichao <liuzceecs@gmail.com>
Date: Wed, 23 Oct 2019 22:14:12 -0700
Subject: [PATCH] add note for Task5

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 80407c5..92bb686 100644
--- a/README.md
+++ b/README.md
@@ -115,6 +115,7 @@ We will use json.loads to parse the JSONs (this is already done). Make sure to l
 The format of the log entries should be self-explanatory, but here are more details if you need: [NASA Logs](http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html).
  
 - **Task 5 (4pt)**: Complete a function to group all ratings of products and calculate the degree distribution of product nodes in the Amazon graph. In other words, calculate the degree of each product node (i.e., number of ratings each product has gotten). Use a groupByKey to find the list of ratings each product has got and reduceByKey (or aggregateByKey) to find the degree of each product rating. The output should be a RDD where the key is the product, and the values are the degree and a list of all ratings the product has gotten. Make sure to make all the ratings a list and join the two RDDs.
+Note: For this task, the values in the answer RDD may be either the list of ratings or the list of users for each product; both are accepted (the provided result.txt uses users as the values).
  
 - **Task 6 (4pt)**: On the logsRDD, for two given times, use a 'cogroup' to create the following RDD: the key of the RDD will be a host, and the value will be a 2-tuple, where the first element is a list of all URLs fetched from that host before the first time, and the second element is the list of all URLs fetched from that host after the second time. Use filter to first create two RDDs from the input logsRDD.
 
-- 
GitLab