From 0e319f8f4b9a51d0e8cafe8bd91dfe2e7035588b Mon Sep 17 00:00:00 2001
From: "Peter J. Keleher" <keleher@cs.umd.edu>
Date: Sat, 2 Dec 2023 07:45:45 -0500
Subject: [PATCH] auto

---
 assign9.md | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/assign9.md b/assign9.md
index e6b592d..6980a35 100644
--- a/assign9.md
+++ b/assign9.md
@@ -82,7 +82,7 @@ a bunch of stuff about what Spark is doing).
 The relevant variables are initialized in this python shell, but otherwise it
 is just a standard Python shell.
 
-2. `>>> textFile = sc.textFile("README.md")`: This creates a new RDD, called
+2. `>>> textFile = sc.textFile("Dockerfile")`: This creates a new RDD, called
 `textFile`, by reading data from a local file. The `sc.textFile` commands
 create an RDD containing one entry per line in the file.
 
@@ -97,12 +97,37 @@ the Word Count application.
 #### Word Count Application
 
 The following command (in the pyspark shell) does a word count, i.e., it counts
-the number of times each word appears in the file `README.md`. Use
+the number of times each word appears in the file `Dockerfile`. Use
 `counts.take(5)` to see the output.
 
 `>>> counts = textFile.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)`
 
 
+In more detail:
+```
+root@d36910b1feb0:/assign9# $SPARKHOME/bin/pyspark
+Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+Setting default log level to "WARN".
+To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
+23/12/02 12:35:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+Welcome to
+      ____              __
+     / __/__  ___ _____/ /__
+    _\ \/ _ \/ _ `/ __/  '_/
+   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
+      /_/
+
+Using Python version 3.10.12 (main, Jun 11 2023 05:26:28)
+Spark context Web UI available at http://d36910b1feb0:4040
+Spark context available as 'sc' (master = local[*], app id = local-1701520517201).
+SparkSession available as 'spark'.
+>>> textFile = sc.textFile("Dockerfile")
+>>> counts = textFile.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
+>>> counts.take(5)
+[('#', 9), ('Use', 1), ('as', 1), ('image', 1), ('', 35)]
+```
+
 Here is the same code without the use of `lambda` functions.
 
 ```
-- 
GitLab
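
For reference, here is a minimal sketch of the same word count written with named functions instead of `lambda`s; the assignment's own no-lambda listing falls outside the hunk above, so this is not necessarily its exact code. It assumes the `sc` SparkContext provided by the pyspark shell, and the helper names (`split_line`, `to_pair`, `add`) are illustrative.

```
# Sketch: word count with named functions instead of lambdas.
# Assumes `sc` from the pyspark shell; helper names are illustrative.

def split_line(line):
    # One entry per word; flatMap flattens the per-line lists into one RDD.
    return line.split(" ")

def to_pair(word):
    # Pair each word with an initial count of 1.
    return (word, 1)

def add(a, b):
    # Sum the counts accumulated for a given word.
    return a + b

textFile = sc.textFile("Dockerfile")
counts = textFile.flatMap(split_line).map(to_pair).reduceByKey(add)
print(counts.take(5))
```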