1. `$SPARKHOME/bin/pyspark`: This starts a pyspark shell (and prints
a bunch of stuff about what Spark is doing). The relevant variables are
initialized in this python shell, but otherwise it is just a standard Python
shell.
2. `>>> textFile = sc.textFile("README.md")`: This creates a new RDD, called
2. `>>> textFile = sc.textFile("Dockerfile")`: This creates a new RDD, called
`textFile`, by reading data from a local file. The `sc.textFile` command creates
an RDD containing one entry per line in the file (see the short example below).
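To sanity-check the new RDD, you can ask for its size and its first entry (a
minimal sketch; the actual values depend on the contents of your `Dockerfile`):
```
>>> textFile.count()   # number of lines in the file
>>> textFile.first()   # the first line, as a plain Python string
```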
Next, we build the Word Count application.
#### Word Count Application
The following command (in the pyspark shell) does a word count, i.e., it counts
the number of times each word appears in the file `Dockerfile`. Use
`counts.take(5)` to see the output.
`>>> counts = textFile.flatMap(lambda line: line.split(" ")).map(lambda word:
(word, 1)).reduceByKey(lambda a, b: a + b)`
In more detail:
```
root@d36910b1feb0:/assign9# $SPARKHOME/bin/pyspark
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/12/02 12:35:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/
Using Python version 3.10.12 (main, Jun 11 2023 05:26:28)
Spark context Web UI available at http://d36910b1feb0:4040
Spark context available as 'sc' (master = local[*], app id = local-1701520517201).
SparkSession available as 'spark'.
>>> textFile = sc.textFile("Dockerfile")
>>> counts = textFile.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
>>> counts.take(5)
[('#', 9), ('Use', 1), ('as', 1), ('image', 1), ('', 35)]
```
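The same pipeline can also be built one RDD at a time, which makes it easier to
see what each stage produces (a minimal sketch; the exact pairs returned depend
on the contents of `Dockerfile`):
```
>>> words = textFile.flatMap(lambda line: line.split(" "))
>>> pairs = words.map(lambda word: (word, 1))
>>> pairs.take(3)          # every word is paired with an initial count of 1
>>> counts = pairs.reduceByKey(lambda a, b: a + b)
>>> counts.take(5)
```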
Here is the same code without the use of `lambda` functions.
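One way to write it, as a sketch, is to define the per-line and per-word helpers
as ordinary named functions (the names `split_line`, `to_pair`, and `add` below
are illustrative, not taken from the original listing):
```
>>> def split_line(line):
...     return line.split(" ")      # one line -> list of words
...
>>> def to_pair(word):
...     return (word, 1)            # each word -> (word, 1)
...
>>> def add(a, b):
...     return a + b                # sum the 1s for each word
...
>>> counts = textFile.flatMap(split_line).map(to_pair).reduceByKey(add)
>>> counts.take(5)
```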