assign9.md

```text
root@d36910b1feb0:/assign9# $SPARKHOME/bin/pyspark
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/12/02 12:35:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/

Using Python version 3.10.12 (main, Jun 11 2023 05:26:28)
Spark context Web UI available at http://d36910b1feb0:4040
Spark context available as 'sc' (master = local[*], app id = local-1701520517201).
SparkSession available as 'spark'.
>>> textFile = sc.textFile("Dockerfile")
>>> counts = textFile.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
>>> counts.take(5)
[('#', 9), ('Use', 1), ('as', 1), ('image', 1), ('', 35)]
```
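The `('', 35)` entry in the output needs a short explanation. `line.split(" ")` returns an empty string for each blank line and for each extra space between words, and the pipeline counts every one of those empty strings as a word. The sketch below reproduces the flatMap, map, reduceByKey chain in plain Python on a few sample lines. The lines are made up and do not come from the Dockerfile, and `word_count` is a hypothetical helper. The sketch shows where the empty strings come from and how `str.split()` with no argument avoids them.

```python
from collections import defaultdict

def word_count(lines, sep=" "):
    """Local imitation of flatMap(split) -> map((w, 1)) -> reduceByKey(a + b)."""
    # flatMap: split every line and flatten all the words into one list
    words = [w for line in lines for w in line.split(sep)]
    # map: pair each word with a count of 1
    pairs = [(w, 1) for w in words]
    # reduceByKey: add up the 1s for each distinct word
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)

# Hypothetical sample input containing a blank line and a double space
sample = ["FROM ubuntu", "", "RUN  apt-get update"]

print(word_count(sample))
# {'FROM': 1, 'ubuntu': 1, '': 2, 'RUN': 1, 'apt-get': 1, 'update': 1}
print(word_count(sample, sep=None))
# {'FROM': 1, 'ubuntu': 1, 'RUN': 1, 'apt-get': 1, 'update': 1}
```

To get the same fix in the Spark job, change the first step to `flatMap(lambda line: line.split())`.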
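```python
# The same word count, written with named functions instead of lambdas
def split(line):
    # Split a line into words on single spaces
    return line.split(" ")

def generateone(word):
    # Pair each word with a starting count of 1
    return (word, 1)

def add(a, b):
    # Combine two partial counts (named add so it does not shadow the built-in sum)
    return a + b

counts = textFile.flatMap(split).map(generateone).reduceByKey(add)
```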
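`reduceByKey` does not fold all the values for a key in a single left-to-right pass. Spark first combines values inside each partition, then merges those partial results. The function passed to it (`a + b` above) should therefore be associative and commutative. The plain-Python sketch below is a simplified model of that two-stage behavior and is not Spark's actual implementation. The helper `reduce_by_key_partitioned` and the sample data are hypothetical.

```python
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def reduce_by_key_partitioned(partitions, func):
    """Simplified model of reduceByKey: combine inside each partition, then merge."""
    # Stage 1: combine the values for each key within a single partition
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        partials.append(local)
    # Stage 2: merge the per-partition results key by key
    merged = {}
    for local in partials:
        for key, value in local.items():
            merged[key] = func(merged[key], value) if key in merged else value
    return merged

# Hypothetical (word, 1) pairs spread over two partitions
parts = [[("spark", 1), ("rdd", 1), ("spark", 1)], [("spark", 1), ("rdd", 1)]]
print(reduce_by_key_partitioned(parts, add))  # {'spark': 3, 'rdd': 2}

# Subtraction is not associative, so the result depends on how values are partitioned
print(reduce_by_key_partitioned([[("x", 5), ("x", 1)], [("x", 1)]], subtract))  # {'x': 3}
print(reduce_by_key_partitioned([[("x", 5)], [("x", 1), ("x", 1)]], subtract))  # {'x': 5}
```

Addition gives the same totals for any partition layout, which is why `a + b` is a safe choice for counting.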
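```python
import json

# Parse each element of nobelRDD as JSON, then flatten each record with task2_flatmap (not shown here)
task2_result = nobelRDD.map(json.loads).flatMap(task2_flatmap)
```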
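`map(json.loads)` works only if every element of `nobelRDD` is a complete JSON document, for example one record per line of a text file. It turns each element into a Python object, and `flatMap` then lets one record expand into zero or more output elements. `task2_flatmap` is not shown here, so the sketch below substitutes a hypothetical `example_flatmap` and a made-up record layout. The field names are invented and do not describe the real dataset. The sketch runs the same map-then-flatMap pattern in plain Python.

```python
import json

# Hypothetical JSON-lines input; field names are invented for illustration only
lines = [
    '{"year": "2001", "laureates": [{"surname": "A"}, {"surname": "B"}]}',
    '{"year": "2002", "laureates": []}',
    '{"year": "2003", "laureates": [{"surname": "C"}]}',
]

def example_flatmap(record):
    # Emit one (year, surname) pair per laureate; a record with no laureates emits nothing
    return [(record["year"], person["surname"]) for person in record["laureates"]]

# Local equivalent of rdd.map(json.loads).flatMap(example_flatmap)
parsed = [json.loads(line) for line in lines]
result = [pair for record in parsed for pair in example_flatmap(record)]
print(result)  # [('2001', 'A'), ('2001', 'B'), ('2003', 'C')]
```

With `map` in place of `flatMap`, the result would hold one list per record, including an empty list for 2002, instead of a single flat list of pairs.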
```text
silly
*HOLIDAY*
nice
guy
*HARDEN*
jerk
*HOLIDAY*
Again, a good guy
```