Commit f33caf26 authored by Peter Keleher's avatar Peter Keleher
auto

parent 1e3cc434
@@ -6,7 +6,7 @@ tasks. For this assignment, we will use relatively small datasets and we won't
 run anything in distributed mode; however Spark can be easily used to run the
 same programs on much larger datasets.
-### Setup
+## Setup
 Download files for Assignment 9 <a href="https://ceres.cs.umd.edu/424/assign/assignment9Dist.tgz?1">here</a>.
@@ -27,9 +27,6 @@ tasks are written as chains of these operations.
 Spark can be used with the Hadoop ecosystem, including the HDFS file system and
 the YARN resource manager.
-Note that bash is the default shell everywhere, but the `.cshrc` is set up
-correctly if you feel like dropping into `tcsh`.
 ### Vagrant
 This is the **recommended** way to do this project.
@@ -56,7 +53,8 @@ approach has been checked out (soon).
 ### Docker
 Probably before Thanksgiving.
-### Spark and Python
+## Spark and Python
 Spark primarily supports three languages: Scala (Spark is written in Scala),
 Java, and Python. We will use Python here -- you can follow the instructions at
@@ -118,7 +116,7 @@ The `lambda` representation is more compact and preferable, especially for small
 functions, but for large functions, it is better to separate out the
 definitions.
-### Running it as an Application
+### Running as an Application
 Instead of using a shell, you can also write your code as a python file, and
 *submit* that to the spark cluster. The `assignment9` directory contains a
......