Name	Last commit message	Last commit date
Latest commit by mateiz: Merge pull request #563 from jey/python-optimization (7e4b266, Jun 22, 2013); 2,904 commits in history
bagel	Attempt to fix streaming test failures after yarn branch merge	Apr 28, 2013
bin	Revert "Fix start-slave not passing instance number to spark-daemon."	Jun 11, 2013
conf	added SPARK_WORKER_INSTANCES : allows spawning multiple worker instan…	Mar 27, 2013
core	use parens when calling method with side-effects	Jun 21, 2013
docs	Fixed a couple typos and formating problems in the YARN documentation.	May 18, 2013
ec2	Fix SPARK-670: EC2 start command should require -i option.	May 5, 2013
examples	Merge remote-tracking branch 'milliondreams/casdemo'	Jun 18, 2013
project	Increase memory for tests to prevent a crash on JDK 7	Jun 22, 2013
python	Fix reporting of PySpark exceptions	Jun 21, 2013
repl-bin	Attempt at fixing merge conflict	Apr 24, 2013
repl	Update ASM to version 4.0	Jun 19, 2013
sbt	Increase ReservedCodeCacheSize for sbt	Apr 16, 2013
streaming	Exclude old versions of Netty from Maven-based build	May 19, 2013
.gitignore	add test for JdbcRDD using embedded derby, per rxin suggestion	May 15, 2013
LICENSE	Added BSD license	Dec 7, 2010
README.md	Add comment to README that 2.10 not yet supported	Mar 26, 2013
kmeans_data.txt	Fixed bugs	Jan 9, 2012
lr_data.txt	Test commit	Feb 6, 2012
pom.xml	Update ASM to version 4.0	Jun 19, 2013
pyspark	Adding IPYTHON environment variable support for launching pyspark usi…	Feb 7, 2013
run	Only check for repl classes if the user is running the repl. Otherwise,	May 16, 2013
run.cmd	Add spark-shell.cmd	Sep 25, 2012
run2.cmd	1) Add support for HADOOP_CONF_DIR (and/or YARN_CONF_DIR - use either…	May 11, 2013
spark-executor	Further refactoring, and start of a standalone scheduler backend	Jul 7, 2012
spark-shell	More work to allow Spark to run on the standalone deploy cluster.	Jul 8, 2012
spark-shell.cmd	Add spark-shell.cmd	Sep 25, 2012

Spark

Lightning-Fast Cluster Computing - http://www.spark-project.org/

Online Documentation

You can find the latest Spark documentation, including a programming guide, on the project webpage at http://spark-project.org/documentation.html. This README file only contains basic setup instructions.

Building

Spark requires Scala 2.9.2 (Scala 2.10 is not yet supported). The project is built using Simple Build Tool (SBT), which is packaged with it. To build Spark and its example programs, run:

sbt/sbt package

Spark also supports building with Maven. If you would like to build with Maven, see the instructions for building Spark with Maven in the Spark documentation.

To run Spark, you need either Scala's bin directory on your PATH or the SCALA_HOME environment variable set to the directory where Scala is installed. Scala must be reachable in one of these ways on your cluster's master and on each of its worker nodes.
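To check the Scala requirement on a node, a small shell function can report which scala launcher that node would resolve. This is a minimal sketch, not part of Spark: the function name find_scala is hypothetical, and the choice to let an explicit SCALA_HOME win over PATH is an assumption of this sketch (the bundled run script may order these differently).

```sh
# Hypothetical helper (not part of Spark): print the scala launcher this node
# would use, following the two options described above.
# Assumption for this sketch: an explicit SCALA_HOME takes priority over PATH.
find_scala() {
  # Option 1: SCALA_HOME points at an installation containing bin/scala.
  if [ -n "${SCALA_HOME:-}" ] && [ -x "$SCALA_HOME/bin/scala" ]; then
    echo "$SCALA_HOME/bin/scala"
    return 0
  fi
  # Option 2: Scala's bin directory is on the PATH.
  if command -v scala >/dev/null 2>&1; then
    command -v scala
    return 0
  fi
  echo "Scala not found: put its bin directory on PATH or set SCALA_HOME" >&2
  return 1
}
```

Running find_scala on the master and on each worker (for example over ssh) gives a quick yes/no per node; a non-zero exit status means neither option is satisfied there.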

To run one of the examples, use ./run <class> <params>. For example:

./run spark.examples.SparkLR local[2]

will run the Logistic Regression example locally using 2 threads.

Each of the example programs prints usage help if no params are given.

All of the Spark examples take a <host> parameter: the URL of the cluster to connect to. This can be a mesos:// or spark:// URL, "local" to run locally with one thread, or "local[N]" to run locally with N threads.
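The accepted forms of the <host> parameter can be summarized as a shell case statement. This is an illustrative sketch only: describe_master is a hypothetical helper, not something shipped with Spark, and its output format is invented for this example. It simply encodes the four forms listed above.

```sh
# Hypothetical helper (not part of Spark): classify a <host> argument using
# the forms listed above: "local", "local[N]", spark://..., mesos://...
describe_master() {
  case "$1" in
    local)
      echo "mode=local threads=1"
      ;;
    local\[*\])
      # Strip the literal "local[" prefix and "]" suffix to get N.
      n=${1#"local["}
      n=${n%"]"}
      case "$n" in
        ''|*[!0-9]*)
          echo "invalid thread count in: $1" >&2
          return 1
          ;;
      esac
      if [ "$n" -lt 1 ]; then
        echo "thread count must be at least 1: $1" >&2
        return 1
      fi
      echo "mode=local threads=$n"
      ;;
    spark://*)
      # spark:// URLs point at a Spark standalone cluster master.
      echo "mode=standalone url=$1"
      ;;
    mesos://*)
      echo "mode=mesos url=$1"
      ;;
    *)
      echo "unrecognized master URL: $1" >&2
      return 1
      ;;
  esac
}
```

For example, describe_master 'local[2]' prints mode=local threads=2. The quotes matter: unquoted, local[2] is a shell glob and would be replaced by a file named local2 if one happens to exist in the current directory, which applies equally to ./run commands.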

A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported storage systems. Because the HDFS API has changed in different versions of Hadoop, you must build Spark against the same version that your cluster runs. You can change the version by setting the HADOOP_VERSION variable at the top of project/SparkBuild.scala, then rebuilding Spark.
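Before rebuilding, it can help to compare the version Spark is configured to build against with the one the cluster runs. The sketch below is hypothetical (build_hadoop_version and check_hadoop_match are not part of Spark) and rests on an ASSUMPTION: that project/SparkBuild.scala sets the value on a single line of the form val HADOOP_VERSION = "x.y.z". If your copy of the file looks different, adjust the sed pattern. It does a plain string comparison only.

```sh
# Hypothetical helpers (not part of Spark).
# ASSUMPTION: the build file contains a line like:  val HADOOP_VERSION = "x.y.z"
build_hadoop_version() {
  sed -n 's/^[[:space:]]*val HADOOP_VERSION[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage: check_hadoop_match <path to SparkBuild.scala> <cluster Hadoop version>
check_hadoop_match() {
  built=$(build_hadoop_version "$1")
  cluster=$2
  if [ -z "$built" ]; then
    echo "could not find HADOOP_VERSION in $1" >&2
    return 2
  fi
  # Exact string comparison; no attempt to judge API compatibility.
  if [ "$built" = "$cluster" ]; then
    echo "OK: Spark is built against Hadoop $built"
  else
    echo "MISMATCH: Spark built against $built, cluster runs $cluster; edit $1 and rebuild" >&2
    return 1
  fi
}
```

On a machine with the Hadoop client installed, the first line of hadoop version typically reads Hadoop X.Y.Z, so one possible invocation is check_hadoop_match project/SparkBuild.scala "$(hadoop version | head -n 1 | awk '{print $2}')".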

Configuration

Please refer to the "Configuration" guide in the online documentation for a full overview of how to configure Spark. At a minimum, you will need to create a conf/spark-env.sh script (copy conf/spark-env.sh.template) and set the following two variables:

SCALA_HOME: Location where Scala is installed.
MESOS_NATIVE_LIBRARY: Your Mesos library (only needed if you want to run on Mesos). For example, this might be /usr/local/lib/libmesos.so on Linux.
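As a concrete illustration, after running cp conf/spark-env.sh.template conf/spark-env.sh, the two settings could be added as below. This is a sketch: the SCALA_HOME path is an assumed install location, and the Mesos path is the Linux example given above; substitute the real locations on your machines, and leave out the Mesos line if you do not run on Mesos.

```sh
# Lines to add to conf/spark-env.sh.

# Where Scala 2.9.2 is installed (assumed path; change to match your system).
export SCALA_HOME=/usr/local/scala-2.9.2

# Only needed when running on Mesos (example Linux location of the library).
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
```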

Contributing to Spark

Contributions via GitHub pull requests are gladly accepted from their original author. Along with any pull requests, please state that the contribution is your original work and that you license the work to the project under the project's open source license. Whether or not you state this explicitly, by submitting any copyrighted material via pull request, email, or other means you agree to license the material under the project's open source license and warrant that you have the legal authority to do so.

baeeq/incubator-spark