Skip to content

A generic interface for building custom supervised machine learning algorithms with Spark

License

Notifications You must be signed in to change notification settings

sethah/sparkopt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Spark-Opt

Spark-Opt provides flexible abstractions for building machine learning models/algorithms using Apache Spark ML. Specifically, the following are easily customizable:

  • prediction function
  • loss function
  • optimization routine

Being able to easily plug in the custom components above allows users to improve the scale of their algorithms, express a richer set of algorithms than what is currently available in Spark ML, and even improve upon existing Spark ML algorithms.

Build

Build SPARK-2.3.0-SNAPSHOT

This project requires building Spark 2.3+.

git clone https://github.com/apache/spark
cd spark
build/mvn clean install -DskipTests -Dmaven.javadoc.skip=True
cd [this repo]
mvn package

Run example

$SPARK_HOME/bin/spark-submit \
--class com.sethah.spark.sparkopt.examples.LogisticRegressionExample \
target/sparkopt-1.0-SNAPSHOT-jar-with-dependencies.jar \
--trainPath src/main/resources/binary \
--minimizer admm \
--l1Reg 0.05 \
--leReg 0.05

About

A generic interface for building custom supervised machine learning algorithms with Spark

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages