Canonical's Charmed Data Platform solution for Apache Spark runs Spark jobs on your Kubernetes cluster.
You can get started right away with MicroK8s - the mightiest tiny Kubernetes distro around!
You can install MicroK8s on your Ubuntu laptop, workstation, or nodes in your workgroup or server cluster with a single command: snap install microk8s --classic. Learn more at microk8s.io.
The spark-client snap includes the scripts to run jobs via spark-submit, or using interactive shells, and other tools for managing Apache Spark jobs for Kubernetes.
The spark-client snap simplifies the setup to run Spark jobs against your Kubernetes cluster. To run the snap, make sure that your environment satisfies the requirements listed here and then follow the instructions in the setup section to get started.
Check out the section on config resolution to understand how spark-submit resolves configuration properties that can come from several different sources.
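To make the idea of layered config resolution concrete, here is a minimal Python sketch. It is not the snap's implementation: the three sources and the `resolve_conf` helper are hypothetical, and the precedence order shown (command-line values over a properties file over defaults) is only an assumption that mirrors how stock Apache Spark ranks spark-submit flags above spark-defaults.conf. The config resolution section remains the authority on the sources and order the snap actually uses.

```python
from collections import ChainMap


def resolve_conf(*sources):
    """Merge config sources; earlier sources override later ones.

    Hypothetical helper for illustration only, not part of spark-client.
    """
    # ChainMap looks keys up in the first mapping that contains them,
    # so the first source passed in has the highest precedence.
    return dict(ChainMap(*sources))


# Hypothetical sources, listed from highest to lowest precedence.
cli_conf = {"spark.executor.instances": "4"}
properties_file = {
    "spark.executor.instances": "2",
    "spark.kubernetes.namespace": "demo",
}
defaults = {
    "spark.executor.instances": "1",
    "spark.app.name": "example",
}

resolved = resolve_conf(cli_conf, properties_file, defaults)

for key in sorted(resolved):
    print(f"{key}={resolved[key]}")
```

In this sketch the command-line value for spark.executor.instances wins, while properties defined in only one source pass through unchanged. When a job picks up an unexpected setting, tracing each property back through the sources in this way is usually the quickest check.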
Once the setup is complete, create a Spark service account using the CLI and follow the Spark job submission guide to validate and start utilizing your Kubernetes cluster for big data workloads.
Don't forget to check out the interactive shells for Scala and Python. They can save you a lot of time and debugging effort for authoring Spark jobs in the Kubernetes environment.
Check out the How-Tos section for a list of useful commands that make working with the Spark client easier. If Python is your thing, you can also manage your service accounts via a Python library.
To work with Spark Structured Streaming applications based on Kafka, please refer to the section on How to run Spark Streaming against Kafka.
If you already have a Charmed Kubernetes setup, check out the sections on using spark-client with Charmed Kubernetes as a snap and, for running from within a pod, Charmed Kubernetes From Pod.
Further documentation can be found on Discourse.
Spark on Kubernetes is a complex environment with many moving parts. Sometimes, small mistakes can take a lot of time to debug and figure out. Follow our list of common mistakes to avoid while setting up and playing with Spark on Kubernetes.
The spark-client snap is an initiative from Canonical to simplify and encourage the adoption of Apache Spark for Kubernetes environments. We are excited to share this initiative with the community. The project is in its nascent stages, and we are always on the lookout to collaborate with great engineers. If you think you have a great idea to delight the Spark community, follow and engage with us on GitHub and Discourse.