feat: Add spark benchmark test data generation changes (#694)
Showing 6 changed files with 151 additions and 47 deletions.
website/docs/benchmarks/spark-operator-benchmark/_category_.json (7 additions, 0 deletions)

@@ -0,0 +1,7 @@
{
  "label": "Spark Benchmarks",
  "position": 2,
  "link": {
    "type": "generated-index"
  }
}
website/docs/benchmarks/spark-operator-benchmark/data-generation.md (70 additions, 0 deletions)

@@ -0,0 +1,70 @@
---
sidebar_position: 2
sidebar_label: Data Generation
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CollapsibleContent from '../../../src/components/CollapsibleContent';

# Data Generation for Running Spark Benchmark Tests on Amazon EKS

This guide explains how to generate the data set used to run the TPC-DS benchmark tests for Spark.
<CollapsibleContent header={<h2><span>Deploying the Solution</span></h2>}>

In this [example](https://github.com/awslabs/data-on-eks/tree/main/analytics/terraform/spark-k8s-operator), you will provision the following resources, which are required to run Spark jobs with the open source Spark Operator and Apache YuniKorn.

This example deploys an EKS cluster running the Spark Kubernetes Operator into a new VPC:

- Creates a new sample VPC with 2 private subnets and 2 public subnets
- Creates an Internet Gateway for the public subnets and a NAT Gateway for the private subnets
- Creates an EKS cluster control plane with a public endpoint (for demo purposes only), a core managed node group, and on-demand and Spot node groups for Spark workloads
- Deploys the Metrics Server, Cluster Autoscaler, Spark Operator, Apache YuniKorn, Karpenter, Grafana, Amazon Managed Prometheus (AMP), and a Prometheus server
### Prerequisites

Ensure that you have installed the following tools on your machine:

1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
2. [kubectl](https://kubernetes.io/docs/tasks/tools/)
3. [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
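As a quick sanity check before deploying, you can confirm that each CLI is on your `PATH`. This is an illustrative sketch, not part of the repository's install scripts; the `have_tool` helper is our own:

```shell
# Report which of the prerequisite CLIs are installed.
# The tool list matches the prerequisites above.
have_tool() { command -v "$1" >/dev/null 2>&1; }

for tool in aws kubectl terraform; do
  if have_tool "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool (install it before continuing)" >&2
  fi
done
```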
### Deploy

Clone the repository:

```bash
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks
export DOEKS_HOME=$(pwd)
```

If `DOEKS_HOME` is ever unset, you can always set it again manually with `export DOEKS_HOME=$(pwd)` from your data-on-eks directory.
Export the following environment variables to set the minimum and desired number of SSD-enabled c5d.12xlarge instances. In our tests, we set both of these to `6`. Adjust the number of instances to suit your requirements and setup.

```bash
export TF_VAR_spark_benchmark_ssd_min_size=6
export TF_VAR_spark_benchmark_ssd_desired_size=6
```
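Terraform reads any `TF_VAR_`-prefixed environment variable as the value of the matching input variable. Before applying, you can sanity-check the two sizes with a small helper. This is an illustrative sketch; the `check_sizes` function and the fallback value of `6` are our own, not part of the repository:

```shell
# Validate the node group sizes: both must be numeric and min <= desired.
check_sizes() {
  min="$1"; desired="$2"
  case "$min" in ""|*[!0-9]*) echo "min size must be numeric" >&2; return 1 ;; esac
  case "$desired" in ""|*[!0-9]*) echo "desired size must be numeric" >&2; return 1 ;; esac
  if [ "$min" -gt "$desired" ]; then
    echo "min size must not exceed desired size" >&2; return 1
  fi
  echo "node group sizes: min=$min desired=$desired"
}

# Fall back to 6 (the value used in our tests) when the variables are unset.
check_sizes "${TF_VAR_spark_benchmark_ssd_min_size:-6}" \
            "${TF_VAR_spark_benchmark_ssd_desired_size:-6}"
```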
Navigate into the example directory and run the `install.sh` script:

```bash
cd ${DOEKS_HOME}/analytics/terraform/spark-k8s-operator
chmod +x install.sh
./install.sh
```
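Once `install.sh` finishes, it is worth confirming that the worker nodes have joined the cluster and are `Ready` before generating data. A minimal sketch, assuming `kubectl` is already pointed at the new cluster (for example via `aws eks update-kubeconfig`); the `nodes_ready` helper is our own:

```shell
# Zero NotReady nodes means the cluster is ready for Spark jobs.
nodes_ready() { [ "${1:-0}" -eq 0 ]; }

# grep -c prints the count of NotReady lines; '|| true' tolerates zero matches.
not_ready=$(kubectl get nodes --no-headers 2>/dev/null | grep -c ' NotReady ' || true)
if nodes_ready "$not_ready"; then
  echo "all nodes Ready"
else
  echo "$not_ready node(s) still NotReady" >&2
fi
```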
Now create an `S3_BUCKET` variable that holds the name of the bucket created during the install. This bucket will be used in later examples to store output data. If `S3_BUCKET` is ever unset, you can run the following commands again:

```bash
export S3_BUCKET=$(terraform output -raw s3_bucket_id_spark_history_server)
echo $S3_BUCKET
```
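Since later steps read `S3_BUCKET`, a small guard can fail fast when it is empty. This helper is an illustrative sketch; the `require_bucket` function and the example bucket name are our own, not from the repository:

```shell
# Fail fast if the bucket name is empty, e.g. because terraform output
# was run from the wrong directory.
require_bucket() {
  if [ -z "${1:-}" ]; then
    echo "S3_BUCKET is not set; re-run the terraform output command above" >&2
    return 1
  fi
  echo "benchmark output will land in s3://$1"
}

# The fallback bucket name here is a placeholder for illustration only.
require_bucket "${S3_BUCKET:-my-example-bucket}"
```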
</CollapsibleContent>
website/docs/benchmarks/spark-operator-benchmark/spark-operator-eks-benchmark.md (16 additions, 0 deletions)

@@ -0,0 +1,16 @@
---
sidebar_position: 1
sidebar_label: Introduction to Spark Benchmarks
---

# Introduction to Spark Benchmarks on Amazon EKS 🚀

This guide walks you through running Apache Spark benchmark tests on Amazon EKS, AWS's managed Kubernetes service. Benchmark tests help you evaluate and optimize Spark workloads on EKS by comparing results across different generations of Graviton-based EC2 instance families, especially when scaling for performance, cost efficiency, and reliability.

## Key Features 📈

- Data generation for the benchmark tests
- Benchmark test execution on different generations of Graviton instances (r6g, r7g, r8g)
- Benchmark results
- Customizable benchmarks to suit your workloads
- Autoscaling and cost optimization strategies