Originally developed as a fork of Qubole SparkLens, this tool analyzes Spark Event Logs to provide insights and cost optimization recommendations across various deployment options for Amazon EMR.
It generates an HTML report, which can be saved locally or uploaded to an Amazon S3 bucket for easy access and quick review.
- 2025/01 Release (v0.3.0): New Report UI and bug fix
- 2024/08 Release (v0.2.0): Spark Event Log Analysis with EMR deployment recommendations
- sbt (Java 17)
- Apache Spark
- AWS Account
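As a quick sanity check before building, the command-line prerequisites above can be verified from a shell. This is a minimal sketch, not part of the tool: `missing_tools` is a hypothetical helper, and it only checks that `java`, `sbt`, and `spark-submit` are on the `PATH`, not that their versions are the expected ones.

```shell
# Hypothetical helper: print each required command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools java sbt spark-submit)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing prerequisites:" $missing >&2
fi
```

Run `java -version` separately to confirm that a Java 17 runtime is the one being picked up.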
Note: To process Spark Event Logs stored in an S3 bucket, ensure that the hadoop-aws libraries are included in the Spark jar path when running the tool on your local machine.
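One possible way to meet this requirement locally is to let Spark resolve the library at launch through the `--packages` option of `spark-submit`. The sketch below only builds the Maven coordinate to pass to that option. It assumes the hadoop-aws version should match the Hadoop version bundled with your Spark distribution; `hadoop_aws_package` is a hypothetical helper, and the default `HADOOP_VERSION` is a placeholder rather than a value taken from this project.

```shell
# Assumption: hadoop-aws should match the Hadoop version bundled with Spark.
# 3.3.4 is only a placeholder; check the hadoop-* jars under $SPARK_HOME/jars.
HADOOP_VERSION="${HADOOP_VERSION:-3.3.4}"

# Hypothetical helper: build the Maven coordinate used by spark-submit --packages.
hadoop_aws_package() {
  printf 'org.apache.hadoop:hadoop-aws:%s\n' "$1"
}

HADOOP_AWS_PKG="$(hadoop_aws_package "$HADOOP_VERSION")"
echo "Add to spark-submit: --packages $HADOOP_AWS_PKG"
```

Copying the matching jars into the Spark `jars` directory is an alternative when the machine cannot download packages at launch time.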
Note: The application uses AWS services to retrieve additional information (e.g., pricing) to generate its recommendations. Make sure you have the required IAM permissions when running the application.
Compile and build the package locally:
sbt clean compile assembly
To build the package on an EMR on EC2 (release >= 7.6.0) cluster:
# requirements
export JAVA_HOME=/usr/lib/jvm/java-17-amazon-corretto.x86_64/
export PATH=$JAVA_HOME/bin:$PATH
sudo wget https://www.scala-sbt.org/sbt-rpm.repo -O /etc/yum.repos.d/sbt-rpm.repo
sudo yum -y install git sbt
# clone repo and build
git clone https://github.com/aws-samples/aws-emr-advisor && cd aws-emr-advisor
sbt clean compile assembly
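After the build finishes, the assembled jar has to be located before it can be passed to `spark-submit`. The sketch below is one way to find it, assuming the build writes its jar somewhere under `target/` (the usual sbt layout); the exact file name depends on the project's build settings, so `latest_jar` (a hypothetical helper) simply picks the most recently modified jar rather than guessing a name.

```shell
# Hypothetical helper: print the most recently modified *.jar under a directory.
latest_jar() {
  dir="${1:-target}"
  # Nothing to report if the build directory does not exist yet.
  [ -d "$dir" ] || return 0
  # ls -t sorts by modification time, newest first.
  find "$dir" -type f -name '*.jar' -exec ls -t {} + 2>/dev/null | head -n 1
}

JAR="$(latest_jar target)"
if [ -n "$JAR" ]; then
  echo "Assembled jar: $JAR"
else
  echo "No jar found under target/; run sbt clean compile assembly first" >&2
fi
```

If other jars are present under `target/`, confirm that the printed path is the assembly jar before using it.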
Run the application on an EMR on EC2 or Spark cluster using the spark-submit command. For specific examples, refer to the corresponding documentation pages.
Here are some example reports generated when using the tool.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.