Commit
blueprints
ramadu committed Dec 15, 2022
1 parent 6b913a7 commit 7fbdf5b
Showing 86 changed files with 4,346 additions and 8 deletions.
63 changes: 55 additions & 8 deletions .gitignore
@@ -2,18 +2,65 @@
package-lock.json
__pycache__
.pytest_cache
.venv
venv/
*.egg-info

# OS
.DS_Store

# IDE
.idea/
.vscode/
.terraform
.terraform.**
terraform.*
terraform.output
.DS_Store
.venv/
tmp/
**/.terraform/*
**/.terraform*
# .tfstate files
*.tfstate
*.tfstate.*
terraform.out
# Crash log files
crash.log
crash.*.log
# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version
# control as they are data points which are potentially sensitive and subject
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc
# Dependency directories
node_modules/
jspm_packages/
# Optional npm cache directory
.npm

# Optional eslint cache
.eslintcache
**/cdk.out
cdk.out
**/cdk.context.json
package-lock.json
# Optional REPL history
.node_repl_history
# Output of 'npm pack'
*.tgz
# General
.DS_Store
.AppleDouble
.LSOverride
__pycache__
22 changes: 22 additions & 0 deletions blueprints/Makefile
@@ -0,0 +1,22 @@
.PHONY: install-cdk-requirements cdk-list cdk-diff cdk-deploy-infra cdk-deploy-to-bucket cdk-setup-eks-role help

install-cdk-requirements: ## install the python dependencies needed to run cdk IaC commands
@pip install -r infra/cdk/requirements.txt

cdk-list: ## list all the stacks. due to SDK dependencies, this fails if run prior to S3 bucket creation
@$(MAKE) -C infra/cdk list

cdk-diff: ## list the local changes in cdk compared to the previously installed infrastructure
@$(MAKE) -C infra/cdk diff

cdk-deploy-infra: ## deploy the infrastructure using CDK (runs with S3_FLAG=False)
@S3_FLAG=False $(MAKE) -C infra/cdk infra

cdk-deploy-to-bucket: ## deploy assets to the S3 bucket using CDK
@$(MAKE) -C infra/cdk s3-deploy

cdk-setup-eks-role: ## setup the infrastructure dependencies for EKS cluster (eg: IAM Role)
@$(MAKE) -C infra/cdk eks-role

help:
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}'
97 changes: 97 additions & 0 deletions blueprints/README.md
@@ -0,0 +1,97 @@
# mwaa-blueprints

## Description

This is a collection of getting-started blueprints for using Amazon Managed Workflows for Apache Airflow (MWAA). Below
is the high-level structure and the key files:

```sh
├── examples
│   ├── AWSGlue
│   │   ├── README.md
│   │   ├── dags
│   │   ├── infra
│   │   └── scripts
│   ├── EKS
│   │   ├── dags
│   │   ├── requirements.txt
│   │   └── infra
│   ├── EMR
│   │   ├── dags
│   │   └── spark
│   ├── EMR_on_EKS
│   │   ├── infra
│   │   ├── dags
│   │   └── spark
│   └── Lambda
│       ├── dags
│       └── image
└── infra
    ├── cdk
    ├── cloudformation
    └── terraform
```

### Folder Structure Details

- **README.md:** This file, with instructions on how to use the blueprints

- **Makefile:** A collection of make targets that run the various commands to set up the infrastructure. For detailed
  information about the make targets, run ```make help``` from the root folder, as shown in the example after this list

- **examples:** This folder has a collection of technology-specific DAGs organized into subfolders. Review the
  subfolders for details

- **infra:** This folder has the infrastructure setup needed to run the examples. The infrastructure is based
  on ```cloudformation```, ```cdk``` and ```terraform```.
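
For example, listing the documented make targets from the root of the `blueprints` folder (the path is illustrative and assumes the repository has been cloned):

```sh
# From the repository root, change into the blueprints folder and list the make targets
cd blueprints
make help
```

The output lists the CDK-related targets defined in `blueprints/Makefile` (such as `install-cdk-requirements`, `cdk-list`, `cdk-diff` and `cdk-deploy-to-bucket`) together with their one-line descriptions.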

## Badges

## Installation

### CDK
This example creates an MWAA environment and includes the DAGs to create an EKS cluster.
Set up the environment and execute the examples: [cdk](examples/EKS/README.md)
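
One possible sequence using the targets from `blueprints/Makefile` is sketched below; it is only an outline, and the linked EKS README remains the authoritative guide:

```sh
# From the blueprints folder: install the CDK Python dependencies,
# then provision the infrastructure and push assets to the S3 bucket
make install-cdk-requirements
make cdk-deploy-infra        # core infrastructure (runs with S3_FLAG=False)
make cdk-deploy-to-bucket    # deploy DAGs/assets to the S3 bucket
make cdk-setup-eks-role      # IAM role needed by the EKS cluster
```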

### Terraform

Access [terraform](infra/terraform/README.md)

#### Examples

Access [Examples](examples/)

## Support

Tell people where they can go to for help. It can be any combination of an issue tracker, a chat room, an email address,
etc.

## Roadmap

If you have ideas for releases in the future, it is a good idea to list them in the README.

## Contributing

State if you are open to contributions and what your requirements are for accepting them.

For people who want to make changes to your project, it's helpful to have some documentation on how to get started.
Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps
explicit. These instructions could also be useful to your future self.

You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce
the likelihood that the changes inadvertently break something. Having instructions for running tests is especially
helpful if it requires external setup, such as starting a Selenium server for testing in a browser.

## Authors and acknowledgment

Show your appreciation to those who have contributed to the project.

## License

For open source projects, say how it is licensed.

## Project status

If you have run out of energy or time for your project, put a note at the top of the README saying that development has
slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or
owner, allowing your project to keep going. You can also make an explicit request for maintainers.
20 changes: 20 additions & 0 deletions blueprints/examples/AWSGlue/Makefile
@@ -0,0 +1,20 @@

.PHONY: deploy post-provision undeploy
deploy: ## provision the infrastructure with terraform and run the post-provision script
terraform -chdir="./infra/terraform" init
terraform -chdir="./infra/terraform" plan
terraform -chdir="./infra/terraform" apply
$(MAKE) post-provision

post-provision:
chmod 700 ./post_provision.sh
./post_provision.sh $(mwaa_bucket) $(mwaa_execution_role_name) $(mwaa_env_name)

undeploy:
chmod 700 ./pre_termination.sh
./pre_termination.sh $(mwaa_bucket) $(mwaa_execution_role_name)
terraform -chdir="./infra/terraform" destroy




94 changes: 94 additions & 0 deletions blueprints/examples/AWSGlue/README.md
@@ -0,0 +1,94 @@
# Glue with MWAA

This example is a quick start for orchestrating an AWS Glue crawler and an AWS Glue job with MWAA.
The example uses [NOAA Climatology data](https://docs.opendata.aws/noaa-ghcn-pds/readme.html).

## Prerequisites

Ensure that you have installed the following tools on your machine.

1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
2. [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
3. [Amazon MWAA](https://aws.amazon.com/managed-workflows-for-apache-airflow/)


_Note: If you do not have a running MWAA environment, deploy it from the root of the project using terraform or CDK._

## Deploy the AWS Glue example

Clone the repository

```sh
git clone https://github.com/aws-samples/amazon-mwaa-examples.git

```

Navigate into the example directory and run `make deploy`, passing the MWAA environment-related arguments:

```sh
cd blueprints/examples/AWSGlue
make deploy mwaa_bucket={MWAA_BUCKET} mwaa_execution_role_name={MWAA_EXEC_ROLE} mwaa_env_name={MWAA_ENV_NAME}
```

## Log in to MWAA

Log in to your Amazon MWAA environment. You should see a DAG named 'emr_eks_weatherstation_job'.

Unpause the DAG and run it from the console, or trigger it from the command line as sketched below.
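
A minimal sketch of unpausing and triggering the DAG through the MWAA CLI endpoint is shown below; it assumes the environment name used at deploy time and requires `jq`, and the console route above works just as well:

```sh
# Request a short-lived Airflow CLI token for the MWAA environment
CLI_JSON=$(aws mwaa create-cli-token --name "${MWAA_ENV_NAME}")
CLI_TOKEN=$(echo "${CLI_JSON}" | jq -r '.CliToken')
WEB_SERVER=$(echo "${CLI_JSON}" | jq -r '.WebServerHostname')

# Unpause, then trigger, the DAG via the Airflow CLI endpoint
for cmd in "dags unpause emr_eks_weatherstation_job" "dags trigger emr_eks_weatherstation_job"; do
  curl -s -X POST "https://${WEB_SERVER}/aws_mwaa/cli" \
    -H "Authorization: Bearer ${CLI_TOKEN}" \
    -H "Content-Type: text/plain" \
    --data-raw "${cmd}"
done
```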

## What does the Makefile do?
1. Creates the infrastructure
   - an IAM service role for AWS Glue, and an IAM policy with AWS Glue permissions that will be attached to the MWAA execution role
   - S3 buckets for the Spark scripts and the data
2. Attaches the AWS Glue access permissions (IAM policy) to the MWAA execution role
3. Copies the DAGs and scripts to the S3 buckets
4. Updates the MWAA environment with the Airflow variables needed by the DAGs (a rough sketch of steps 3 and 4 follows this list)
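
For orientation only, here is a rough sketch of what steps 3 and 4 amount to with the AWS CLI; the bucket and variable names are placeholders, not necessarily what `post_provision.sh` uses:

```sh
# 3. Copy the DAGs and the Glue job scripts to the S3 buckets
aws s3 cp dags/ "s3://${MWAA_BUCKET}/dags/" --recursive
aws s3 cp scripts/ "s3://${SCRIPTS_BUCKET}/scripts/" --recursive

# 4. Airflow variables can be set through the MWAA CLI endpoint shown earlier,
#    e.g. --data-raw "variables set glue_scripts_bucket ${SCRIPTS_BUCKET}"
#    (the variable name here is illustrative)
```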

## What's needed for MWAA to access AWS Glue

- The MWAA execution role needs permissions to create and run the AWS Glue crawler and job. A sample policy, and a sketch of attaching it, follow.

```json
{
  "Statement": [
    {
      "Action": [
        "glue:CreateJob",
        "glue:ListCrawlers",
        "glue:ListJobs",
        "glue:CreateCrawler",
        "glue:GetCrawlerMetrics",
        "glue:GetCrawler",
        "glue:StartCrawler",
        "glue:UpdateCrawler",
        "glue:StartJobRun",
        "glue:GetJobRun",
        "glue:UpdateJob",
        "glue:GetJob"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "Glue"
    },
    {
      "Action": [
        "iam:PassRole",
        "iam:GetRole"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:iam::{account}:role/{glue_service_role}"
      ],
      "Sid": "Gluepassrole"
    }
  ],
  "Version": "2012-10-17"
}
```
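
A minimal sketch of attaching a policy like the one above to the MWAA execution role with the AWS CLI (the policy name and file name are placeholders):

```sh
# Save the JSON above as glue-access-policy.json, then attach it as an inline policy
aws iam put-role-policy \
  --role-name "${MWAA_EXEC_ROLE}" \
  --policy-name mwaa-glue-access \
  --policy-document file://glue-access-policy.json
```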

## Clean up
```sh
cd blueprints/examples/AWSGlue
make undeploy mwaa_bucket={MWAA_BUCKET} mwaa_execution_role_name={MWAA_EXEC_ROLE} mwaa_env_name={MWAA_ENV_NAME}
```
- Log in to the AWS account and delete the AWS Glue tables starting with `year_`, the AWS Glue crawler named `noaa-weather-station-data`, and the AWS Glue job `noaa_weatherdata_transform` (a CLI alternative is sketched below)
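
If you prefer to remove the Glue resources from the command line instead of the console, here is a sketch; the Glue database name is a placeholder, so check which database the crawler populated:

```sh
# Delete the crawler and the job created by the example
aws glue delete-crawler --name noaa-weather-station-data
aws glue delete-job --job-name noaa_weatherdata_transform

# Delete the year_* tables from the Glue database the crawler wrote to
GLUE_DB=<your_glue_database>   # placeholder
for t in $(aws glue get-tables --database-name "${GLUE_DB}" \
             --query "TableList[?starts_with(Name, 'year_')].Name" --output text); do
  aws glue delete-table --database-name "${GLUE_DB}" --name "${t}"
done
```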