Commit
blueprints
ramadu committed Dec 15, 2022
1 parent 6b913a7 commit 7fbdf5b
Showing 86 changed files with 4,346 additions and 8 deletions.
63 changes: 55 additions & 8 deletions .gitignore
@@ -2,18 +2,65 @@
package-lock.json
__pycache__
.pytest_cache
.venv
venv/
*.egg-info

# OS
.DS_Store

# IDE
.idea/
.vscode/
.terraform
.terraform.**
terraform.*
terraform.output
.DS_Store
.venv/
tmp/
**/.terraform/*
**/.terraform*
# .tfstate files
*.tfstate
*.tfstate.*
terraform.out
# Crash log files
crash.log
crash.*.log
# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version
# control as they are data points which are potentially sensitive and subject
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc
# Dependency directories
node_modules/
jspm_packages/
# Optional npm cache directory
.npm

# Optional eslint cache
.eslintcache
**/cdk.out
cdk.out
**/cdk.context.json
package-lock.json
# Optional REPL history
.node_repl_history
# Output of 'npm pack'
*.tgz
# General
.DS_Store
.AppleDouble
.LSOverride
__pycache__
22 changes: 22 additions & 0 deletions blueprints/Makefile
@@ -0,0 +1,22 @@
.PHONY: install-cdk-requirements cdk-list cdk-diff cdk-deploy-infra cdk-deploy-to-bucket cdk-setup-eks-role help

install-cdk-requirements: ## install the python dependencies needed to run cdk IaC commands
@pip install -r infra/cdk/requirements.txt

cdk-list: ## list all the stacks. due to SDK dependencies, this fails if run prior to S3 bucket creation
@$(MAKE) -C infra/cdk list

cdk-diff: ## list the local changes in cdk compared to the previously installed infrastructure
@$(MAKE) -C infra/cdk diff

cdk-deploy-infra: ## deploy the infrastructure using CDK (runs with S3_FLAG=False)
@S3_FLAG=False $(MAKE) -C infra/cdk infra

cdk-deploy-to-bucket: ## deploy assets to the S3 bucket using CDK
@$(MAKE) -C infra/cdk s3-deploy

cdk-setup-eks-role: ## setup the infrastructure dependencies for EKS cluster (eg: IAM Role)
@$(MAKE) -C infra/cdk eks-role

help:
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}'
97 changes: 97 additions & 0 deletions blueprints/README.md
@@ -0,0 +1,97 @@
# mwaa-blueprints

## Description

This is a collection of getting-started blueprints for using Amazon Managed Workflows for Apache Airflow (MWAA). Below
is the high-level structure and the key files:

```sh
├── examples
│   ├── AWSGlue
│   │   ├── README.md
│   │   ├── dags
│   │   ├── infra
│   │   └── scripts
│   ├── EKS
│   │   ├── dags
│   │   ├── requirements.txt
│   │   └── infra
│   ├── EMR
│   │   ├── dags
│   │   └── spark
│   ├── EMR_on_EKS
│   │   ├── infra
│   │   ├── dags
│   │   └── spark
│   └── Lambda
│       ├── dags
│       └── image
└── infra
    ├── cdk
    ├── cloudformation
    └── terraform
```

### Folder Structure Details

- **README.md:** This file, with instructions on how to use the blueprints

- **Makefile:** A collection of make targets that run the various commands to set up the infrastructure. For detailed
  information about the make targets, run ```make help``` from the root folder, as shown in the example after this list

- **examples:** This folder has a collection of technology-specific DAGs organized into subfolders. Review the
  subfolders for details

- **infra:** This folder has the infrastructure setup needed to run the examples. The infrastructure is based
  on ```cloudformation```, ```cdk``` and ```terraform```.
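
For example, listing the documented make targets from the root of the `blueprints` folder (the path is illustrative and assumes the repository has been cloned):

```sh
# From the repository root, change into the blueprints folder and list the make targets
cd blueprints
make help
```

The output lists the CDK-related targets defined in `blueprints/Makefile` (such as `install-cdk-requirements`, `cdk-list`, `cdk-diff` and `cdk-deploy-to-bucket`) together with their one-line descriptions.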

## Badges

## Installation

### CDK
This example creates an MWAA environment and includes the DAGs to create an EKS cluster.
Set up the environment and execute the examples: [cdk](examples/EKS/README.md)
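
One possible sequence using the targets from `blueprints/Makefile` is sketched below; it is only an outline, and the linked EKS README remains the authoritative guide:

```sh
# From the blueprints folder: install the CDK Python dependencies,
# then provision the infrastructure and push assets to the S3 bucket
make install-cdk-requirements
make cdk-deploy-infra        # core infrastructure (runs with S3_FLAG=False)
make cdk-deploy-to-bucket    # deploy DAGs/assets to the S3 bucket
make cdk-setup-eks-role      # IAM role needed by the EKS cluster
```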

### Terraform

Access [terraform](infra/terraform/README.md)

#### Examples

Access [Examples](examples/)

## Support

Tell people where they can go to for help. It can be any combination of an issue tracker, a chat room, an email address,
etc.

## Roadmap

If you have ideas for releases in the future, it is a good idea to list them in the README.

## Contributing

State if you are open to contributions and what your requirements are for accepting them.

For people who want to make changes to your project, it's helpful to have some documentation on how to get started.
Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps
explicit. These instructions could also be useful to your future self.

You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce
the likelihood that the changes inadvertently break something. Having instructions for running tests is especially
helpful if it requires external setup, such as starting a Selenium server for testing in a browser.

## Authors and acknowledgment

Show your appreciation to those who have contributed to the project.

## License

For open source projects, say how it is licensed.

## Project status

If you have run out of energy or time for your project, put a note at the top of the README saying that development has
slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or
owner, allowing your project to keep going. You can also make an explicit request for maintainers.
20 changes: 20 additions & 0 deletions blueprints/examples/AWSGlue/Makefile
@@ -0,0 +1,20 @@

.PHONY: deploy post-provision undeploy
deploy: ## provision the infrastructure with terraform and run the post-provision script
terraform -chdir="./infra/terraform" init
terraform -chdir="./infra/terraform" plan
terraform -chdir="./infra/terraform" apply
$(MAKE) post-provision

post-provision:
chmod 700 ./post_provision.sh
./post_provision.sh $(mwaa_bucket) $(mwaa_execution_role_name) $(mwaa_env_name)

undeploy:
chmod 700 ./pre_termination.sh
./pre_termination.sh $(mwaa_bucket) $(mwaa_execution_role_name)
terraform -chdir="./infra/terraform" destroy




94 changes: 94 additions & 0 deletions blueprints/examples/AWSGlue/README.md
@@ -0,0 +1,94 @@
# Glue with MWAA

This example is a quick start for orchestrating an AWS Glue crawler and an AWS Glue job with MWAA.
The example uses [NOAA Climatology data](https://docs.opendata.aws/noaa-ghcn-pds/readme.html).

## Prerequisites

Ensure that you have installed the following tools on your machine.

1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
2. [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
3. [Amazon MWAA](https://aws.amazon.com/managed-workflows-for-apache-airflow/)


_Note: If you do not have a running MWAA environment, deploy it from the root of the project using terraform or CDK._

## Deploy the AWS Glue example

Clone the repository

```sh
git clone https://github.com/aws-samples/amazon-mwaa-examples.git

```

Navigate into the example directory and run `make deploy`, passing the MWAA environment-related arguments:

```sh
cd blueprints/examples/AWSGlue
make deploy mwaa_bucket={MWAA_BUCKET} mwaa_execution_role_name={MWAA_EXEC_ROLE} mwaa_env_name={MWAA_ENV_NAME}
```

## Log in to MWAA

Log in to your Amazon MWAA environment. You should see a DAG named 'emr_eks_weatherstation_job'.

Unpause the DAG and run it from the console, or trigger it from the command line as sketched below.
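
A minimal sketch of unpausing and triggering the DAG through the MWAA CLI endpoint is shown below; it assumes the environment name used at deploy time and requires `jq`, and the console route above works just as well:

```sh
# Request a short-lived Airflow CLI token for the MWAA environment
CLI_JSON=$(aws mwaa create-cli-token --name "${MWAA_ENV_NAME}")
CLI_TOKEN=$(echo "${CLI_JSON}" | jq -r '.CliToken')
WEB_SERVER=$(echo "${CLI_JSON}" | jq -r '.WebServerHostname')

# Unpause, then trigger, the DAG via the Airflow CLI endpoint
for cmd in "dags unpause emr_eks_weatherstation_job" "dags trigger emr_eks_weatherstation_job"; do
  curl -s -X POST "https://${WEB_SERVER}/aws_mwaa/cli" \
    -H "Authorization: Bearer ${CLI_TOKEN}" \
    -H "Content-Type: text/plain" \
    --data-raw "${cmd}"
done
```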

## What does the Makefile do?
1. Creates the infrastructure
   - an IAM service role for AWS Glue, and an IAM policy with AWS Glue permissions that will be attached to the MWAA execution role
   - S3 buckets for the Spark scripts and the data
2. Attaches the AWS Glue access permissions (IAM policy) to the MWAA execution role
3. Copies the DAGs and scripts to the S3 buckets
4. Updates the MWAA environment with the Airflow variables needed by the DAGs (a rough sketch of steps 3 and 4 follows this list)
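
For orientation only, here is a rough sketch of what steps 3 and 4 amount to with the AWS CLI; the bucket and variable names are placeholders, not necessarily what `post_provision.sh` uses:

```sh
# 3. Copy the DAGs and the Glue job scripts to the S3 buckets
aws s3 cp dags/ "s3://${MWAA_BUCKET}/dags/" --recursive
aws s3 cp scripts/ "s3://${SCRIPTS_BUCKET}/scripts/" --recursive

# 4. Airflow variables can be set through the MWAA CLI endpoint shown earlier,
#    e.g. --data-raw "variables set glue_scripts_bucket ${SCRIPTS_BUCKET}"
#    (the variable name here is illustrative)
```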

## What's needed for MWAA to access AWS Glue

- The MWAA execution role needs permissions to create and run the AWS Glue crawler and job. A sample policy, and a sketch of attaching it, follow.

```json
{
  "Statement": [
    {
      "Action": [
        "glue:CreateJob",
        "glue:ListCrawlers",
        "glue:ListJobs",
        "glue:CreateCrawler",
        "glue:GetCrawlerMetrics",
        "glue:GetCrawler",
        "glue:StartCrawler",
        "glue:UpdateCrawler",
        "glue:StartJobRun",
        "glue:GetJobRun",
        "glue:UpdateJob",
        "glue:GetJob"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "Glue"
    },
    {
      "Action": [
        "iam:PassRole",
        "iam:GetRole"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:iam::{account}:role/{glue_service_role}"
      ],
      "Sid": "Gluepassrole"
    }
  ],
  "Version": "2012-10-17"
}
```
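
A minimal sketch of attaching a policy like the one above to the MWAA execution role with the AWS CLI (the policy name and file name are placeholders):

```sh
# Save the JSON above as glue-access-policy.json, then attach it as an inline policy
aws iam put-role-policy \
  --role-name "${MWAA_EXEC_ROLE}" \
  --policy-name mwaa-glue-access \
  --policy-document file://glue-access-policy.json
```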

## Clean up
```sh
cd blueprints/examples/AWSGlue
make undeploy mwaa_bucket={MWAA_BUCKET} mwaa_execution_role_name={MWAA_EXEC_ROLE} mwaa_env_name={MWAA_ENV_NAME}
```
- Log in to the AWS account and delete the AWS Glue tables starting with `year_`, the AWS Glue crawler named `noaa-weather-station-data`, and the AWS Glue job `noaa_weatherdata_transform` (a CLI alternative is sketched below)
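
If you prefer to remove the Glue resources from the command line instead of the console, here is a sketch; the Glue database name is a placeholder, so check which database the crawler populated:

```sh
# Delete the crawler and the job created by the example
aws glue delete-crawler --name noaa-weather-station-data
aws glue delete-job --job-name noaa_weatherdata_transform

# Delete the year_* tables from the Glue database the crawler wrote to
GLUE_DB=<your_glue_database>   # placeholder
for t in $(aws glue get-tables --database-name "${GLUE_DB}" \
             --query "TableList[?starts_with(Name, 'year_')].Name" --output text); do
  aws glue delete-table --database-name "${GLUE_DB}" --name "${t}"
done
```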