Terraform sync tool #290
File: README.md
# Terraform Sync Tool

This directory contains the Terraform Sync Tool. The tool intentionally fails your CI/CD pipeline when schema drift occurs between what your BigQuery Terraform resources declare and what is actually present in your BigQuery environment. These schema drifts happen when BigQuery tables are updated by processes outside of Terraform (for example, an ETL process may dynamically add new columns while loading data into BigQuery). When drift occurs, you end up with outdated BigQuery Terraform resource files. This tool detects the schema drift, traces its origin, and alerts developers/data engineers (by failing the CI/CD pipeline) so they can patch the Terraform in their current commit.
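For example, drift of this kind could be introduced by an ETL job running DDL directly against BigQuery. This is a hypothetical illustration; the dataset, table, and column names are placeholders borrowed from this repo's sample configuration:

```bash
# An ETL job adds a column outside of Terraform; the table's live schema
# now differs from the schema file that Terraform manages.
bq query --use_legacy_sql=false \
  'ALTER TABLE DatasetForTest.TableForTest ADD COLUMN Col11 STRING;'
```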
The Terraform Sync Tool can be integrated into your CI/CD pipeline by adding two steps:
- Step 0: Run the Terraform plan command (using either Terraform or Terragrunt) with the `-json` option and redirect the output into a JSON file using the shell redirection operator, e.g. `> output.json`
- Step 1: Run the Python script to identify and investigate the drift
## How to run the Terraform Sync Tool

```bash
###############
# Using Terragrunt
###############
terragrunt run-all plan -json --terragrunt-non-interactive > plan_output.json
python3 terraform_sync.py plan_output.json <YOUR_GCP_PROJECT_ID>

##############
# Using Terraform
##############
terraform plan -json > plan_output.json
python3 terraform_sync.py plan_output.json <YOUR_GCP_PROJECT_ID>
```

## How the Terraform Sync Tool Works

![Architecture Diagram](architecture.png)

**Executing the Sync Tool**

The Terraform Sync Tool is executed as part of the CI/CD pipeline build steps, triggered whenever a developer pushes a change to the linked repository. A build step specifies an action that you want Cloud Build to perform. For each build step, Cloud Build executes a Docker container as an instance of `docker run`.

**Step 0: Terraform Detects Drift**

`deploy.sh` contains the Terraform plan command, which writes event output into the `plan_out.json` file. We use `plan_out.json` for further investigation in the following steps; feel free to replace `plan_out.json` with your own JSON filename. The script takes the `${env}` and `${tool}` variables as arguments to select the Terragrunt working directory.
**Step 1: Investigate Drift**

`requirements.txt` specifies the Python dependencies, and `terraform_sync.py` contains the Python script that investigates the Terraform event output stored in Step 0 to detect and address schema drift.

In the Python script (`terraform_sync.py`), we first scan through the output line by line to identify all the drifted tables and store their table names. After storing the drifted table names and converting them into the table ID format `[gcp_project_id].[dataset_id].[table_id]`, we make API calls to fetch the latest table schemas from BigQuery.
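The snippet below is a minimal sketch of that logic, assuming the event structure of Terraform's machine-readable plan output; the dataset-ID handling here is a simplified placeholder, and the actual `terraform_sync.py` may differ in its details:

```python
# Sketch only: scan the JSON plan output for resource_drift events,
# build table IDs, and fetch the live schemas from BigQuery.
import json
import sys

from google.cloud import bigquery


def find_drifted_tables(plan_output_path, project_id, dataset_id):
    """Return fully qualified IDs of drifted BigQuery tables."""
    table_ids = []
    with open(plan_output_path) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip any non-JSON lines in the stream
            if event.get("type") != "resource_drift":
                continue
            resource = event["change"]["resource"]
            if resource.get("resource_type") == "google_bigquery_table":
                # resource_key holds the for_each key, which this
                # module also uses as the table ID.
                table_ids.append(f"{project_id}.{dataset_id}.{resource['resource_key']}")
    return table_ids


def fetch_latest_schemas(table_ids):
    """Fetch the current schema of each drifted table from BigQuery."""
    client = bigquery.Client()
    return {t: client.get_table(t).schema for t in table_ids}


if __name__ == "__main__":
    drifted = find_drifted_tables(sys.argv[1], sys.argv[2], "<YOUR_DATASET_ID>")
    if drifted:
        print(fetch_latest_schemas(drifted))
        sys.exit(1)  # fail the build so the developer patches the Terraform
```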

**Step 2: Fail Builds and Notify Expected Schemas**

Once schema drift is detected and identified, we fail the build and notify the developer who made the change to the repository. The notification includes the details and the expected schemas, in order to keep the schema files up to date with the latest table schemas in BigQuery.

To interpret the message: the expected table schemas are reported in the format `[{table1_id: table1_schema}, {table2_id: table2_schema}, ...]`, where each table ID has the format `[gcp_project_id].[dataset_id].[table_id]`.
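For instance, a notification for a single drifted table might carry something like the following (illustrative values; `my-gcp-project` is a placeholder project ID):

```json
[
  {
    "my-gcp-project.DatasetForTest.TableForTest": [
      {"name": "Col1", "type": "STRING", "mode": "NULLABLE"},
      {"name": "Col11", "type": "STRING", "mode": "NULLABLE"}
    ]
  }
]
```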

**What is Terragrunt?**

[Terragrunt](https://terragrunt.gruntwork.io/docs/getting-started/install) is a framework on top of Terraform that provides some new tools out of the box. Using `*.hcl` files and a few new keywords, you can easily share variables across Terraform modules.
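For example, inputs declared once in a root `terragrunt.hcl` are inherited by every child configuration that includes it — a simplified sketch of the pattern this repo uses under `qa/`:

```hcl
# qa/terragrunt.hcl (root, simplified): shared inputs live here
inputs = {
  project_id = "<YOUR_GCP_PROJECT_ID>"
}

# qa/terraform-sync-tool/terragrunt.hcl (child, simplified)
include "root" {
  path   = find_in_parent_folders()
  expose = true # lets the child reference include.root.inputs
}
```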

## How to run this sample repo

#### Fork and clone this repo

#### Folder structure

This directory serves as a starting point for your cloud project, with terraform-sync-tool integrated as one of the QA tools.
```
.
├── modules                    # Terraform modules directory
│   ├── bigquery               # Example Terraform BigQuery setup
│   └── ...                    # Other modules you have set up
├── qa                         # qa environment directory
│   ├── terragrunt.hcl
│   └── terraform-sync-tool    # The terraform-sync-tool configuration
│       ├── json_schemas       # Terraform schema files
│       ├── terragrunt.hcl
│       └── ...
├── cloudbuild.yaml            # Cloud Build configuration file
├── deploy.sh                  # Build Step 0 - contains terragrunt commands
├── requirements.txt           # Build Step 1 - specifies Python dependencies
├── terraform_sync.py          # Build Step 1 - Python script
└── ...                        # etc.
```

#### Go to the directory you just cloned and update

- **YOUR_GCP_PROJECT_ID** in `./qa/terragrunt.hcl`
- **YOUR_BUCKET_NAME** in `./qa/terragrunt.hcl`
- **YOUR_DATASET_ID** in `./qa/terraform-sync-tool/terragrunt.hcl`
#### Use Terraform/Terragrunt commands to test whether any resource drift exists

Terragrunt/Terraform commands:
```bash
# Terragrunt command
terragrunt run-all plan -json --terragrunt-non-interactive

# Terraform command
terraform plan -json
```

After running the Terraform plan command, **an event with type "resource_drift" ("type": "resource_drift") indicates that drift has occurred**. If drift is detected, please update your Terraform configurations and address the resource drift based on the event output.
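For reference, a `resource_drift` event in the plan output is emitted as a single JSON line. Abridged and with illustrative values (the real event carries additional fields such as timestamps), it looks roughly like:

```json
{
  "@level": "info",
  "@message": "google_bigquery_table.bq_table[\"TableForTest\"]: Drift detected (update)",
  "type": "resource_drift",
  "change": {
    "resource": {
      "addr": "google_bigquery_table.bq_table[\"TableForTest\"]",
      "resource_type": "google_bigquery_table",
      "resource_name": "bq_table",
      "resource_key": "TableForTest"
    },
    "action": "update"
  }
}
```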

#### Add Cloud Build steps to your configuration file

Please check the cloud build steps in the `cloudbuild.yaml` file and add these steps to your Cloud Build configuration file.

- Step 0: run the Terraform commands in `deploy.sh` to detect drift

  Add `deploy.sh` to your project directory.

- Step 1: run the Python script to investigate the Terraform output

  Add `requirements.txt` and `terraform_sync.py` to your project directory.

#### (Optional if you haven't created a Cloud Build trigger) Create and configure a new trigger in Cloud Build

Make sure to specify your Cloud Build configuration file location correctly. In this sample repo, use `tools/terraform_sync_tool/cloudbuild.yaml` as the configuration file location.

#### That's all you need! Let's commit and test in Cloud Build!
File: cloudbuild.yaml
steps:
# Step 0: run the Terraform commands in deploy.sh to detect drift
- name: 'alpine/terragrunt'
  entrypoint: 'bash'
  dir: './tools/terraform_sync_tool/'
  args: ['deploy.sh', 'qa', 'terraform-sync-tool']

# Step 1: run the Python script to investigate the Terraform output
- name: python:3.7
  entrypoint: 'bash'
  dir: './tools/terraform_sync_tool/'
  args:
  - -c
  # With `bash -c`, only the first argument is executed as a command,
  # so both commands are combined into a single string here.
  - 'pip install -r ./requirements.txt && python terraform_sync.py plan_out.json <GCP_PROJECT_ID>'
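To exercise this configuration without setting up a trigger, you can submit a build manually from the repo root with the gcloud CLI (assuming it is installed and authenticated against your project):

```bash
gcloud builds submit --config=tools/terraform_sync_tool/cloudbuild.yaml .
```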
File: deploy.sh
#!/bin/bash

# Environment directory (e.g. qa) and tool directory (e.g. terraform-sync-tool)
env=$1
tool=$2

# Write the machine-readable plan output to plan_out.json for Step 1.
terragrunt run-all plan -json --terragrunt-non-interactive --terragrunt-working-dir="${env}"/"${tool}" > plan_out.json
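For reference, the Step 0 build step in `cloudbuild.yaml` effectively invokes this script as:

```bash
bash deploy.sh qa terraform-sync-tool
```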
File: modules/bigquery/main.tf
locals {
  datasets = { for dataset in var.datasets : dataset["dataset_id"] => dataset }
  tables   = { for table in var.tables : table["table_id"] => table }

  iam_to_primitive = {
    "roles/bigquery.dataOwner" : "OWNER"
    "roles/bigquery.dataEditor" : "WRITER"
    "roles/bigquery.dataViewer" : "READER"
  }
}

# Create each dataset declared in var.datasets
resource "google_bigquery_dataset" "bq_dataset" {
  for_each      = local.datasets
  friendly_name = each.value["friendly_name"]
  dataset_id    = each.key
  location      = each.value["location"]
  project       = var.project_id
}

# Create each table declared in var.tables; schemas are loaded from JSON files
resource "google_bigquery_table" "bq_table" {
  for_each            = local.tables
  dataset_id          = each.value["dataset_id"]
  friendly_name       = each.key
  table_id            = each.key
  labels              = each.value["labels"]
  schema              = file(each.value["schema"])
  clustering          = each.value["clustering"]
  expiration_time     = each.value["expiration_time"]
  project             = var.project_id
  deletion_protection = each.value["deletion_protection"]
  depends_on          = [google_bigquery_dataset.bq_dataset]

  dynamic "time_partitioning" {
    for_each = each.value["time_partitioning"] != null ? [each.value["time_partitioning"]] : []
    content {
      type                     = time_partitioning.value["type"]
      expiration_ms            = time_partitioning.value["expiration_ms"]
      field                    = time_partitioning.value["field"]
      require_partition_filter = time_partitioning.value["require_partition_filter"]
    }
  }

  dynamic "range_partitioning" {
    for_each = each.value["range_partitioning"] != null ? [each.value["range_partitioning"]] : []
    content {
      field = range_partitioning.value["field"]
      range {
        start    = range_partitioning.value["range"].start
        end      = range_partitioning.value["range"].end
        interval = range_partitioning.value["range"].interval
      }
    }
  }
}
File: modules/bigquery/variables.tf
variable "location" { | ||
description = "The regional location for the dataset only US and EU are allowed in module" | ||
type = string | ||
default = "US" | ||
} | ||
|
||
variable "deletion_protection" { | ||
description = "Whether or not to allow Terraform to destroy the instance. Unless this field is set to false in Terraform state, a terraform destroy or terraform apply that would delete the instance will fail." | ||
type = bool | ||
default = true | ||
} | ||
|
||
variable "project_id" { | ||
description = "Project where the dataset and table are created" | ||
type = string | ||
} | ||
|
||
variable "datasets" { | ||
description = "this is a test DS" | ||
default = [] | ||
type = list(object({ | ||
dataset_id = string | ||
friendly_name = string | ||
location = string | ||
} | ||
)) | ||
} | ||
|
||
variable "tables" { | ||
description = "A list of objects which include table_id, schema, clustering, time_partitioning, expiration_time and labels." | ||
default = [] | ||
type = list(object({ | ||
table_id = string, | ||
dataset_id = string, #added to test creating multi dataset | ||
schema = string, | ||
clustering = list(string), | ||
deletion_protection=bool, | ||
time_partitioning = object({ | ||
expiration_ms = string, | ||
field = string, | ||
type = string, | ||
require_partition_filter = bool, | ||
}), | ||
range_partitioning = object({ | ||
field = string, | ||
range = object({ | ||
start = string, | ||
end = string, | ||
interval = string, | ||
}), | ||
}), | ||
expiration_time = string, | ||
labels = map(string), | ||
} | ||
)) | ||
} |
File: json_schemas/TableForTest.json
[
  {
    "description": "Col1",
    "mode": "NULLABLE",
    "name": "Col1",
    "type": "STRING"
  },
  {
    "description": "Col2",
    "mode": "NULLABLE",
    "name": "Col2",
    "type": "STRING"
  },
  {
    "description": "Col3",
    "mode": "NULLABLE",
    "name": "Col3",
    "type": "STRING"
  },
  {
    "description": "Col4",
    "mode": "NULLABLE",
    "name": "Col4",
    "type": "STRING"
  },
  {
    "description": "Col5",
    "mode": "NULLABLE",
    "name": "Col5",
    "type": "STRING"
  },
  {
    "description": "Col6",
    "mode": "NULLABLE",
    "name": "Col6",
    "type": "STRING"
  },
  {
    "description": "Col7",
    "mode": "NULLABLE",
    "name": "Col7",
    "type": "STRING"
  },
  {
    "description": "Col8",
    "mode": "NULLABLE",
    "name": "Col8",
    "type": "STRING"
  },
  {
    "description": "Col9",
    "mode": "NULLABLE",
    "name": "Col9",
    "type": "STRING"
  },
  {
    "description": "Col10",
    "mode": "NULLABLE",
    "name": "Col10",
    "type": "STRING"
  }
]
File: json_schemas/TableForTest2.json
[
  {
    "description": "Col1",
    "mode": "NULLABLE",
    "name": "Col1",
    "type": "STRING"
  },
  {
    "description": "Col2",
    "mode": "NULLABLE",
    "name": "Col2",
    "type": "STRING"
  },
  {
    "description": "Col3",
    "mode": "NULLABLE",
    "name": "Col3",
    "type": "STRING"
  },
  {
    "description": "Col4",
    "mode": "NULLABLE",
    "name": "Col4",
    "type": "STRING"
  }
]
File: qa/terraform-sync-tool/terragrunt.hcl
terraform {
  source = "../../modules/bigquery"
}

include "root" {
  path   = find_in_parent_folders()
  expose = true
}

locals {
  # TODO: Update your dataset ID
  dataset_id = "DatasetForTest" # YOUR_DATASET_ID
}

inputs = {
  # The ID of the project in which the resource belongs. If it is not provided, the provider project is used.
  project_id = include.root.inputs.project_id

  datasets = [
    {
      dataset_id    = local.dataset_id
      friendly_name = "Dataset for Terraform Sync Tool"
      location      = "US"
    }
  ]

  tables = [
    {
      table_id            = "TableForTest"
      dataset_id          = local.dataset_id
      schema              = "json_schemas/TableForTest.json"
      clustering          = []
      expiration_time     = null
      deletion_protection = true
      range_partitioning  = null
      time_partitioning   = null
      labels              = {}
    },
    {
      table_id            = "TableForTest2"
      dataset_id          = local.dataset_id
      schema              = "json_schemas/TableForTest2.json"
      clustering          = []
      expiration_time     = null
      deletion_protection = true
      range_partitioning  = null
      time_partitioning   = null
      labels              = {}
    }
  ]
}