Status (2026-05): bash deployment pipeline has been moved to
scripts/legacy/and entered maintenance-only. Terraform is the canonical path for new deployments. This page maps each legacy bash script to its Terraform replacement so existing users can plan a transition.
The terraform/ directory implements the same end state as the
old scripts/legacy/ bash pipeline.
Bash script (now in scripts/legacy/) |
Terraform | Notes |
|---|---|---|
0_setup_env.sh (kept at scripts/0_setup_env.sh, shared with ops tools) |
terraform.tfvars + provider AWS auto-detect |
Variables replace env-var sourcing |
1_enable_vpc_dns.sh |
modules/vpc-endpoints precondition |
TF asserts DNS is on; flip via AWS CLI once if it isn't |
2_validate_network_environment.sh |
data blocks at plan time |
TF fails plan if subnets/VPC don't exist |
3_create_vpc_endpoints.sh |
modules/vpc-endpoints |
full/minimal mode preserved |
4_install_eks_cluster.sh |
modules/eks-cluster |
private/public, KMS encryption, OIDC, vpc-cni/kube-proxy/pod-identity-agent |
5_check_environment.sh |
(n/a — operator concern) | TF doesn't verify operator's CLI tools |
6_create_system_nodegroup.sh |
modules/eks-system-nodegroup |
LVM userdata templated; arch auto-detected via aws_ec2_instance_type |
7_install_eks_addon.sh |
modules/eks-addons |
CoreDNS + Metrics Server (managed addons), CA + ALB (helm releases) |
option_install_csi_drivers.sh |
modules/eks-csi-drivers |
EBS always; EFS/FSx/S3 toggled via vars |
option_install_karpenter.sh |
modules/eks-karpenter |
helm + SQS interruption queue + EventBridge rules + sample NodePool/EC2NodeClass |
option_install_gpu_nodegroups.sh |
modules/eks-gpu-nodegroup |
Multi-NIC EFA via dynamic network_interfaces; OD/Spot/ODCR/CB declared per nodegroup |
option_install_gpu_stack.sh |
modules/eks-gpu-stack |
standard / operator mode dispatch |
option_create_bastion.sh (kept at scripts/) |
also: terraform/bootstrap-bastion/ |
Both maintained — pick one |
option_verify_gpu_efa.sh (kept at scripts/) |
(no TF equivalent) | NCCL benchmark; runs against an existing cluster |
option_show_nodegroup_topology.sh (kept at scripts/) |
(no TF equivalent) | Reads topology.k8s.aws/network-node-layer-N labels at runtime |
option_inspect_eks.sh (kept at scripts/) |
(no TF equivalent) | 9-check post-apply health audit |
topology_inventory_lib.sh (kept at scripts/) |
(kept) | Used by ops tools |
instance_arch_lib.sh |
replaced by data "aws_ec2_instance_type" |
Native TF lookup |
disk_detection_lib.sh |
terraform/modules/eks-{system,gpu}-nodegroup/templates/detect-ebs-disk.sh |
Embedded into userdata via templatefile() |
pod_identity_helpers.sh |
aws_eks_pod_identity_association resources |
Native TF resource |
.env |
terraform.tfvars |
|---|---|
CLUSTER_NAME |
cluster_name |
AWS_REGION |
aws_region |
VPC_ID |
vpc_id |
PRIVATE_SUBNET_A/B/C/D |
private_subnet_ids (list) |
PUBLIC_SUBNET_A/B/C/D |
public_subnet_ids (list) |
CLUSTER_MODE |
cluster_mode |
PUBLIC_ACCESS_CIDRS |
public_access_cidrs (list) |
K8S_VERSION |
k8s_version |
KMS_KEY_ARN |
kms_key_arn |
VPC_ENDPOINTS_MODE |
vpc_endpoints_mode |
SYSTEM_NODE_INSTANCE_TYPE |
system_node_instance_type |
SYSTEM_NODE_*_VOLUME_SIZE |
system_node_root_volume_size / ..._data_volume_size |
SYSTEM_NODE_DESIRED/MIN/MAX |
system_node_desired_capacity / _min_size / _max_size |
EC2_KEY_NAME |
ec2_key_name |
INSTALL_KARPENTER |
install_karpenter |
KARPENTER_VERSION |
karpenter_version |
INSTALL_EFS_CSI / INSTALL_FSX_CSI |
install_efs_csi / install_fsx_csi |
S3_BUCKET_ARNS |
s3_csi_bucket_arns |
GPU_INSTANCE_TYPES + DEPLOY_GPU_OD/SPOT/ODCR/CB |
gpu_nodegroups (list of objects) |
ODCR_IDS/AZS + CAPACITY_BLOCK_IDS/AZS |
per-entry capacity_reservation_id + subnet_ids |
GPU_PG_STRATEGY |
per-entry placement_group (none |
GPU_TARGET_AZ |
per-entry subnet_ids (single subnet) |
GPU_NG_SUFFIX |
per-entry suffix |
GPU_INSTALL_EFA_USERSPACE |
gpu_install_efa_userspace |
GPU_ENABLE_LOCAL_LVM etc. |
gpu_enable_local_lvm / gpu_local_lvm_* |
NVIDIA_DEVICE_PLUGIN_VERSION/REPO |
nvidia_device_plugin_version/repo |
EFA_DEVICE_PLUGIN_VERSION/IMAGE |
efa_device_plugin_version/image |
GPU_TOPOLOGY_MODE / _GATE / _GATE_LAYER |
(not ported — use option_verify_gpu_efa.sh) |
- No "already exists, skipping" loops. Terraform's state file tracks what was created. Re-applying a known-good config is a no-op.
- No retry-on-IAM-propagation. The AWS provider handles it.
- No imperative bounce of NVIDIA device plugin pods. The helm chart
uses
failOnInitError=true(upstream default) and self-heals via CrashLoopBackOff once the device files are ready. - Topology gate dropped. The bash
verify_topologystep inspected K8s labels post-NG and scaled the NG to 0 on mismatch. This is imperative and conflicts with TF's declarative model. Use the verifier scripts underscripts/option_verify_gpu_efa.shandscripts/option_show_nodegroup_topology.shinstead. - Apply must run from a host inside the cluster VPC when
cluster_mode=private. This matches the bash scripts' constraint; TF's kubernetes/helm providers useaws eks get-tokenexec auth which still needs API reachability. - No aws-auth ConfigMap manipulation. Cluster runs in
API_AND_CONFIG_MAPmode but TF usesaws_eks_access_entryresources (system NG, GPU NG, Karpenter node role) instead of patching the legacyaws-authConfigMap. This is the AWS-recommended path; the ConfigMap is left untouched. - Native deletion protection. TF sets
aws_eks_cluster.deletion_protectiondirectly (provider 5.70+). Bash had to callaws eks update-cluster-configafter the cluster was up. - Karpenter SQS interruption queue is enabled by default in TF. The
bash script intentionally skipped this (commented out
settings.interruptionQueue). The TF module creates the SQS queue plus 4 EventBridge rules (spot interruption, rebalance, instance state-change, AWS Health) so spot capacity is reclaimed gracefully. To match the bash behavior, set the helm valuesettings.interruptionQueue="". - Tagging. The bash scripts hard-code
business=middleware,resource=ekson every resource. TF uses the provider-leveldefault_tagsmap instead — set it once interraform.tfvarsand every taggable resource picks them up. Seeterraform.tfvars.example.
You can adopt Terraform incrementally:
- Keep using bash for
option_create_bastion.sh. - Use Terraform for the cluster/system NG/addons.
- Use bash for any one-off verification or operational tasks.
The two paths share manifests/ (IAM JSON, Karpenter sample
manifests). Don't run both for the same module — pick one source of
truth per resource group.