AZ mapping Code changes #421

Merged
merged 51 commits into awslabs:main on Jan 31, 2024

Conversation

sanjeevrg89
Contributor

What does this PR do?

Three files changed: main.tf, eks.tf, and install-pre-requsites-for-ec2.sh.

🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.

The AZ mapping change was made because users were reporting issues when provisioning trn1 instances.
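The mapping itself is not shown in this PR description, but it could take the shape of the Terraform sketch below. This is a hedged illustration only: the names az_mapping and preferred_az_id and the example AZ IDs are assumptions, not code from main.tf, and the AZ IDs would need to be checked against current trn1 availability.

    # Hedged sketch of a region -> AZ ID mapping for trn1 capacity; names and
    # AZ IDs below are illustrative, not taken from this repository.
    variable "region" {
      type    = string
      default = "us-west-2"
    }

    locals {
      # Example AZ IDs only -- verify against current trn1 availability.
      az_mapping = {
        "us-west-2" = "usw2-az4"
        "us-east-1" = "use1-az6"
      }

      preferred_az_id = local.az_mapping[var.region]
    }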

Motivation

More

  • [x] Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • [x] Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • [x] Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • [x] Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

@vara-bonthu (Collaborator) left a comment


A few minor comments on spelling errors; otherwise it's good to go.

@@ -46,7 +46,7 @@ module "eks" {
   # security group rule from all ipv4 to nodes for port 22
   node_security_group_additional_rules = {
-    # Critical Security group rule for EFA enabled nodes
+    # Critical Secruity group rule for EFA enabled nodes


Check for the spelling error and revert this change.

@@ -56,7 +56,7 @@ module "eks" {
     self = true
   }

-  # Critical Security group rule for EFA enabled nodes
+  # Critical Secruity group rule for EFA enabled nodes


Check for the spelling error and revert this change.

@@ -248,7 +248,7 @@ module "eks" {
     }
   ]

-  # Commented to investigate further as the node group creation is failing with placement group
+  # Commented to investigate further as the node group creation is failing with palcement group


Check for the spelling error and revert this change.

@@ -458,7 +457,7 @@ module "eks" {
     },
   ]

-  # Commented to investigate further as the node group creation is failing with placement group
+  # Commented to investigate further as the node group creation is failing with palcement group


Check for the spelling error and revert this change.

@vara-bonthu merged commit 006321d into awslabs:main on Jan 31, 2024
47 of 52 checks passed
otterley

This comment was marked as outdated.

# The preferred AZ is the first AZ listed in the AZ id <-> region mapping in main.tf.
# We use index 2 to select the subnet in AZ1 with the 100.x CIDR:
# module.vpc.private_subnets = [AZ1_10.x, AZ2_10.x, AZ1_100.x, AZ2_100.x]
subnet_ids = [module.vpc.private_subnets[2]]


Why is launching them in the same subnet a requirement? That requirement actually makes this more brittle as a generic blueprint, since instance availability is highly dynamic and varies across AZs and regions. Ideally all available subnets would be supplied so that EC2 Auto Scaling and/or Karpenter can retry in different subnets when a launch fails due to unavailability or lack of support.
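As a concrete illustration of that suggestion (a hedged sketch, not what this PR merged), the node group's subnet_ids could take every private subnet instead of a single index:

    # Sketch of the reviewer's alternative, not the merged change: pass all
    # private subnets so EC2 Auto Scaling / Karpenter can retry in another AZ
    # when trn1 capacity is unavailable in the preferred one.
    subnet_ids = module.vpc.private_subnets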


@vara-bonthu (Collaborator) commented on Jan 31, 2024


These workloads are intended to operate within a single subnet, and our previous approach also filters to the first private subnet. I recognize that a key challenge here is the availability of Trn1 instances in specific regions and AZs.

@sanjeevrg89's solution addresses this by allowing selection of only those regions and AZs where Trn1 instances are available, as outlined in the az_mapping field section of the main.tf file. This mapping field can be expanded to include additional regions and AZs where Trn1 availability exists.

We can do better by writing documentation that explains this to users.
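If that documentation is written, it could also show how to resolve the subnet from the mapped AZ ID explicitly rather than by list position. The sketch below is an assumption-laden illustration: the data source usage, the local.preferred_az_id name, and the extra-filtering note are inferred, not taken from this repository.

    # Hedged sketch: select the private subnet(s) in the preferred AZ by AZ ID
    # instead of relying on the ordering of module.vpc.private_subnets.
    # Further filtering (for example by CIDR block) may be needed because this
    # blueprint appears to create two private subnets per AZ.
    data "aws_subnets" "trn1_preferred" {
      filter {
        name   = "vpc-id"
        values = [module.vpc.vpc_id]
      }
      filter {
        name   = "availability-zone-id"
        values = [local.preferred_az_id]
      }
    }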

lusoal pushed a commit to lusoal/data-on-eks that referenced this pull request Jan 31, 2024
    pick cd7ab810 Pre-commit updates
    pick 6e61ad4b Added requirements into add-ons and removed from default vcalues