Controllers Interrupted During Reconcile Create Duplicate Resources #2062

Open
nromriell opened this issue Apr 24, 2024 · 4 comments
Labels
service/ec2 Indicates issues or PRs that are related to ec2-controller.

Comments

@nromriell

nromriell commented Apr 24, 2024

Edited: Removed the EC2-specific scope. On further investigation, this affects all controllers.

Describe the bug
If the controller is interrupted while reconciling a VPC, before the reconcile completes, the VPC is created a second time when the controller restarts.

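This happens because the VPC's status is empty and is not updated to reference the remote resource until after reconciliation completes. After a restart, the controller has no record of the VPC it already created, so it creates another one, leaving orphaned duplicate resources.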
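To make the ordering problem concrete, here is a minimal, self-contained Go sketch that simulates it in memory. It does not use the ACK runtime or the AWS SDK: `fakeEC2`, `vpcResource`, and `reconcile` are hypothetical stand-ins, and the "crash" is simulated by returning between the create call and the status write.

```go
package main

import "fmt"

// fakeEC2 is a hypothetical in-memory stand-in for the EC2 API.
// Every CreateVpc call creates a brand-new VPC.
type fakeEC2 struct {
	nextID int
	vpcs   map[string]string // VPC ID -> CIDR block
}

func (c *fakeEC2) CreateVpc(cidr string) string {
	c.nextID++
	id := fmt.Sprintf("vpc-%04d", c.nextID)
	c.vpcs[id] = cidr
	return id
}

// vpcResource is a hypothetical, trimmed-down custom resource.
// StatusID stays empty until the status is written back.
type vpcResource struct {
	CIDR     string
	StatusID string
}

// reconcile follows the ordering described in the issue: create the remote
// resource first, then write its ID to the status. If crashBeforeStatus is
// true, the process "dies" between those two steps.
func reconcile(api *fakeEC2, r *vpcResource, crashBeforeStatus bool) {
	if r.StatusID != "" {
		return // already tracked; nothing to create
	}
	id := api.CreateVpc(r.CIDR)
	if crashBeforeStatus {
		return // simulated panic/restart: the new VPC ID is never recorded
	}
	r.StatusID = id
}

func main() {
	api := &fakeEC2{vpcs: map[string]string{}}
	vpc := &vpcResource{CIDR: "10.0.0.0/16"}

	reconcile(api, vpc, true)  // interrupted reconcile
	reconcile(api, vpc, false) // reconcile after the restart

	// Two VPCs exist, but the status only tracks the second one.
	fmt.Println(len(api.vpcs), vpc.StatusID) // 2 vpc-0002
}
```

Because the status is the only link between the custom resource and the remote VPC, losing the write after `CreateVpc` leaves `vpc-0001` orphaned.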

Steps to reproduce

  • Create a new VPC object:
```yaml
apiVersion: ec2.services.k8s.aws/v1alpha1
kind: VPC
metadata:
  name: example
spec:
  ...
```
  • As soon as the controller triggers creation of the VPC in AWS, stop or restart the controller.
  • Check the status on the VPC. If the interruption landed in that window, the status will still be empty, with no reference to the remote resource.
  • When the controller runs the reconcile again, a new VPC with the same configuration is created.

This was found when a panic in the controller caused it to restart while the VPC was being reconciled, interrupting the creation:

{"level":"info","ts":"2024-04-24T18:00:29.176Z","msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"elasticipaddress",...}

Expected outcome
The VPC status is tracked as pending before the final status update is written to the custom resource once creation completes. On the next reconcile, the controller would then attempt to reconcile with the remote resource before creating a new one.

Environment
EKS version: eks.15
Kubernetes Version: 1.27
Service Targeted: EC2
EC2 ACK Controller Version: 1.2.6

@nromriell
Author

A potential option:

  • Update the status to pending before making the create API call.
  • As part of the create API call, automatically add an initial tag to the AWS resource containing the UID of the custom resource object.
  • During reconciles, if an object's status is pending, first attempt to reconcile from the remote resource before falling back to creating a new one.
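A minimal Go sketch of how these three steps could fit together, under stated assumptions: `fakeEC2`, `FindVpcByTag`, `vpcResource`, the `Pending` field, and the tag key `example.k8s.aws/cr-uid` are all hypothetical (the issue does not name a tag key or status field), and the in-memory struct stands in for the custom resource persisted in the Kubernetes API, so its fields survive the simulated crash.

```go
package main

import "fmt"

// fakeEC2 is a hypothetical in-memory stand-in for the EC2 API that
// supports tagging at creation time and lookup by tag.
type fakeEC2 struct {
	nextID int
	tags   map[string]map[string]string // VPC ID -> tags
}

func (c *fakeEC2) CreateVpc(tags map[string]string) string {
	c.nextID++
	id := fmt.Sprintf("vpc-%04d", c.nextID)
	c.tags[id] = tags
	return id
}

// FindVpcByTag returns the ID of a VPC carrying key=value, if one exists.
func (c *fakeEC2) FindVpcByTag(key, value string) (string, bool) {
	for id, t := range c.tags {
		if t[key] == value {
			return id, true
		}
	}
	return "", false
}

// uidTagKey is a hypothetical tag key used only for this sketch.
const uidTagKey = "example.k8s.aws/cr-uid"

// vpcResource is a hypothetical custom resource. In a real controller these
// fields would be persisted through the Kubernetes API, not held in memory.
type vpcResource struct {
	UID      string // metadata.uid of the custom resource
	Pending  bool   // set before the create call
	StatusID string // remote VPC ID once tracked
}

func reconcile(api *fakeEC2, r *vpcResource, crashBeforeStatus bool) {
	if r.StatusID != "" {
		return // already tracked
	}
	if r.Pending {
		// Step 3: a previous create may have succeeded; adopt it if found.
		if id, ok := api.FindVpcByTag(uidTagKey, r.UID); ok {
			r.StatusID, r.Pending = id, false
			return
		}
	}
	// Step 1: record the pending marker before calling the API.
	r.Pending = true
	// Step 2: tag the VPC with the custom resource's UID in the create call.
	id := api.CreateVpc(map[string]string{uidTagKey: r.UID})
	if crashBeforeStatus {
		return // simulated panic/restart before the status write
	}
	r.StatusID, r.Pending = id, false
}

func main() {
	api := &fakeEC2{tags: map[string]map[string]string{}}
	vpc := &vpcResource{UID: "0a1b2c3d"}

	reconcile(api, vpc, true)  // interrupted reconcile
	reconcile(api, vpc, false) // restart adopts the tagged VPC

	fmt.Println(len(api.tags), vpc.StatusID, vpc.Pending) // 1 vpc-0001 false
}
```

The lookup only runs when `Pending` is set, so the common path still performs a single create. One caveat for a real implementation: describe calls against AWS may be eventually consistent, so a tag lookup shortly after a create could miss the resource and would likely need a retry or backoff before falling back to creation.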

@nromriell
Author

Found during a crash loop caused by #2063.

@a-hilaly
Member

@nromriell Thank you for reporting this! I completely agree with everything you said above, and I really like the idea of using tags to track resources :) With that said, we will bring this up during the community meeting today and discuss next steps.

@nromriell changed the title from "[EC2 VPC] Controller Interrupted During Reconcile Creates Duplicate Resources" to "Controllers Interrupted During Reconcile Create Duplicate Resources" on Apr 25, 2024
@a-hilaly added the service/ec2 label (Indicates issues or PRs that are related to ec2-controller.) on Apr 26, 2024
@nromriell
Author

Thanks @a-hilaly for the quick triage and getting the discussion started during the community meeting today!

I've updated the title and added a note to the description stating that this affects all controllers, to reflect what I found on further investigation. I'd love to see this resolved even though the initial trigger was the crash loop. Let me know if you'd like me to take on putting a proposal together for this.
