-
Notifications
You must be signed in to change notification settings - Fork 640
taskresource: wait for execution role credentials upon agent restart #4827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8372714 to
5ae4c0b
Compare
446d0c2 to
b6b3a0a
Compare
b6b3a0a to
fa40298
Compare
fa40298 to
cb0b3da
Compare
cb0b3da to
9e4f457
Compare
9e4f457 to
d382456
Compare
d382456 to
0289725
Compare
0289725 to
7e34d5c
Compare
| // RequiresExecutionRoleCredentials returns true if the resource requires execution role credentials. | ||
| // We only need execution role credentials when there's an external config to pull from S3 | ||
| func (firelens *FirelensResource) RequiresExecutionRoleCredentials() bool { | ||
| return firelens.externalConfigType == ExternalConfigTypeS3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this not fallback to instance credentials?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can, but today it does not:
amazon-ecs-agent/agent/taskresource/firelens/firelens_unix.go
Lines 519 to 522 in 64b39fa
| creds, ok := firelens.credentialsManager.GetTaskCredentials(firelens.executionCredentialsID) | |
| if !ok { | |
| return errors.New("unable to get execution role credentials") | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider adding the fallback logic as a separate enhancement
Summary
This PR fixes the following race condition in ECS agent -
When the ECS agent receives a task payload from ACS, the payload contains the execution role credentials. The ECS agent uses these credentials to create task resources, before the task can be run.
If the ECS agent is restarted between the time when a task payload arrived and the task's resources were created, currently the task fails with error like "unable to find execution role credentials".
This is because ECS agent only saves the credentials ID to its state during shutdown, not the credentials themselves. So after a restart, ECS agent needs to wait for the execution role credentials to (re)arrive from ACS, instead of trying to progress the task too soon and failing.
Implementation details
Updated the task manager's resource transition logic to check whether the execution role is needed for a resource creation, and if so, whether it is available. If not, it will wait, like it waits for execution role credentials to pull container images of a task. The wait timeout is configured to be 1 minute.
Also, added a
RequiresExecutionRoleCredentialsmethod to the taskresource interface that tells whether a resource requires execution role credentials. This is used by the task manager during resource transitions.Testing
New tests cover the changes: yes, added unit and integ tests. The integ test reproduces the race condition and now asserts that the fix works.
Description for the changelog
bugfix: wait for execution role credentials to create task resource upon agent restart
Additional Information
Does this PR include breaking model changes? If so, Have you added transformation functions? No
Does this PR include the addition of new environment variables in the README? No
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.