Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add simple fallback to kargo controller IAM when project-specific credentials cannot be assumed for GAR. #3597

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kacpercesarz98
Copy link

Add simple fallback to kargo controller IAM when project-specific credentials could not be assumed for Google Artifact Registry
closes: issue

@kacpercesarz98 kacpercesarz98 requested review from a team as code owners March 3, 2025 11:26
Copy link

netlify bot commented Mar 3, 2025

Deploy Preview for docs-kargo-io ready!

Name Link
🔨 Latest commit d15fda5
🔍 Latest deploy log https://app.netlify.com/sites/docs-kargo-io/deploys/67d19003ef89880008dcb5f9
😎 Deploy Preview https://deploy-preview-3597.docs.kargo.io
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@kacpercesarz98 kacpercesarz98 changed the title Add simple fallback to kargo controller IAM. feat: add simple fallback to kargo controller IAM. Mar 3, 2025
@kacpercesarz98 kacpercesarz98 changed the title feat: add simple fallback to kargo controller IAM. feat: add simple fallback to kargo controller IAM when project-specific credentials cannot be assumed for GAR. Mar 3, 2025
@krancour krancour self-assigned this Mar 3, 2025
@krancour krancour added this to the v1.4.0 milestone Mar 3, 2025
@kacpercesarz98
Copy link
Author

kacpercesarz98 commented Mar 3, 2025

With this approach there is needed to fix or disable caching as using google.DefaultTokenSource do not always create fresh token. There are time windows when cached token is already expired.

Tested inside GKE.

Copy link

codecov bot commented Mar 4, 2025

Codecov Report

Attention: Patch coverage is 0% with 17 lines in your changes missing coverage. Please review.

Project coverage is 52.61%. Comparing base (7d68abc) to head (b1799a6).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...als/kubernetes/gar/workload_identity_federation.go 0.00% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3597      +/-   ##
==========================================
- Coverage   52.63%   52.61%   -0.03%     
==========================================
  Files         296      296              
  Lines       26881    26891      +10     
==========================================
- Hits        14150    14149       -1     
- Misses      11957    11968      +11     
  Partials      774      774              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

},
).Do()
if err != nil {
logger.Error(err, "error generating access token")
return "", nil
logger.Error(err, "Error generating access token; falling back to Application Default Credentials (ADC)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there needs to be some effort here to determine whether the failure is an auth failure or something else. If it's (for instance) a connectivity issue, we should be returning that error right away.

If it is an auth failure, I also don't believe this should be logged as an error. In your own intended usage, this first token request failing and then falling back on this logic that attempts to use the controller's own identity directly is actually business as usual. You're not going to want to see the logs cluttered with "errors" that reflect things working as intended.

Comment on lines 139 to 170
if w.tokenSource == nil {
tokenSource, err := google.DefaultTokenSource(ctx, tokenRequestScope...)
if err != nil {
logger.Error(err, "Failed to find Application Default Credentials")
return "", false, nil
}
w.tokenSource = tokenSource
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not concurrency safe.

@krancour
Copy link
Member

krancour commented Mar 4, 2025

With this approach there is needed to fix or disable caching as using google.DefaultTokenSource do not always create fresh token. There are time windows when cached token is already expired.

This warrants some research, because I wonder if this is actually true or if this observation is actually an artifact of this concurrency safety issue:

https://github.com/akuity/kargo/pull/3597/files#r1979750397

@kacpercesarz98
Copy link
Author

With this approach there is needed to fix or disable caching as using google.DefaultTokenSource do not always create fresh token. There are time windows when cached token is already expired.

This warrants some research, because I wonder if this is actually true or if this observation is actually an artifact of this concurrency safety issue:

https://github.com/akuity/kargo/pull/3597/files#r1979750397

Please verify:
https://github.com/golang/oauth2/blob/master/oauth2.go#L371

@krancour
Copy link
Member

krancour commented Mar 4, 2025

Please verify:
https://github.com/golang/oauth2/blob/master/oauth2.go#L371

I agree that a single TokenSource, if its implementation is reuseTokenSource, has its own internal cache and will return the same token a number of times.

And through research and experimentation, I can verify that the TokenSource in question here is indeed a reuseTokenSource.

This can be demonstrated with a simple program:

package main

import (
	"context"
	"log"
	"os"

	"golang.org/x/oauth2/google"
)

const (
	credsPath         = "/Users/krancour/Downloads/kents-test-project-317520-e3fb3e962f2d.json" // nolint: gosec
	tokenRequestScope = "https://www.googleapis.com/auth/cloud-platform"                        // nolint: gosec
)

func init() {
	if err := os.Setenv("GOOGLE_APPLICATION_CREDENTIALS", credsPath); err != nil {
		log.Fatal(err)
	}
}

func main() {
	ctx := context.Background()
	tokenSource, err := google.DefaultTokenSource(ctx, tokenRequestScope)
	if err != nil {
		log.Fatal(err)
	}
	token1, err := tokenSource.Token()
	if err != nil {
		log.Fatal(err)
	}
	token2, err := tokenSource.Token()
	if err != nil {
		log.Fatal(err)
	}
	if token1.AccessToken == token2.AccessToken {
		log.Println("Tokens are the same")
	} else {
		log.Println("Tokens are different")
	}
}
go run main.go
2025/03/04 12:16:11 Tokens are the same

But two different instances of reuseTokenSource would return two different tokens, as demonstrated with some small tweaks to the above program:

func main() {
	ctx := context.Background()
	tokenSource1, err := google.DefaultTokenSource(ctx, tokenRequestScope)
	if err != nil {
		log.Fatal(err)
	}
	tokenSource2, err := google.DefaultTokenSource(ctx, tokenRequestScope)
	if err != nil {
		log.Fatal(err)
	}
	token1, err := tokenSource1.Token()
	if err != nil {
		log.Fatal(err)
	}
	token2, err := tokenSource2.Token()
	if err != nil {
		log.Fatal(err)
	}
	if token1.AccessToken == token2.AccessToken {
		log.Println("Tokens are the same")
		fmt.Println(token1.AccessToken)
	} else {
		log.Println("Tokens are different")
	}
}
go run main.go
2025/03/04 12:18:05 Tokens are different

Your code is currently resembles the first program because you're re-using a single TokenSource for all requests, however, you haven't written that in a concurrency safe manner. Admittedly, the lack of concurrency safety is a bit academic here, because the race to set the one-reused TokenSource won't have any adverse effects (beyond the side-effect of it invalidating this component's existing caching strategy).

If you were to tackle the concurrency safety issue, academic though it may be, by not reusing a single TokenSource, your code would more closely resemble the second example, but more importantly, it would have the side effect of preserving this component's existing caching strategy.

You could take a different approach to solving the concurrency safety issue, or we could just leave it alone, since I acknowledge that it's mainly academic in this instance. In either case, you'd be reusing the same TokenSource over and over, which does seem efficient since it has its own built-in cache, but again, we come back to the fact of this breaking the existing caching strategy...

So the thing I am asking myself is why I actually care about preserving the existing caching strategy. I like that the existing strategy is consistent for both controller-level and project-level tokens, which makes this code, overall, easier to comprehend and maintain.

Add simple fallback to kargo controller IAM when project-specific credentials could not be assumed for Google Artifact Registry

Signed-off-by: Kacper Cesarz <[email protected]>

improvements

Signed-off-by: Kacper Cesarz <[email protected]>
Signed-off-by: Kacper Cesarz <[email protected]>
@kacpercesarz98 kacpercesarz98 requested a review from krancour March 12, 2025 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Google Artifact Registry credentials should be more flexible
2 participants