
add core logic to support access token in postgres scaler #5589

Open
wants to merge 30 commits into base: main

Conversation

@Ferdinanddb commented Mar 11, 2024

Provide a description of what has been changed
The purpose of this PR is to add the necessary code for the Postgres scaler to support Azure access token authentication.

Checklist

Fixes #5823

Relates to #


TODO:

  • Create an issue for this feature and edit this PR's description.
  • Write / Improve the documentation related to this feature.

@Ferdinanddb (Author)

Dear reviewers,

I have some doubts regarding the following topics and would appreciate assistance or guidance, please:

(1) Should I change the logic of how an Azure Access Token is retrieved to be able to mock it and write some specific PodIdentityProviderAzureWorkload tests? If yes, I am thinking about the following tests, based on what I wrote:

  • Check that the config is successfully parsed when using the podIdentity kedav1alpha1.PodIdentityProviderAzureWorkload.
  • Check that the Access Token (i.e. the password) is updated when it is expired or about to expire. This might be difficult because this part happens when performing the query, so it happens at "runtime" and it seems that the tests are not covering "runtime" behaviors, right?

(2) I used regexp pattern matching and replacement to find and replace the password inside the connection string and to re-create the DB connection. Is it robust?

  • I could also split the connection string into an array, replace the password entry, and then reconstruct the string, but I felt like regexp could do the same job or even better.
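For what it's worth, the regexp approach can be made robust against arbitrary token contents by using a literal replacement: Go's `ReplaceAllString` expands `$` group references in the replacement, while `ReplaceAllLiteralString` does not. A minimal sketch with hypothetical names, assuming a libpq-style key/value connection string (this is not the PR's actual code):

```go
package main

import (
	"fmt"
	"regexp"
)

// passwordRe matches the password entry of a libpq-style key/value
// connection string. The name and pattern are illustrative only.
var passwordRe = regexp.MustCompile(`password=\S+`)

// replacePassword swaps the password entry for a fresh access token.
// ReplaceAllLiteralString is used so characters like '$' in the token
// are never interpreted as regexp group references.
func replacePassword(connStr, token string) string {
	return passwordRe.ReplaceAllLiteralString(connStr, "password="+token)
}

func main() {
	conn := "host=myserver.postgres.database.azure.com port=5432 user=uami password=oldToken dbname=airflow sslmode=require"
	fmt.Println(replacePassword(conn, "newToken"))
}
```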

(3) To be honest, I got inspired by both the Azure Blob and Azure pipelines scalers. The latter also uses an Access Token but with a different scope, so I am wondering if it could be a good idea to deduplicate and generalize the logic of generating an Azure Access Token to have it in one place.

  • If it makes sense, I would say that this should be done in another PR, so that this one remains focused on the Postgres scaler.

Signed-off-by: Ferdinand de Baecque <[email protected]>
Signed-off-by: Ferdinand de Baecque <[email protected]>
@JorTurFer (Member) left a comment

(1) Should I change the logic of how an Azure Access Token is retrieved to be able to mock it and write some specific PodIdentityProviderAzureWorkload tests? If yes, I am thinking about the following tests, based on what I wrote:

I don't think so, because this is something quite difficult to test with unit tests, but maybe we could introduce an e2e test for this scenario, WDYT? We can spin up a PostgreSQL database in Azure with a managed identity user (we have another repo, testing-infrastructure, where we manage the infra from)

Check that the Access Token (i.e. the password) is updated when it is expired or about to expire. This might be difficult because this part happens when performing the query, so it happens at "runtime" and it seems that the tests are not covering "runtime" behaviors, right?

I don't think that this is a "real problem". Of course, handling it is always better, but in the worst case the scaler will fail, which will trigger the scaler regeneration (during the same loop, without printing any error) and that will regenerate the token (though, as I said, managing it well is better)

(2) I used regexp pattern matching and replacement to find and replace the password inside the connection string and to re-create the DB connection. Is it robust?

I could also split the connection string into an array, replace the password entry, and then reconstruct the string, but I felt like regexp could do the same job or even better.

I don't have any preference tbh. Maybe, to be 100% sure that it will always work regardless of the token, we could use a placeholder for the regex and, instead of updating s.metadata.connection, use a variable scoped to the function. Using this approach, we can ensure that the regex will work (or even better, maybe not set any password at all in the pod identity case)
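A minimal sketch of this placeholder idea, with hypothetical names (not KEDA's actual code): the stored connection string keeps a fixed placeholder, and the real token is only ever substituted into a function-local variable:

```go
package main

import (
	"fmt"
	"strings"
)

// The stored metadata keeps a fixed placeholder instead of a real password,
// so the substitution can never fail, whatever the token contains.
// The placeholder value and function name are illustrative only.
const passwordPlaceholder = "{{PASSWORD}}"

// connStringWithToken builds the effective connection string in a local
// variable; the stored s.metadata.connection is never mutated.
func connStringWithToken(metaConn, token string) string {
	return strings.ReplaceAll(metaConn, passwordPlaceholder, token)
}

func main() {
	meta := "host=h port=5432 user=uami password=" + passwordPlaceholder + " dbname=d"
	fmt.Println(connStringWithToken(meta, "eyJhbGciOi..."))
}
```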

(3) To be honest, I got inspired by both the Azure Blob and Azure pipelines scalers. The latter also uses an Access Token but with a different scope, so I am wondering if it could be a good idea to deduplicate and generalize the logic of generating an Azure Access Token to have it in one place.

If it makes sense, I would say that this should be done in another PR, so that this one remains focused on the Postgres scaler.

It's a good idea, but I don't think it makes sense right now because we are migrating the Azure scalers from their current implementations to the new azure-sdk-for-go, and that SDK uses a unified authentication approach (this PR is working on that)

pkg/scalers/postgresql_scaler.go (resolved review thread)
@Ferdinanddb (Author)

@JorTurFer Thank you for your review!

Regarding your answers to my questions:

(1)

I don't think so, because this is something quite difficult to test with unit tests, but maybe we could introduce an e2e test for this scenario, WDYT? We can spin up a PostgreSQL database in Azure with a managed identity user (we have another repo, testing-infrastructure, where we manage the infra from)

I don't think that this is a "real problem". Of course, handling it is always better, but in the worst case the scaler will fail, which will trigger the scaler regeneration (during the same loop, without printing any error) and that will regenerate the token (though, as I said, managing it well is better)

  • I was planning to create an Azure account and use free credits to test this (i.e. spin up an Azure Postgres Flexible Server, an AKS cluster where I install the KEDA Helm chart using a container image built from this branch, a UAMI and all the other resources needed…) on my end, but happy to try using the "testing-infrastructure" repo you mentioned!

  • The thing I find difficult to test is that the generated access token can be set to be valid for anywhere from 5 minutes to 1 hour, which would mean the e2e test would run for at least 5 minutes. Would that make sense?


(2)

I don't have any preference tbh. Maybe, to be 100% sure that it will always work regardless of the token, we could use a placeholder for the regex and, instead of updating s.metadata.connection, use a variable scoped to the function. Using this approach, we can ensure that the regex will work (or even better, maybe not set any password at all in the pod identity case)

  • From my understanding and based on my current implementation, we need to update s.metadata.connection (string) to re-create the s.connection (sql.DB) with the new token (password); this is why I did it that way, and I use the getConnection function to replace/update the sql.DB's connection when the access token is expired or about to expire.

  • I can't quite picture your idea: "..., we could use a placeholder for the regex and instead of updating s.metadata.connection, using a variable scoped to the function."

    • May I ask you to write a little snippet to showcase this, please?
  • Maybe I don't understand your last point in parentheses, but to be sure: are you proposing to not set the password='.....' part inside the s.metadata.connection string object but instead use the environment variable PGPASSWORD to store the access token (password)?

    • That would be neat because it means that the s.connection sql.DB object does not need to be replaced if I am not wrong. But it means that the code would replace an environment variable (doable using os.Setenv("PGPASSWORD", accessToken)), WDYT?
      And we would also need another environment variable to store the access token's expiration date in order to replace the token when needed.

Signed-off-by: Ferdinand de Baecque <[email protected]>
Signed-off-by: Ferdinand de Baecque <[email protected]>
@Ferdinanddb (Author)

Hey @JorTurFer,

I think I understand what you meant with your regexp placeholder idea (which is really nice btw) and just pushed a change to take that into account.

I feel like some of the code I am adding/updating could still be written more cleanly, and I am still missing some unit tests for the changes, but I would like your opinion on whether what I changed is better than before, please :).

Thanks !

@JorTurFer (Member)

I think that the changes look nice! ❤️


stale bot commented May 15, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label May 15, 2024
@JorTurFer (Member)

@Ferdinanddb any update?

@stale stale bot removed the stale All issues that are marked as stale due to inactivity label May 18, 2024
@Ferdinanddb Ferdinanddb marked this pull request as ready for review May 21, 2024 10:20
@Ferdinanddb Ferdinanddb requested a review from a team as a code owner May 21, 2024 10:20
@Ferdinanddb (Author)

Ferdinanddb commented May 21, 2024

Hi @JorTurFer, sorry, I was busy with other things, but I found the time to change some things in the PR and test it.

I tested my change within my own Azure subscription during the weekend, and it works, so I would say that the PR is ready to be reviewed :).

I deployed the KEDA resources using the Helm chart plus custom container images built from this branch's code, and let the resources run for more than 24 hours because the access token in my case would expire after 24 hours, and it works.
I tested the change by deploying an Airflow cluster on an AKS cluster and creating an Azure Postgres Flexible Server. I had to adapt the Airflow Helm chart a bit (specifically the part linked to using KEDA to scale the Airflow workers), but nothing rocket science.

One observation: during my test, I tried to use the PgBouncer feature offered by the Azure Postgres Flexible Server resource, and it did not work; I think it is somehow related to this issue.
But if I don't use the PgBouncer port ("6432") and instead use the server's normal port ("5432"), then it works fine.

Another observation regarding how Azure provides access tokens: if there is already an active, not-yet-expired access token, Azure returns that same token until it expires. So I modified the part where we renew the Postgres connection to:

  • Verify whether the access token has expired by comparing its ExpiresOn value to time.Now(). If the access token has expired:
    • the Postgres connection object is closed,
    • a new Access Token is retrieved,
    • a new Postgres connection object is created and replaces the previous connection reference.

What do you think about this?

TODO:

  • Reference this change in the CHANGELOG.md file,
  • Modify the doc to document this new way of connecting to an Azure Postgres Flex server,
  • Maybe add more tests?

Thank you !

Signed-off-by: Ferdinand de Baecque <[email protected]>
@JorTurFer (Member) left a comment

The implementation looks nice!
Could you add an e2e test for this new scenario? I think that it'll be super useful for the future.

The new infra (the PostgreSQL server) can be created via a PR to this repo using Terraform: https://github.com/kedacore/testing-infrastructure and the test itself can be a copy of any of these (adding the change in the TriggerAuthentication): https://github.com/kedacore/keda/tree/main/tests/scalers/postgresql

@Ferdinanddb (Author)

@JorTurFer Thanks for your feedback!

Got it, I will try to propose something before next week. To summarize:

  • Create or update Terraform modules regarding the following cloud resources in this repo:
    • Update the Managed Identity module to add an Azure user-assigned managed identity (UAMI),
    • Update the AKS module to add a federated identity,
    • New module to create an Azure Postgres Flexible Server.
  • Add an e2e test scenario for this feature in this repo:
    • I will duplicate the HA test and adapt specific parts so that it uses this new feature.

Does that sound good?

Also, once that is done, I plan to propose a PR to update the doc regarding this feature.

@Ferdinanddb (Author)

For info, I just created this issue and this PR to add a new module to create an Azure Postgres Flexible Server.

And I will add an E2E test on this branch soon.

@Ferdinanddb (Author)

@JorTurFer I added an e2e test in this PR which is very similar to the Postgres standalone e2e test.

The logic is the following:

  • Create a local PostgreSQL server as it is done in the standalone test,
  • Create a table on the (remote) Azure Postgres Flexible server,
    • The creation is done by connecting to the pod of the local PostgreSQL server and running a psql command to be executed on the remote Azure Postgres Flexible server (using the admin username and password).
  • Perform the same test actions as in the standalone test (activation, scale-out, scale-in).
    • The scripts used to add rows in the table use the admin credentials, but the ScaledObject uses the UAMI identity.
  • Drop the (remote) PostgreSQL table and destroy the local PostgreSQL server.

I did that because the test needs to run psql commands, and this can be done from a Postgres server's pod.
So, no data manipulation will happen on the local PostgreSQL server, but it will be the resource whose replica count gets scaled out and in.

Please let me know if that makes sense!

Remark:

  • Unfortunately, I was not yet able to run this e2e test locally because I don't really know how to do it (and I did not search a lot to be honest). But I will try to find the time for this soon!

  • Is there a doc describing how to run a single e2e test, by any chance?

    • What I don't really understand is that the test needs a "custom" container image (at least in this case) and I didn't find where the build (and possibly the push) of the image is done.
    • If no doc, then I can also try to write a little one explaining how to do it, and it could also include what I did for creating the Azure resources via IaC from this PR.

@zroubalik (Member)

zroubalik commented Jun 5, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

@zroubalik (Member)

  • Unfortunately, I was not yet able to run this e2e test locally because I don't really know how to do it (and I did not search a lot to be honest). But I will try to find the time for this soon!

  • Is there a doc describing how to run a single e2e test, by any chance?

https://github.com/kedacore/keda/blob/main/tests/README.md#specific-test

  • What I don't really understand is that the test needs a "custom" container image (at least in this case) and I didn't find where the build (and possibly the push) of the image is done.
  • If no doc, then I can also try to write a little one explaining how to do it, and it could also include what I did for creating the Azure resources via IaC from this PR.

Custom images are needed when the e2e tests use some custom application or something that we need to build; in this case we want to have the source code in our repo:

https://github.com/kedacore/test-tools/tree/main/e2e

@Ferdinanddb (Author)

@zroubalik thank you, that is good to know! The e2e test might not work because this PR has not been merged yet.

@Ferdinanddb (Author)

I was able to run the e2e test locally after creating the Terraform resources and setting up KEDA with workload identity using my own image (thanks to the doc links in @zroubalik's message)!

Once this PR is merged, the e2e tests in this CI/CD should work. Here is the output I got when running this new e2e test locally:

# go test -v -tags e2e ./scalers/postgresql/azure_postgresql_flex_server_aad_wi/azure_postgresql_flex_server_aad_wi_test.go 
=== RUN   TestPostreSQLScaler
    helper.go:246: deleting namespace azure-postgresql-test-ns
    helper.go:299: waiting for namespace azure-postgresql-test-ns deletion
    helper.go:233: Creating namespace - azure-postgresql-test-ns
    helper.go:548: Applying template: postgreSQLStatefulSetTemplate
    helper.go:462: Waiting for statefulset replicas to hit target. Statefulset - azure-postgresql, Current  - 0, Target - 1
    helper.go:462: Waiting for statefulset replicas to hit target. Statefulset - azure-postgresql, Current  - 0, Target - 1
    helper.go:462: Waiting for statefulset replicas to hit target. Statefulset - azure-postgresql, Current  - 0, Target - 1
    helper.go:462: Waiting for statefulset replicas to hit target. Statefulset - azure-postgresql, Current  - 0, Target - 1
    helper.go:462: Waiting for statefulset replicas to hit target. Statefulset - azure-postgresql, Current  - 1, Target - 1
    helper.go:187: Waiting for successful execution of command on Pod; Output: NOTICE:  table "task_instance" does not exist, skipping
        DROP TABLE
        , Error: 
    helper.go:187: Waiting for successful execution of command on Pod; Output: CREATE TABLE
        , Error: 
    helper.go:187: Waiting for successful execution of command on Pod; Output: GRANT
        , Error: 
    helper.go:548: Applying template: azureSecretTemplate
    helper.go:548: Applying template: deploymentTemplate
    helper.go:548: Applying template: azureTriggerAuthenticationTemplate
    helper.go:548: Applying template: azureScaledObjectTemplate
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 0, Target - 0
    azure_postgresql_flex_server_aad_wi_test.go:178: --- testing activation ---
    helper.go:596: Applying template: lowLevelRecordsJobTemplate
    helper.go:500: Waiting for some time to ensure deployment replica count doesn't change from 0
    helper.go:507: Deployment - azure-postgresql-test-deployment, Current  - 0
    [previous helper.go:507 line repeated 59 more times while the replica count stayed at 0]
    azure_postgresql_flex_server_aad_wi_test.go:185: --- testing scale out ---
    helper.go:596: Applying template: insertRecordsJobTemplate
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 0, Target - 2
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 0, Target - 2
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 1, Target - 2
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 1, Target - 2
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 1, Target - 2
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 1, Target - 2
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 2, Target - 2
    azure_postgresql_flex_server_aad_wi_test.go:193: --- testing scale in ---
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 2, Target - 0
    [previous helper.go:442 line repeated 18 more times while waiting for scale-in from 2 replicas]
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 1, Target - 0
    helper.go:442: Waiting for deployment replicas to hit target. Deployment - azure-postgresql-test-deployment, Current  - 0, Target - 0
    helper.go:187: Waiting for successful execution of command on Pod; Output: DROP TABLE
        , Error: 
    helper.go:617: Deleting template: azureScaledObjectTemplate
    helper.go:617: Deleting template: azureTriggerAuthenticationTemplate
    helper.go:617: Deleting template: deploymentTemplate
    helper.go:617: Deleting template: azureSecretTemplate
    helper.go:617: Deleting template: postgreSQLStatefulSetTemplate
    helper.go:246: deleting namespace azure-postgresql-test-ns
    helper.go:299: waiting for namespace azure-postgresql-test-ns deletion
    helper.go:299: waiting for namespace azure-postgresql-test-ns deletion
    helper.go:299: waiting for namespace azure-postgresql-test-ns deletion
--- PASS: TestPostreSQLScaler (167.59s)
PASS
ok      command-line-arguments  167.607s

@Ferdinanddb (Author)

I have a remark regarding my e2e test and I would like your opinions please:

The way I grant permissions on the table to the Azure identity is subject to SQL injection if someone is able to change the value of azurePostgreSQLAdminUsername = os.Getenv("TF_AZURE_POSTGRES_ADMIN_USERNAME").

If the value of this becomes "; DROP DATABASE postgres ;" then we have a problem because we will have:

grantPrivilegesSQL := fmt.Sprintf(`GRANT ALL ON task_instance TO "%s";`, azurePostgreSQLUamiName)
// This will become: GRANT ALL ON task_instance TO ""; DROP DATABASE postgres ;"";

The solutions I see for this:

  • Hard-code the value of azurePostgreSQLUamiName.
    • This will work until the identity name, that is created by Terraform, gets modified.

  • Add a regex check prior to this part to check that the value of azurePostgreSQLAdminUsername respects a specific pattern like 1231r-1f111-11f31d-1d1q.
    • This can work but is a bit restrictive, since a username (identity name) could be abc123 or abc_123... But one thing that is sure is that the character ; should not be present.

  • Add a check prior to this part that queries the users listed on the server: the user we want to grant permissions to must be one of the users returned, so if azurePostgreSQLAdminUsername is present in the query result, we can proceed.
    • To query users listed in the server, we can use the following query select * from pgaadauth_list_principals(true); .
    • If the value of azurePostgreSQLAdminUsername is present in the result, then we can go further, else there is a problem.

  • Another (more complex) solution could be to manage the permissions in Terraform directly using null_resource resources and local-exec provisioners, but it means that:
    • The Azure identity that executes Terraform needs to be granted Entra AD Admin role on the PG server.
    • A firewall rule needs to be added to authorize connection coming from the IP or range of IPs of the VM where the code of GitHub Actions is executed. This is not needed if the code of GitHub Actions is executed from an Azure machine (and so has an Azure IP).
    • One or several null_resource resources with local-exec provisioners to connect to the PG server and run different queries to:
      • Authorize one or many Azure identities to connect to the server (in this case we just need one Azure identity, but maybe more in the future).
      • Grant permissions at the server, database or table level to the previously added Azure identities.
      • Revoke permissions and/or remove an Azure identity if it is not listed in Terraform anymore.

The benefit of this last solution is that basic authentication isn't needed anymore, and imho this is a good way of tackling the problem at a high scale. But it is more complex.
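A further option would be to quote the identifier defensively before interpolating it, by doubling embedded double quotes (the same scheme lib/pq's pq.QuoteIdentifier implements). A self-contained sketch of the technique, with illustrative names:

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent quotes a PostgreSQL identifier by doubling embedded double
// quotes, the same scheme lib/pq's pq.QuoteIdentifier implements. Any
// injected `"; DROP DATABASE ...` then stays inside the quoted name.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

func main() {
	uami := `evil"; DROP DATABASE postgres ;--`
	// The malicious value is emitted as a single (harmless) quoted identifier:
	fmt.Println("GRANT ALL ON task_instance TO " + quoteIdent(uami) + ";")
}
```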

@zroubalik (Member)

zroubalik commented Jun 10, 2024

/run-e2e azure_postgresql
Update: You can check the progress here

Signed-off-by: Ferdinand de Baecque <[email protected]>
@Ferdinanddb (Author)

Ferdinanddb commented Jun 10, 2024

The error from the previous /run-e2e azure_postgresql command might be due to the fact that I added some entries in the test/.env file; I just pushed a change to remove my modification of this file to see if it solves the issue.
My guess is that godotenv.Load("./../../.env") was replacing the variables' values with "".

Another possibility could be that the PGPASSWORD and the -h values need to be quoted, but I did not have this issue locally and the password is generated such that it doesn't contain special characters.

And another possibility is that the values should be referenced inside the CI/CD pipeline definition file v1-build.yml (or similar) to look something like:

...
      - name: Run end to end tests
        env:
          TF_AZURE_SUBSCRIPTION: ${{ secrets.TF_AZURE_SUBSCRIPTION }}
          TF_AZURE_RESOURCE_GROUP: ${{ secrets.TF_AZURE_RESOURCE_GROUP }}
          TF_AZURE_SP_APP_ID: ${{ secrets.TF_AZURE_SP_APP_ID }}
          AZURE_SP_KEY: ${{ secrets.AZURE_SP_KEY }}
          TF_AZURE_SP_TENANT: ${{ secrets.TF_AZURE_SP_TENANT }}
          TF_AZURE_STORAGE_CONNECTION_STRING: ${{ secrets.TF_AZURE_STORAGE_CONNECTION_STRING }}
          TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID: ${{ secrets.TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID }}
          # Add the following
          TF_AZURE_POSTGRES_ADMIN_USERNAME: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_USERNAME }}
          TF_AZURE_POSTGRES_ADMIN_PASSWORD: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_PASSWORD }}
          TF_AZURE_POSTGRES_FQDN: ${{ secrets.TF_AZURE_POSTGRES_FQDN }}
          TF_AZURE_POSTGRES_DB_NAME: ${{ secrets.TF_AZURE_POSTGRES_DB_NAME }}
        run: make e2e-test

What do you think?
If the first scenario is actually the cause of issue, could you rerun the command please @zroubalik ?

@zroubalik
Member

zroubalik commented Jun 11, 2024

/run-e2e azure_postgresql

@zroubalik
Member

I have a remark regarding my e2e test and I would like your opinions please:

The way I grant permissions on the table to the Azure identity is subject to SQL injection if someone is able to change the value of azurePostgreSQLAdminUsername = os.Getenv("TF_AZURE_POSTGRES_ADMIN_USERNAME").

If that value becomes "; DROP DATABASE postgres ;", then we have a problem, because we will end up with:

grantPrivilegesSQL := fmt.Sprintf(`GRANT ALL ON task_instance TO "%s";`, azurePostgreSQLUamiName)
// With that value this becomes: GRANT ALL ON task_instance TO ""; DROP DATABASE postgres ;"";

The solutions I see for this:

* Hard-code the value of `azurePostgreSQLUamiName`.
  
  * This will work until the identity name, which is created by Terraform, gets modified.


* Add a regex check prior to this part to verify that the value of `azurePostgreSQLAdminUsername` respects a specific pattern like `1231r-1f111-11f31d-1d1q`.
  
  * This can work but it is a bit restrictive, since a username (identity name) could be `abc123` or `abc_123`... What is certain is that the character `;` must not be present.


* Add a check, prior to this part, that retrieves the users listed on the server: the user we want to grant permissions to must be among them, so if `azurePostgreSQLAdminUsername` is present in the query result we can proceed.
  
  * To query the users listed on the server, we can use `select * from pgaadauth_list_principals(true);`.
  * If the value of `azurePostgreSQLAdminUsername` is present in the result, we can go further; otherwise there is a problem.


* Another (more complex) solution could be to manage the permissions in Terraform directly using `null_resource` resources and `local-exec` provisioners, but it means that:
  
  * The Azure identity that executes Terraform needs to be granted the Entra AD Admin role on the PG server.
  * A firewall rule needs to be added to authorize connections coming from the IP or IP range of the VM where the GitHub Actions code is executed. This is not needed if the GitHub Actions code runs on an Azure machine (and so has an Azure IP).
  * One or several `null_resource` with `local-exec` to connect to the PG server and run different queries to:
    
    * Authorize one or many Azure identities to connect to the server (in this case we just need one Azure identity, but maybe more in the future).
    * Grant permissions at the server, database or table level to the previously added Azure identities.
    * Revoke permissions and/or remove an Azure identity if it is not listed in Terraform anymore.

The benefit of this last solution is that basic authentication isn't needed anymore, and imho this is a good way of tackling the problem at a high scale. But it is more complex.

I would personally prefer a simple solution, since we are talking about e2e. In general, we would like to have reproducible, isolated e2e test runs.

But I'd love to hear @JorTurFer's opinion here

@Ferdinanddb
Author

I see that the e2e test failed again :/ .

The error is:

psql: warning: extra command-line argument "54***" ignored
        psql: could not translate host name "-p" to address: Name or service not known

There are 2 things:

  • A command-line argument is ignored; I guess this is the PGPASSWORD=%s psql ... part.
  • The azurePostgreSQLFQDN value is empty (but why?), so psql -h %s -p 5432 becomes psql -h -p 5432.

I thought about the following:

  • The PGPASSWORD and the -h values need to be quoted, but I did not have this issue locally and the password is generated such that it doesn't contain special characters.

  • The env variables' values should be referenced inside the CI/CD pipeline definition file(s), like v1-build.yml (or similar), to look something like:

...
 - name: Run end to end tests
        env:
          TF_AZURE_SUBSCRIPTION: ${{ secrets.TF_AZURE_SUBSCRIPTION }}
          TF_AZURE_RESOURCE_GROUP: ${{ secrets.TF_AZURE_RESOURCE_GROUP }}
          TF_AZURE_SP_APP_ID: ${{ secrets.TF_AZURE_SP_APP_ID }}
          AZURE_SP_KEY: ${{ secrets.AZURE_SP_KEY }}
          TF_AZURE_SP_TENANT: ${{ secrets.TF_AZURE_SP_TENANT }}
          TF_AZURE_STORAGE_CONNECTION_STRING: ${{ secrets.TF_AZURE_STORAGE_CONNECTION_STRING }}
          TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID: ${{ secrets.TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID }}
          # Add the following
          TF_AZURE_POSTGRES_ADMIN_USERNAME: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_USERNAME }}
          TF_AZURE_POSTGRES_ADMIN_PASSWORD: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_PASSWORD }}
          TF_AZURE_POSTGRES_FQDN: ${{ secrets.TF_AZURE_POSTGRES_FQDN }}
          TF_AZURE_POSTGRES_DB_NAME: ${{ secrets.TF_AZURE_POSTGRES_DB_NAME }}
        run: make e2e-test

What do you think about this?

@Ferdinanddb
Author

I just ran the whole test again on my personal machine and it works; here is what I did:

# At the root folder of my KEDA GitHub repo, build and publish the images
IMAGE_REGISTRY=docker.io IMAGE_REPO=ferdi7 make publish-multiarch

# In the meantime, inside my testing-infrastructure GitHub repo, I did the necessary steps to create the resources via Terraform
# (i.e. an AKS cluster and the Azure Postgres Flexible Server)

# Login to Azure and get AKS creds
az login
az aks get-credentials --resource-group MY-RG-NAME --name AKS-CLUSTER-NAME --overwrite-existing

# Export the following env variables to install KEDA and run the e2e test (I did not use any quotes)
export TF_AZURE_POSTGRES_ADMIN_USERNAME=USERNAME
export TF_AZURE_POSTGRES_ADMIN_PASSWORD=PASSWORD
export TF_AZURE_POSTGRES_FQDN=SERVER-NAME.postgres.database.azure.com
export TF_AZURE_POSTGRES_DB_NAME=DB_NAME
export TF_AZURE_SP_TENANT=AZURE-TENANT-ID
export TF_AZURE_IDENTITY_1_APP_ID=UAMI-IDENTITY-CLIENT-ID
export TF_AZURE_IDENTITY_1_NAME=UAMI-IDENTITY-NAME
export AZURE_RUN_WORKLOAD_IDENTITY_TESTS=true

# Deploy KEDA and run e2e test
cd tests
IMAGE_REGISTRY=docker.io IMAGE_REPO=ferdi7 go test -v -tags e2e ./utils/setup_test.go
go test -v -tags e2e ./scalers/postgresql/azure_postgresql_flex_server_aad_wi/azure_postgresql_flex_server_aad_wi_test.go

# Uninstall KEDA
go test -v -tags e2e ./utils/cleanup_test.go

This works on my end, so I have 2 remarks which could explain the error:

Remark 1: The environment variables provisioning is handled by Terraform as follows, for example:

...
    {
      name  = "TF_AZURE_POSTGRES_ADMIN_USERNAME"
      value = module.azurerm_postgres_flexible_server.admin_username
    },
    {
      name  = "TF_AZURE_POSTGRES_ADMIN_PASSWORD"
      value = module.azurerm_postgres_flexible_server.admin_password
    },
...

But I just realized that the admin_password variable is sensitive! So maybe something weird happens, and we should have the following if that makes sense (notice the use of the Terraform function nonsensitive()):

...
    {
      name  = "TF_AZURE_POSTGRES_ADMIN_USERNAME"
      value = module.azurerm_postgres_flexible_server.admin_username
    },
    {
      name  = "TF_AZURE_POSTGRES_ADMIN_PASSWORD"
      value = nonsensitive(module.azurerm_postgres_flexible_server.admin_password)
    },
...

Remark 2: The other thing (or in addition to the previous remark) that could explain the error is what I said in a previous comment:

  • The env variables' values should be referenced inside the CI/CD pipeline definition file(s), like v1-build.yml (or similar), to look something like:
...
 - name: Run end to end tests
        env:
          TF_AZURE_SUBSCRIPTION: ${{ secrets.TF_AZURE_SUBSCRIPTION }}
          TF_AZURE_RESOURCE_GROUP: ${{ secrets.TF_AZURE_RESOURCE_GROUP }}
          TF_AZURE_SP_APP_ID: ${{ secrets.TF_AZURE_SP_APP_ID }}
          AZURE_SP_KEY: ${{ secrets.AZURE_SP_KEY }}
          TF_AZURE_SP_TENANT: ${{ secrets.TF_AZURE_SP_TENANT }}
          TF_AZURE_STORAGE_CONNECTION_STRING: ${{ secrets.TF_AZURE_STORAGE_CONNECTION_STRING }}
          TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID: ${{ secrets.TF_AZURE_LOG_ANALYTICS_WORKSPACE_ID }}
          # Add the following
          TF_AZURE_POSTGRES_ADMIN_USERNAME: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_USERNAME }}
          TF_AZURE_POSTGRES_ADMIN_PASSWORD: ${{ secrets.TF_AZURE_POSTGRES_ADMIN_PASSWORD }}
          TF_AZURE_POSTGRES_FQDN: ${{ secrets.TF_AZURE_POSTGRES_FQDN }}
          TF_AZURE_POSTGRES_DB_NAME: ${{ secrets.TF_AZURE_POSTGRES_DB_NAME }}
        run: make e2e-test

What do you think about this?

@Ferdinanddb
Author

Ferdinanddb commented Jun 23, 2024

@zroubalik @JorTurFer I just pushed a new commit referencing the environment variables used in the e2e tests in the file v1-build.yml.

May I ask you if you can write the comment "/run-e2e azure_postgresql" to execute the e2e tests, please?

Thank you!

Development

Successfully merging this pull request may close these issues.

Add support for access token authentication to an Azure Postgres Flexible Server - Postgres scaler