CLOUDP-324440 Save rs member ids #206
base: master
Conversation
Pull Request Overview
This PR fixes an issue with migrating a MongoDBMultiCluster resource to a new project by saving and restoring replica set member IDs in an annotation. It also updates test cases and refactors various functions to use the new GetReplicaSets naming convention.
- Added a new constant for the replica set member IDs annotation.
- Updated reconciliation logic to save and use member IDs during project changes.
- Refactored tests and Om client methods for consistent API usage.
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
File | Description
---|---
pkg/util/constants.go | Introduces a new constant for saving RS member IDs.
docker/mongodb-kubernetes-tests/tests/multicluster/multi_cluster_scale_up_cluster.py | Updates test scenarios to account for scaled member counts and project changes.
controllers/operator/mongodbmultireplicaset_controller.go | Adds logic to retrieve and save RS member IDs from OM and updates the signature for saving the last achieved spec.
controllers/operator/mongodbmultireplicaset_controller_test.go | Adds new test cases to verify annotation persistence and proper member ID usage.
controllers/om/* | Adds new methods (e.g., MemberIds and GetReplicaSetMemberIds) and updates references to use the new GetReplicaSets API.
RELEASE_NOTES.md | Documents the bug fix related to project migration for MongoDBMultiCluster resources.
Comments suppressed due to low confidence (1)
controllers/operator/mongodbmultireplicaset_controller.go:815
- [nitpick] The inner variable 'processIds' is a nested map; consider renaming it (for example to 'allMemberIds') to more clearly reflect its structure and improve code readability.
func getReplicaSetProcessIdsFromAnnotation(mrs mdbmultiv1.MongoDBMultiCluster) map[string]int {
if err != nil {
    return err
}
if string(rsMemberIdsBytes) != "null" {
[nitpick] Consider checking if rsMemberIds is nil or if the marshalled byte slice is empty instead of comparing to the string "null", as this would improve clarity and robustness if the serialization behavior changes.
- if string(rsMemberIdsBytes) != "null" {
+ if rsMemberIds != nil && len(rsMemberIdsBytes) > 0 {
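The reason the original code compares against the string "null" is that `encoding/json` serializes a nil map to the literal `null`. A minimal sketch illustrating both the behavior and the suggested map-level check (the helper name `marshalIds` is ours, not from the PR):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalIds serializes a member-id map the same way the operator would
// before storing it in an annotation.
func marshalIds(ids map[string]map[string]int) string {
	b, _ := json.Marshal(ids)
	return string(b)
}

func main() {
	// json.Marshal of a nil map produces the literal "null", which is
	// what the original string comparison was guarding against.
	fmt.Println(marshalIds(nil)) // null

	// Checking the map itself is clearer and does not depend on the
	// serialization format.
	ids := map[string]map[string]int{"my-rs": {"my-rs-0": 0}}
	if len(ids) > 0 {
		fmt.Println(marshalIds(ids))
	}
}
```

Checking `len(ids) > 0` before marshalling sidesteps the serialization detail entirely, which is why the reviewers preferred it.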
Good suggestion.
blocking: oh - yeah definitely please don't check for a null string. I think in other code places we have a better way of doing that
@@ -25,6 +25,9 @@
* [Manage Database Users using OIDC](https://www.mongodb.com/docs/kubernetes/upcoming/manage-users/) # TODO
* [Authentication and Authorization with OIDC/OAuth 2.0](https://www.mongodb.com/docs/manual/core/oidc/security-oidc/)

## Bug Fixes
Can you please highlight the introduction of OIDC more prominently? Checking the release notes, I see the new features, but our main goal for this release is OIDC, so I would suggest adding a dedicated section for it. I will create a suggestion in GDocs and share it with you.
I would like to keep this PR focused on the bug fix. I think we can do this refactoring when we release. Wdyt?
@@ -699,6 +713,10 @@ func (r *ReconcileMongoDbMultiReplicaSet) updateOmDeploymentRs(ctx context.Conte
}

processIds := getReplicaSetProcessIdsFromReplicaSets(mrs.Name, existingDeployment)
// If there is no replicaset configuration saved in OM, it might be a new project, so we check the ids saved in annotation
if len(processIds) == 0 {
    processIds = getReplicaSetProcessIdsFromAnnotation(mrs)
Why not use conn.GetReplicaSetMemberIds() here instead? Couldn't that save us the whole annotation work?
as discussed offline - it's the migration case
@lucian-tosa Can you please explain the "migration case" that requires use of annotation?
blocking: let's add a comment here on why this can happen (the migration case) for clarity
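The fallback under discussion can be sketched in isolation. This is a simplified illustration (the helper name `resolveProcessIds` and the flat map shape are our assumptions, not the PR's actual signatures), with the explanatory comment the reviewer asked for:

```go
package main

import "fmt"

// resolveProcessIds picks the member ids to use when rebuilding the
// automation config. When the OM deployment has no replica set
// configuration yet — e.g. the resource was just migrated to a new,
// empty project — the previously achieved ids are restored from the
// annotation so the agents do not reject a conflicting sequential
// re-assignment.
func resolveProcessIds(fromDeployment, fromAnnotation map[string]int) map[string]int {
	if len(fromDeployment) > 0 {
		return fromDeployment // normal case: OM already knows the ids
	}
	// Empty deployment: the migration case, fall back to the saved ids.
	return fromAnnotation
}

func main() {
	saved := map[string]int{"my-rs-0": 0, "my-rs-1": 1, "my-rs-2": 3}
	fmt.Println(resolveProcessIds(nil, saved)["my-rs-2"]) // 3
}
```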
Generally - awesome fix! 👏
But I've left one blocking comment regarding not adding another method to the OM's connection interface. Consider the rest optional.
@@ -273,6 +277,21 @@ func (oc *HTTPOmConnection) GetAgentAuthMode() (string, error) {
    return ac.Auth.AutoAuthMechanism, nil
}

func (oc *HTTPOmConnection) GetReplicaSetMemberIds() (map[string]map[string]int, error) {
blocking: Do we need that separate method? Could we have an external function calling ReadDeployment and extracting the ids from the result?
I'd prefer to limit the interface of OMConnection to the distinct API operations and have them combined for the logic like this in a different layer above it.
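The layering the reviewer suggests — a free function over the result of ReadDeployment instead of a new interface method — could look roughly like this. The deployment is modelled here as a plain JSON-style map; the real OM types differ, so treat this strictly as a sketch:

```go
package main

import "fmt"

// Deployment stands in for the OM deployment document; the real type
// in controllers/om is richer than this.
type Deployment map[string]interface{}

// GetReplicaSetMemberIds extracts host -> member id per replica set
// from an already-fetched deployment, keeping the connection interface
// limited to distinct API operations.
func GetReplicaSetMemberIds(d Deployment) map[string]map[string]int {
	out := map[string]map[string]int{}
	rsList, _ := d["replicaSets"].([]map[string]interface{})
	for _, rs := range rsList {
		name, _ := rs["_id"].(string)
		members, _ := rs["members"].([]map[string]interface{})
		ids := map[string]int{}
		for _, m := range members {
			host, _ := m["host"].(string)
			id, _ := m["_id"].(int)
			ids[host] = id
		}
		out[name] = ids
	}
	return out
}

func main() {
	d := Deployment{
		"replicaSets": []map[string]interface{}{
			{"_id": "my-rs", "members": []map[string]interface{}{
				{"host": "my-rs-0", "_id": 0},
				{"host": "my-rs-1", "_id": 1},
			}},
		},
	}
	fmt.Println(GetReplicaSetMemberIds(d)["my-rs"]["my-rs-1"]) // 1
}
```

Because the function only consumes ReadDeployment's output, the mock connection needs no new method, which is exactly the reviewer's point below.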
With this we'd also avoid changing the mock's implementation.
processIds := make(map[string]map[string]int)
if processIdsStr, ok := mrs.Annotations[util.LastAchievedRsMemberIds]; ok {
    if err := json.Unmarshal([]byte(processIdsStr), &processIds); err != nil {
        return map[string]int{}
Did you consider making this a fatal error? If there is some garbage in the annotation, isn't it too risky to ignore its content?
if not - we should at least log this very clearly
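A version of the annotation-reading helper that surfaces bad data instead of silently returning an empty map could look like this. Names and signatures are simplified assumptions (the real helper takes the MongoDBMultiCluster resource, not a raw map):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getProcessIdsFromAnnotation reads previously achieved member ids back
// from an annotation value. Garbage in the annotation produces a clear
// error rather than a silent fallback to empty ids, addressing the
// review comment above.
func getProcessIdsFromAnnotation(annotations map[string]string, key string) (map[string]map[string]int, error) {
	processIds := make(map[string]map[string]int)
	raw, ok := annotations[key]
	if !ok {
		return processIds, nil // no annotation yet: nothing to restore
	}
	if err := json.Unmarshal([]byte(raw), &processIds); err != nil {
		return nil, fmt.Errorf("failed to parse member ids annotation %q: %w", key, err)
	}
	return processIds, nil
}

func main() {
	ann := map[string]string{
		"mongodb.com/v1.lastAchievedRsMemberIds": `{"my-rs":{"my-rs-0":0,"my-rs-1":1}}`,
	}
	ids, err := getProcessIdsFromAnnotation(ann, "mongodb.com/v1.lastAchievedRsMemberIds")
	fmt.Println(ids["my-rs"]["my-rs-1"], err) // 1 <nil>
}
```

Whether the error should abort reconciliation or only be logged loudly is the policy question the reviewers are debating; the sketch merely makes the failure visible either way.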
assert.Equal(t, mrs.GetAnnotations()[util.LastAchievedRsMemberIds], string(rsMemberIdsBytes))

// We simulate changing the project by recreating the omConnection. The resource will keep the annotation.
// This part would have failed before 1.2.0.
no need to have it written, but a good practice is to manually check if this actually is true by running this unit test against master and verifying it's indeed failing here
def new_project_configmap(namespace: str, project_name_prefix: str) -> str:
    cm = read_configmap(namespace=namespace, name="my-project")
    project_name = f"{project_name_prefix}-new-project"
    return create_or_update_configmap(
again, reconfiguring the project! We need to followup on this PR...
I'm leaving the comment there to adjust it in here as well.
q: can you check whether those new projects are properly cleaned up in cloudmanager?
@@ -97,7 +131,6 @@ def test_ops_manager_has_been_updated_correctly_before_scaling():

@pytest.mark.e2e_multi_cluster_scale_up_cluster
def test_scale_mongodb_multi(mongodb_multi: MongoDBMulti, member_cluster_clients: List[MultiClusterClient]):
    mongodb_multi.load()
To avoid needing load(), you must change the mongodb_multi fixture's scope to "function" so it is loaded fresh every time it's injected into a test function.
@@ -139,3 +159,59 @@ def test_ops_manager_has_been_updated_correctly_after_scaling():
def test_replica_set_is_reachable(mongodb_multi: MongoDBMulti, ca_path: str):
    tester = mongodb_multi.tester()
    tester.assert_connectivity(opts=[with_tls(use_tls=True, ca_path=ca_path)])

# From here on, the tests are for verifying that we can change the project of the MongoDBMulti resource even with
nit: I find this comment a bit unnecessary, but you don't need to remove it. It would be better to wrap such a set of test functions into a class and name it properly, e.g. TestNonSequentialMemberIdsInReplicaSet or something like that.
assert oldRsMembers == newRsMembers


def sts_are_ready(mdb: MongoDBMulti, member_cluster_clients: List[MultiClusterClient], members: List[int]):
This looks to me like something we must already have somewhere in our tests. Or is it doing something different than just waiting for the sts to be ready? If yes, then please name it so we can see from the name why it's different.
// Annotation keys used by the operator
LastAchievedSpec = "mongodb.com/v1.lastSuccessfulConfiguration"
LastAchievedRsMemberIds = "mongodb.com/v1.lastAchievedRsMemberIds"
@nammn to your concern about not putting the state into annotations - I agree, we should not do that. But mdbmc is still not migrated to the deployments state config map, and doing that should not be part of this PR. IMO it's fine to handle it this way, and migrate those annotations to the state cm once we decide to do so.
thanks!
@@ -141,6 +141,22 @@ func (oc *MockedOmConnection) ConfigureProject(project *Project) {
    oc.context.OrgID = project.OrgID
}

func (oc *MockedOmConnection) GetReplicaSetMemberIds() (map[string]map[string]int, error) {
Unrelated to this PR: Given that we are just copy-pasting the whole functionality, we could simply wrap the omclient with this mock client and only override what is necessary.
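In Go this wrap-and-override pattern falls out of struct embedding. A minimal sketch (the interface and types here are simplified assumptions, not the real OMConnection):

```go
package main

import "fmt"

// Connection stands in for a trimmed-down OM connection interface.
type Connection interface {
	ReadDeployment() string
	GetAgentAuthMode() string
}

type realConnection struct{}

func (realConnection) ReadDeployment() string   { return "real deployment" }
func (realConnection) GetAgentAuthMode() string { return "SCRAM" }

// mockedConnection embeds the real client, inheriting every method,
// and overrides only what the test needs.
type mockedConnection struct {
	realConnection
}

func (mockedConnection) ReadDeployment() string { return "mocked deployment" }

func main() {
	var c Connection = mockedConnection{}
	fmt.Println(c.ReadDeployment())   // overridden: mocked deployment
	fmt.Println(c.GetAgentAuthMode()) // inherited: SCRAM
}
```

The embedded type satisfies the interface on behalf of the mock, so only genuinely divergent behavior needs copying.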
awesome and simple code! Great tests and straight-forward. 2 blocking things and we are good to go
@@ -201,8 +201,14 @@ func (r *ReconcileMongoDbMultiReplicaSet) Reconcile(ctx context.Context, request
    return r.updateStatus(ctx, &mrs, status, log)
}

// Save replicasets member ids in annotation
nit: we can remove this comment, we can find its usage via the ide
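The saving side of the fix — serializing the achieved ids into the annotation after a successful reconcile — can be sketched as follows. The helper name and the decision to skip empty maps are our assumptions; only the annotation key comes from the PR:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildMemberIdsAnnotation turns the member ids read from OM into the
// annotation map to be merged onto the MongoDBMultiCluster resource.
func buildMemberIdsAnnotation(ids map[string]map[string]int) (map[string]string, error) {
	if len(ids) == 0 {
		// Nothing achieved yet (covers nil): skip writing "null".
		return map[string]string{}, nil
	}
	b, err := json.Marshal(ids)
	if err != nil {
		return nil, err
	}
	return map[string]string{"mongodb.com/v1.lastAchievedRsMemberIds": string(b)}, nil
}

func main() {
	ann, _ := buildMemberIdsAnnotation(map[string]map[string]int{
		"my-rs": {"my-rs-0": 0, "my-rs-1": 1, "my-rs-2": 3},
	})
	fmt.Println(ann["mongodb.com/v1.lastAchievedRsMemberIds"])
}
```

Skipping the write for empty maps also makes the "null"-string check from the earlier thread unnecessary.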
Summary
This PR fixes the behaviour of changing the project in a MongoDBMultiCluster resource. Multicluster resources have a different method of computing the member ids of each process in a replicaset (the `replicasets` field in the AC). Because of that, it is possible to end up with non-sequential ids after scaling operations: starting from an initial configuration and scaling the first cluster up leaves gaps in the id sequence. If we then change the project, the operator will find an empty AC and will rebuild it along with the member ids in the replicaset. It will attempt to assign them sequentially, setting id 2 on the member 0-2. This is not possible, and the automation agents will reject the change, leaving the deployment stuck.

In this PR we now save the achieved member ids in an annotation, and whenever there is an empty AC in OM, we try to read the annotation in case the deployment is being migrated to a new project.
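The scenario above can be made concrete with a small sketch of id assignment that preserves previously achieved ids, handing out fresh ids (max+1) only to genuinely new members. The function and the member names are illustrative, not the operator's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// assignMemberIds keeps the saved (possibly non-sequential) ids for
// existing members and gives new members the next free id, instead of
// re-assigning everything sequentially from zero — which is the
// conflicting behavior the agents would reject.
func assignMemberIds(members []string, saved map[string]int) map[string]int {
	result := map[string]int{}
	highest := -1
	for _, id := range saved {
		if id > highest {
			highest = id
		}
	}
	sort.Strings(members) // deterministic order for new members
	for _, m := range members {
		if id, ok := saved[m]; ok {
			result[m] = id // preserve the previously achieved id
		} else {
			highest++
			result[m] = highest
		}
	}
	return result
}

func main() {
	// After scaling, the achieved ids are non-sequential: 0, 1, 3.
	saved := map[string]int{"c1-0": 0, "c1-1": 1, "c2-0": 3}
	ids := assignMemberIds([]string{"c1-0", "c1-1", "c2-0", "c2-1"}, saved)
	// c2-1 gets id 4; no member is re-assigned to the conflicting id 2.
	fmt.Println(ids)
}
```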
Proof of Work
Added unit tests to check that annotations are created and updated successfully, as well as using the annotation when building the replicaset.
Also added an E2E test to ensure the resource will become ready.