Skip to content

CLOUDP-324440 Save rs member ids #206

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

lucian-tosa
Copy link
Contributor

@lucian-tosa lucian-tosa commented Jun 20, 2025

Summary

This PR fixes the behaviour of changing the project in a MongoDBMultiCluster resource. Multicluster resources have a different method of computing the member ids of each process in a replicaset (replicasets field in AC). Because of that, it is possible to get non-sequential ids by doing scaling operations. For example, we start from this

  • mongodb-0-0: id 0
  • mongodb-0-1: id 1
  • mongodb-1-0: id 2

If we now scale the first cluster up we end up with:

  • mongodb-0-0: id 0
  • mongodb-0-1: id 1
  • mongodb-1-0: id 2
  • mongodb-0-2: id 3

If we now change the project, the operator will find an empty AC and will rebuild it and the member ids in the replicaset. It will attempt to create them sequentially by setting id 2 to the member 0-2. This is not possible, and the automation agents will reject this change, leaving the deployment stuck.

In this PR we will now save the member ids we achieved in an annotation, and whenever there is an empty AC in OM, we try to read the annotation in case the deployment is being migrated to a new project.

Proof of Work

Added unit tests to check that annotations are created and updated successfully, as well as using the annotation when building the replicaset.
Also added an E2E test to ensure the resource will become ready.

Checklist

  • Have you linked a jira ticket and/or is the ticket in the title?
  • Have you checked whether your jira ticket required DOCSP changes?
  • Have you checked for release_note changes?

Reminder (Please remove this when merging)

  • Please try to Approve or Reject Changes the PR, keep PRs in review as short as possible
  • Our Short Guide for PRs: Link
  • Remember the following Communication Standards - use comment prefixes for clarity:
    • blocking: Must be addressed before approval.
    • follow-up: Can be addressed in a later PR or ticket.
    • q: Clarifying question.
    • nit: Non-blocking suggestions.
    • note: Side-note, non-actionable. Example: Praise
    • --> no prefix is considered a question

@lucian-tosa lucian-tosa force-pushed the CLOUDP-324440-fix-rs-ids branch from add6a32 to 8ae690c Compare June 30, 2025 15:48
@lucian-tosa lucian-tosa marked this pull request as ready for review July 1, 2025 07:07
@lucian-tosa lucian-tosa requested a review from a team as a code owner July 1, 2025 07:07
@lucian-tosa lucian-tosa requested review from nammn and removed request for Julien-Ben July 1, 2025 08:39
@lucian-tosa lucian-tosa force-pushed the CLOUDP-324440-fix-rs-ids branch from f76fc3a to 1da9a93 Compare July 1, 2025 10:57
@lucian-tosa lucian-tosa requested a review from vinilage July 1, 2025 11:14
@nammn nammn requested a review from Copilot July 1, 2025 11:19
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue with migrating a MongoDBMultiCluster resource to a new project by saving and restoring replica set member IDs in an annotation. It also updates test cases and refactors various functions to use the new GetReplicaSets naming convention.

  • Added a new constant for the replica set member IDs annotation.
  • Updated reconciliation logic to save and use member IDs during project changes.
  • Refactored tests and Om client methods for consistent API usage.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/util/constants.go Introduces a new constant for saving RS member IDs.
docker/mongodb-kubernetes-tests/tests/multicluster/multi_cluster_scale_up_cluster.py Updates test scenarios to account for scaled member counts and project changes.
controllers/operator/mongodbmultireplicaset_controller.go Adds logic to retrieve and save RS member IDs from OM and updates the signature for saving the last achieved spec.
controllers/operator/mongodbmultireplicaset_controller_test.go Adds new test cases to verify annotation persistence and proper member ID usage.
controllers/om/* Adds new methods (e.g., MemberIds and GetReplicaSetMemberIds) and updates references to use the new GetReplicaSets API.
RELEASE_NOTES.md Documents the bug fix related to project migration for MongoDBMultiCluster resources.
Comments suppressed due to low confidence (1)

controllers/operator/mongodbmultireplicaset_controller.go:815

  • [nitpick] The inner variable 'processIds' is a nested map; consider renaming it (for example to 'allMemberIds') to more clearly reflect its structure and improve code readability.
func getReplicaSetProcessIdsFromAnnotation(mrs mdbmultiv1.MongoDBMultiCluster) map[string]int {

if err != nil {
return err
}
if string(rsMemberIdsBytes) != "null" {
Copy link
Preview

Copilot AI Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider checking if rsMemberIds is nil or if the marshalled byte slice is empty instead of comparing to the string "null", as this would improve clarity and robustness if the serialization behavior changes.

Suggested change
if string(rsMemberIdsBytes) != "null" {
if rsMemberIds != nil && len(rsMemberIdsBytes) > 0 {

Copilot uses AI. Check for mistakes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: oh - yeah definitely please don't check for a null string. I think in other code places we have a better way of doing that

@@ -25,6 +25,9 @@
* [Manage Database Users using OIDC](https://www.mongodb.com/docs/kubernetes/upcoming/manage-users/) # TODO
* [Authentication and Authorization with OIDC/OAuth 2.0](https://www.mongodb.com/docs/manual/core/oidc/security-oidc/)

## Bug Fixes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please highlight better the introduction of OIDC? Checking the release notes, I see the new features but our main goal for this release is the OIDC, so I would suggest to add a section for OIDC. I will create a suggestion in GDocs and share with you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep this PR focused on the bug fix. I think we can do this refactoring when we release. Wdyt?

@@ -699,6 +713,10 @@ func (r *ReconcileMongoDbMultiReplicaSet) updateOmDeploymentRs(ctx context.Conte
}

processIds := getReplicaSetProcessIdsFromReplicaSets(mrs.Name, existingDeployment)
// If there is no replicaset configuration saved in OM, it might be a new project, so we check the ids saved in annotation
if len(processIds) == 0 {
processIds = getReplicaSetProcessIdsFromAnnotation(mrs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use conn.GetReplicaSetMemberIds() here instead? Couldn't that save us the whole annotation work?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed offline - its the migration case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucian-tosa Can you please explain the "migration case" that requires use of annotation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: let's add a comment here on why this can happen (the migration case) for clarity

Copy link
Contributor

@lsierant lsierant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally - awesome fix! 👏
But I've left one blocking comment regarding not adding another method to the OM's connection interface. Consider the rest optional.

@@ -273,6 +277,21 @@ func (oc *HTTPOmConnection) GetAgentAuthMode() (string, error) {
return ac.Auth.AutoAuthMechanism, nil
}

func (oc *HTTPOmConnection) GetReplicaSetMemberIds() (map[string]map[string]int, error) {
Copy link
Contributor

@lsierant lsierant Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: Do we need that separate method? Could we have an external function calling ReadDeployment and extracting the ids from the result?

I'd prefer to limit the interface of OMConnection to the distinct API operations and have them combined for the logic like this in a different layer above it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this we'd also avoic changing the mock's implementation

if err != nil {
return err
}
if string(rsMemberIdsBytes) != "null" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion.

processIds := make(map[string]map[string]int)
if processIdsStr, ok := mrs.Annotations[util.LastAchievedRsMemberIds]; ok {
if err := json.Unmarshal([]byte(processIdsStr), &processIds); err != nil {
return map[string]int{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider making this a fatal error? If there is some garbage in the annotation, isn't it too risky to ignore its content?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not - we should at least log this very clearly

assert.Equal(t, mrs.GetAnnotations()[util.LastAchievedRsMemberIds], string(rsMemberIdsBytes))

// We simulate a changing the project by recreating the omConnection. The resource will keep the annotation.
// This part would have failed before 1.2.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to have it written, but a good practice is to manually check if this actually is true by running this unit test against master and verifying it's indeed failing here

def new_project_configmap(namespace: str, project_name_prefix: str) -> str:
cm = read_configmap(namespace=namespace, name="my-project")
project_name = f"{project_name_prefix}-new-project"
return create_or_update_configmap(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, reconfiguring the project! We need to followup on this PR...
I'm leaving the comment there to adjust it in here as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: can you check whether those new projects are properly cleaned up in cloudmanager?

@@ -97,7 +131,6 @@ def test_ops_manager_has_been_updated_correctly_before_scaling():

@pytest.mark.e2e_multi_cluster_scale_up_cluster
def test_scale_mongodb_multi(mongodb_multi: MongoDBMulti, member_cluster_clients: List[MultiClusterClient]):
mongodb_multi.load()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to not need load() you must change mongodb_multi's fixture scope to "function" to load it every time it's injected into function

@@ -139,3 +159,59 @@ def test_ops_manager_has_been_updated_correctly_after_scaling():
def test_replica_set_is_reachable(mongodb_multi: MongoDBMulti, ca_path: str):
tester = mongodb_multi.tester()
tester.assert_connectivity(opts=[with_tls(use_tls=True, ca_path=ca_path)])


# From here on, the tests are for verifying that we can change the project of the MongoDBMulti resource even with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I find this comment a bit unnecessary, but you don't need to remove it.
It would be better to wrap such a set of test functions into a class and name it properly, e.g. TestNonSequentialMemberIdsInReplicaSet or sth like that.

assert oldRsMembers == newRsMembers


def sts_are_ready(mdb: MongoDBMulti, member_cluster_clients: List[MultiClusterClient], members: List[int]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks to me like something we must already have somewhere in our tests. Or is it doing something different than just waiting for sts to be ready? If yes, then pls name it so we can see in the name why it's different.


// Annotation keys used by the operator
LastAchievedSpec = "mongodb.com/v1.lastSuccessfulConfiguration"
LastAchievedRsMemberIds = "mongodb.com/v1.lastAchievedRsMemberIds"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nammn to your concern about not putting the state into annotations - I agree, we should not do that. But mdbmc is still not migrated to the deployments state config map, and doing that should not be part of this PR. IMO it's fine to handle it this way, and migrate those annotations to the state cm once we decide to do so.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@@ -141,6 +141,22 @@ func (oc *MockedOmConnection) ConfigureProject(project *Project) {
oc.context.OrgID = project.OrgID
}

func (oc *MockedOmConnection) GetReplicaSetMemberIds() (map[string]map[string]int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR: Given that we are just copy pasting the the whole functionality, we can simply wrap the omclient with this mock client and only override what is necessary.

@@ -699,6 +713,10 @@ func (r *ReconcileMongoDbMultiReplicaSet) updateOmDeploymentRs(ctx context.Conte
}

processIds := getReplicaSetProcessIdsFromReplicaSets(mrs.Name, existingDeployment)
// If there is no replicaset configuration saved in OM, it might be a new project, so we check the ids saved in annotation
if len(processIds) == 0 {
processIds = getReplicaSetProcessIdsFromAnnotation(mrs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucian-tosa Can you please explain the "migration case" that requires use of annotation?

Copy link
Collaborator

@nammn nammn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome and simple code! Great tests and straight-forward. 2 blocking things and we are good to go

@@ -201,8 +201,14 @@ func (r *ReconcileMongoDbMultiReplicaSet) Reconcile(ctx context.Context, request
return r.updateStatus(ctx, &mrs, status, log)
}

// Save replicasets member ids in annotation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can remove this comment, we can find its usage via the ide

if err != nil {
return err
}
if string(rsMemberIdsBytes) != "null" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: oh - yeah definitely please don't check for a null string. I think in other code places we have a better way of doing that

@@ -699,6 +713,10 @@ func (r *ReconcileMongoDbMultiReplicaSet) updateOmDeploymentRs(ctx context.Conte
}

processIds := getReplicaSetProcessIdsFromReplicaSets(mrs.Name, existingDeployment)
// If there is no replicaset configuration saved in OM, it might be a new project, so we check the ids saved in annotation
if len(processIds) == 0 {
processIds = getReplicaSetProcessIdsFromAnnotation(mrs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: let's add a comment here on why this can happen (the migration case) for clarity

processIds := make(map[string]map[string]int)
if processIdsStr, ok := mrs.Annotations[util.LastAchievedRsMemberIds]; ok {
if err := json.Unmarshal([]byte(processIdsStr), &processIds); err != nil {
return map[string]int{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not - we should at least log this very clearly

def new_project_configmap(namespace: str, project_name_prefix: str) -> str:
cm = read_configmap(namespace=namespace, name="my-project")
project_name = f"{project_name_prefix}-new-project"
return create_or_update_configmap(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: can you check whether those new projects are properly cleaned up in cloudmanager?


// Annotation keys used by the operator
LastAchievedSpec = "mongodb.com/v1.lastSuccessfulConfiguration"
LastAchievedRsMemberIds = "mongodb.com/v1.lastAchievedRsMemberIds"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants