Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCORE-1900 Make "next launch" metadata actions multiprocess-safe #7576

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

tgoyne
Copy link
Member

@tgoyne tgoyne commented Apr 10, 2024

File actions are performed on "next launch" since that's a time where we know it's safe to delete or move Realm files which may have been in use. This previously meant when the metadata store was next constructed, which was incorrect when multiple processes are sharing a metadata file. Instead, it tries to claim the sync agent flag on the metadata Realm file, and only performs the launch actions if it's able to.

There is still a race condition here: process 1 could initialize the metadata realm, claim the flag, then get suspended. Process 2 initializes, opens a Realm, and then removes the user. Process 1 then unsuspends and proceeds to run file actions, deleting the Realm process 2 has open. This is probably not actually a problem but I'll continue to try to find a fix.

Keeping the metadata Realm open is required for this to work, for change notification to work (a future change), and greatly improves performance of metadata operations. It slightly increases memory usage but the metadata realm is typically very small.

update_user() could previously overwrite data with stale data previously read. It was very difficult for this to actually happen with where and when we called it, but better to fix the problem.


There's two groups of not directly related updates to the tests. There were some spots where we called assert_no_open_realms() where the sync metadata Realm now should be open. These now check for the specific file that we're expecting to be closed instead. To help track down some scheduler-related problems, I added an assertion to verify that the libuv scheduler is only created on the main thread, as it doesn't support background threads. This turned out to reveal a bunch of places where we were creating them on background threads, which I then fixed.

@tgoyne tgoyne self-assigned this Apr 10, 2024
@cla-bot cla-bot bot added the cla: yes label Apr 10, 2024
@tgoyne tgoyne force-pushed the tg/multi-process-launch-actions branch 3 times, most recently from fa29e67 to 350ca2c Compare April 15, 2024 04:38
Copy link

coveralls-official bot commented Apr 15, 2024

Pull Request Test Coverage Report for Build thomas.goyne_306

Details

  • 387 of 387 (100.0%) changed or added relevant lines in 15 files are covered.
  • 27 unchanged lines in 8 files lost coverage.
  • Overall coverage increased (+0.2%) to 90.929%

Files with Coverage Reduction New Missed Lines %
src/realm/sync/noinst/client_impl_base.hpp 1 93.16%
src/realm/util/serializer.cpp 1 90.43%
test/test_query2.cpp 1 98.73%
src/realm/db.cpp 2 91.92%
src/realm/sync/noinst/protocol_codec.hpp 3 74.03%
src/realm/sync/transform.cpp 3 60.92%
src/realm/sync/noinst/client_impl_base.cpp 7 82.84%
src/realm/sync/noinst/server/server.cpp 9 73.19%
Totals Coverage Status
Change from base Build 2263: 0.2%
Covered Lines: 214766
Relevant Lines: 236192

💛 - Coveralls

@tgoyne tgoyne force-pushed the tg/multi-process-launch-actions branch 9 times, most recently from b6589ac to c28f550 Compare April 17, 2024 22:19
@tgoyne tgoyne changed the base branch from master to tg/promise-race April 17, 2024 22:34
Base automatically changed from tg/promise-race to master April 17, 2024 23:19
@tgoyne tgoyne force-pushed the tg/multi-process-launch-actions branch from c28f550 to 3916ad1 Compare April 19, 2024 21:15
Copy link
Contributor

@michael-wb michael-wb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing, but posting a couple of questions/comments I have so far.

test/object-store/sync/metadata.cpp Outdated Show resolved Hide resolved
Comment on lines +677 to +684
create_metadata_store(config, file_manager);
REQUIRE(File::exists(path));

store_1.reset();

create_metadata_store(config, file_manager);
REQUIRE(File::exists(path));

store_2.reset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to verify: in this case, store_1 and/or store_2 are still open, so reopening the metadata store doesn't run the file actions. It's not until both are closed that the file actions are performed the next time the metadata store is opened.
From the PR description, it sounds like this is the current behavior, but may be changing in the future?

data->access_token.token.clear();
data->refresh_token.token.clear();
store->update_user("user 3", *data);
store->update_user("user 3", [](auto& data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you also check that the current user is "user 3" prior to making these changes?

m_metadata_store->update_user(user->user_id(), [&](auto& data) {
data.identities = std::move(identities);
data.profile = UserProfile(get<BsonDocument>(profile_json, "data"));
user->update_backing_data(data); // FIXME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between updating the user in the metadata store and updating the user's backing data?
Is one a cached copy vs persistent/saved to disk? Or is this the trigger for the client app to update their own user metadata storage? Or is there some other purpose.
I see the FIXME comment, so perhaps this will be updated in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metadata store is the source of truth for the user data, and User stores an in-memory copy of it. The intention is to move to a one-way data flow where the metadata store pushes updates to the active users, but for now everything that updates the store also manually updates the users too.

m_metadata_store->update_user(user->user_id(), *data);
user->update_backing_data(std::move(data));
}
m_metadata_store->update_user(user->user_id(), [&](auto& data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function a no-op if the user isn't found?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Another process could delete the user concurrently with us updating the user, so it's not an error for the user to be missing.

@tgoyne
Copy link
Member Author

tgoyne commented Apr 22, 2024

I've added another commit which fixes the race condition mentioned in the PR description and added more of a description of what's going on.

@tgoyne tgoyne force-pushed the tg/multi-process-launch-actions branch 2 times, most recently from adaacaf to 1b39976 Compare April 22, 2024 17:29
@tgoyne tgoyne force-pushed the tg/multi-process-launch-actions branch from 1b39976 to f696863 Compare April 26, 2024 18:26
@danieltabacaru
Copy link
Collaborator

Instead, it tries to claim the sync agent flag on the metadata Realm file, and only performs the launch actions if it's able to.

I guess there needs to be a guarantee that the sync agent flag on the realm file is not claimed right? Is it that it's always a single process that claims both? (and so if the metadata one is claimed then it must mean that the realm file is not in use?)

@tgoyne
Copy link
Member Author

tgoyne commented May 22, 2024

There's a single metadata Realm per app, and claiming the sync agent on it has no connection to claiming the sync agent on any of the Realms which the user opens. One process claiming the sync agent on the metadata realm and performing launch actions while another process is opening Realms and being the sync agent on those is completely fine.

@danieltabacaru
Copy link
Collaborator

One process claiming the sync agent on the metadata realm and performing launch actions while another process is opening Realms and being the sync agent on those is completely fine.

I thought that deleting a realm file while another process uses it would be an issue.

@tgoyne
Copy link
Member Author

tgoyne commented May 22, 2024

The point of all this is to avoid deleting a Realm file in use. The metadata Realm is always open at any point where a sync Realm is open, and once the sync agent on the metadata Realm is claimed it remains claimed until everyone has closed the metadata Realm. This means that if we are able to claim the sync agent on the metadata Realm, we know that at that precise moment in time there were no open Realms associated with the current app in any process. It is invalid to reopen a Realm after creating a file action for that path, so once we have a point in time where we know that there are no open Realms, the Realms associated with all file actions created before that point cannot be open. While this is happening another process may be opening other Realms associated with the same app, but they cannot be the same ones as we're deleting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants