Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCORE-1386 Track client reset reason in table that detects client reset cycles #7649

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

michael-wb
Copy link
Contributor

What, How & Why?

This PR with the client reset error and action storage was broken out from PR #7542
These changes add the originating error Status and server requests action values from the SessionErrorInfo into the pending client reset tracking object that is used to provide client reset cycle detection and identify when a client reset is in progress so it can be closed out once the realm has been sync'ed with the server.
Fixes #6154

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • [ ] C-API, if public C++ API changed
  • [ ] bindgen/spec.yml, if public C++ API changed

@michael-wb michael-wb self-assigned this Apr 30, 2024
@cla-bot cla-bot bot added the cla: yes label Apr 30, 2024
@michael-wb michael-wb requested review from danieltabacaru, ironage and jbreams and removed request for ironage April 30, 2024 00:28
Copy link

coveralls-official bot commented Apr 30, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1118

Details

  • 579 of 612 (94.61%) changed or added relevant lines in 19 files are covered.
  • 89 unchanged lines in 18 files lost coverage.
  • Overall coverage decreased (-0.01%) to 90.81%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/realm/sync/noinst/client_reset.cpp 19 20 95.0%
src/realm/sync/noinst/sync_metadata_schema.cpp 33 34 97.06%
test/object-store/util/sync/sync_test_utils.cpp 5 6 83.33%
src/realm/sync/noinst/migration_store.cpp 12 14 85.71%
src/realm/sync/client.cpp 16 19 84.21%
src/realm/db.hpp 0 12 0.0%
src/realm/sync/noinst/pending_reset_store.cpp 189 202 93.56%
Files with Coverage Reduction New Missed Lines %
src/realm/array_mixed.cpp 1 91.91%
src/realm/sync/noinst/client_reset.cpp 1 94.38%
src/realm/sync/noinst/migration_store.cpp 1 94.94%
src/realm/util/serializer.cpp 1 90.43%
test/fuzz_tester.hpp 1 57.73%
src/realm/sync/client.cpp 2 90.17%
src/realm/sync/noinst/protocol_codec.hpp 3 74.03%
src/realm/table.cpp 3 90.14%
test/test_thread.cpp 3 66.14%
src/realm/array_string.cpp 4 87.23%
Totals Coverage Status
Change from base Build 2339: -0.01%
Covered Lines: 214898
Relevant Lines: 236645

💛 - Coveralls

test/test_client_reset.cpp Outdated Show resolved Hide resolved
src/realm/sync/noinst/client_reset.cpp Outdated Show resolved Hide resolved
src/realm/sync/noinst/client_reset.cpp Outdated Show resolved Hide resolved
src/realm/sync/noinst/client_reset.cpp Outdated Show resolved Hide resolved
@michael-wb michael-wb requested a review from ironage May 4, 2024 03:15
src/realm/object-store/sync/sync_session.hpp Outdated Show resolved Hide resolved
src/realm/object-store/sync/sync_session.hpp Outdated Show resolved Hide resolved
logger.debug(
"Was going to remove client reset tracker for type \"%1\" from %2, but it was already removed",
pending_reset.type, pending_reset.time);
logger.debug(util::LogCategory::reset, "Client reset cycle detection tracker already removed.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we lose the extra info getting logged here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the pending reset details are already being printed above (on line 2001)

@@ -62,11 +62,12 @@ class SyncSocketProvider;
struct ClientReset {
realm::ClientResyncMode mode;
DBRef fresh_copy;
bool recovery_is_allowed = true;
bool recovery_is_allowed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just capture the original SessionErrorInfo that caused the client reset here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since SessionErrorInfo can't be default created and there are only a few parts needed, I don't think we need to. I did remove the recovery_is_allowed since it is no longer needed.

src/realm/sync/noinst/client_reset.cpp Outdated Show resolved Hide resolved
auto mode = config(&SyncConfig::client_resync_mode);
if (mode == ClientResyncMode::Recover) {
handle_fresh_realm_downloaded(
nullptr,
{ErrorCodes::RuntimeError,
"A client reset is required but the server does not permit recovery for this client"},
server_requests_action);
error_info);
return;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just capture error_info in a member variable after this line rather than passing it as an argument to handle_fresh_realm_downloaded below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now being saved to a member variable in download_fresh_realm() and no longer being passed to handle_fresh_realm_downloaded()

src/realm/sync/noinst/client_reset.cpp Outdated Show resolved Hide resolved
TableRef table = get_or_create_client_reset_table(wt);
REALM_ASSERT(table);
// Even if the table is being updated to V2, an existing entry will still throw an exception
size_t table_size = table->size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 583/584 can just be if(table->size() > 0). it doesn't look like we use the table_size variable anywhere else.

src/realm/sync/client.cpp Outdated Show resolved Hide resolved
else if (type == 1) {
pending.type = ClientResyncMode::Recover;
// Version 2 columns
if (pending.version >= 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use the sync metadata schema machinery here to create all these column/look up these colkeys for us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migrated the pending reset function to use the sync metadata schema and made it a "store" object that is owned by SyncSession.

@@ -3,6 +3,7 @@
### Enhancements
* <New feature description> (PR [#????](https://github.com/realm/realm-core/pull/????))
* Add vendor support to the Android Blueprint (PR [#7614](https://github.com/realm/realm-core/pull/7614)).
* Add the originating error and server requests action that caused a client reset to occur to the client reset tracking metadata storage. ([PR #7649](https://github.com/realm/realm-core/pull/7649))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client reset tracking metadata storage is internal, so it should not be mentioned. I was under the impression that the reason we store the error is so it's not lost and we can surface it to the user if a client reset gets into a cycle. That's what I'd've mentioned here.

@@ -62,11 +62,11 @@ class SyncSocketProvider;
struct ClientReset {
realm::ClientResyncMode mode;
DBRef fresh_copy;
bool recovery_is_allowed = true;
sync::ProtocolErrorInfo::Action action = sync::ProtocolErrorInfo::Action::ClientReset;
std::optional<Status> error;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there always an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is - I had made it optional since Status doesn't have a default constructor and ClientReset is default constructed and then populated throughout the code.

I could add an appropriate constructor and remove the optional if you prefer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we just store the reason? I think that's all we need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be helpful to have the error code as well and it shouldn't be that expensive to store...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. it does feel a bit wrong to have the error optional though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the ClientReset structure so error is no longer an optional and updated the usages to use the initializers.

src/realm/sync/noinst/client_reset_operation.cpp Outdated Show resolved Hide resolved
bool load_schema(const TransactionRef& rd_tr);
void create_schema(const TransactionRef& rd_tr);

std::optional<PendingReset> read_legacy_pending_reset(const TransactionRef& rd_tr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of having this, isn't it easier to migrate the v1 data to v2 and use it as if the metadata was always at v2? We do something similar with the SubscriptionStore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose a more "lazy" approach to writing to the pending reset store and not write to the realm file when the session is started, but wait until the pending reset info is written or cleared.
In the original code, there is an instance where the original pending reset code was provided a frozen transaction, so I defaulted the reads to use frozen transactions as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. It's just that we could do the same with less code I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I streamlined the legacy pending reset code a bit - it had extra logic for future schema updates, but took that out.

REALM_ASSERT(reset_store);
DB& db_remote = *reset_config.fresh_copy;
auto actual_mode =
reset_precheck_guard(reset_store, reset_config.mode, reset_config.action, reset_config.error, logger);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should reset the guard part of the same write transaction (as before), so everything rolls back if it fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right - I see what you mean. I'll update it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns there there is one commit transaction when track_reset() is called, so even if it rolls back, the pending reset info is still stored. I updated the PendingResetStore to be a set of static functions (again) with the schema metadata info used under the hood. This allows better control over the transactions and when they are committed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, I just followed the code path and that's the case. I think there is still a difference because the write lock was not released before.

src/realm/sync/client.cpp Outdated Show resolved Hide resolved
src/realm/sync/noinst/pending_reset_store.cpp Show resolved Hide resolved
m_sess->logger.info("Tracking pending client reset of type \"%1\" from %2", pending_reset->type,
pending_reset->time);
std::optional<PendingReset> pending_reset;
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it doesn't seem to need its own scope

PendingResetStore* SessionWrapper::get_pending_reset_store()
{
REALM_ASSERT(!m_finalized);
return m_pending_reset_store.get();
Copy link
Contributor

@ironage ironage May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use get_pending_reset_store() in several places without any nullptr checks, could you add an assertion here, or alternatively make this return a PendingResetStore& to communicate that it should never be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually moved the PendingResetStore back to a set of static functions instead of an initialized object, since it's only used in a few places and to have control over the transactions and when they are committed.

std::ostream& operator<<(std::ostream& os, const sync::PendingReset& pr);
bool operator==(const sync::PendingReset& lhs, const sync::PendingReset& rhs);

class PendingResetStore {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring! The SyncMetadataTable helpers didn't exist yet when this metadata was first added.

Copy link
Contributor

@ironage ironage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good from my perspective, I looked at the logs from some of the test output, and it seems fine to me 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Track client reset reason in table that detects client reset cycles
4 participants