Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCORE-2063 Fix flakey 'Test client migration and rollback with recovery' test #7542

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

michael-wb
Copy link
Contributor

@michael-wb michael-wb commented Apr 3, 2024

What, How & Why?

Updated the client reset tracking to store the original client reset error and action. If a new client reset occurs with a different action (e.g. rolled back to PBS), the original client reset tracking info (e.g. for migrated to FLX) will be removed and the new client reset will be allowed to continue.

Added extra sync client hook events to capture different steps along the way during a client reset - this allows the "Test client migration and rollback with recovery" test to pause the client reset and roll back to PBS while the FLX migration client reset is in progress, effective reproducing the condition that was intermittently failing in the past. This test was previously taking 90+ secs to complete and about a third of this time was waiting for the reconnect timer to expire for the sync session that was active during migration and rollback. Added handle_reconnect() call after migration/rollback to cancel this timer, shaving around 30 secs off this test.

Fixes #7539, #6154

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • [ ] C-API, if public C++ API changed
  • [ ] bindgen/spec.yml, if public C++ API changed

@michael-wb michael-wb self-assigned this Apr 3, 2024
@cla-bot cla-bot bot added the cla: yes label Apr 3, 2024
Copy link

coveralls-official bot commented Apr 3, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1058

Details

  • 477 of 496 (96.17%) changed or added relevant lines in 10 files are covered.
  • 39 unchanged lines in 14 files lost coverage.
  • Overall coverage increased (+0.03%) to 90.782%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/realm/sync/noinst/client_impl_base.cpp 11 12 91.67%
test/object-store/util/sync/sync_test_utils.cpp 9 10 90.0%
src/realm/sync/client.cpp 14 20 70.0%
src/realm/sync/noinst/client_reset.cpp 173 184 94.02%
Files with Coverage Reduction New Missed Lines %
src/realm/query_engine.hpp 1 94.08%
src/realm/sync/client.cpp 1 90.07%
src/realm/sync/noinst/client_reset.cpp 1 94.41%
src/realm/sync/noinst/server/server_history.cpp 1 63.7%
src/realm/query_expression.cpp 2 86.62%
src/realm/sync/noinst/server/server.cpp 2 74.22%
src/realm/sync/noinst/client_impl_base.cpp 3 82.76%
src/realm/sync/noinst/protocol_codec.hpp 3 73.5%
src/realm/sync/transform.cpp 3 60.92%
test/test_thread.cpp 3 64.84%
Totals Coverage Status
Change from base Build 2262: 0.03%
Covered Lines: 212597
Relevant Lines: 234185

💛 - Coveralls

@danieltabacaru
Copy link
Collaborator

Don't we want to fix the real issue too (clearing tracking the client reset in case of a rollback)?

@michael-wb
Copy link
Contributor Author

oh, yes - the test was originally rolling back during client reset on purpose to see what the issues are and the client reset tracking entry was being left over, causing the original failure.
I'll fix this, too.

@michael-wb
Copy link
Contributor Author

michael-wb commented Apr 19, 2024

Added client reset error and action tracking to the client reset metadata storage (producing v2) and if the new client reset action is different from the current client reset action, then the older client reset tracking will be removed in favor of the new client reset. Also updated the "Test client migration and rollback with recovery" test to reflect this operation:

Realm.Sync.Client.Reset - Connection[2]: Session[12]: Found a previous recoverable client reset of type: 'Recover' for 'MigrateToFLX' at: 2024-04-19 14:04:15.848902000
Realm.Sync.Client.Reset - Connection[2]: Session[12]: Originating client reset error: WrongSyncType: Client connected using partition-based sync when app has been migrated to flexible sync
Realm.Sync.Client.Reset - Connection[2]: Session[12]: New recoverable client reset of type: 'Recover' for 'RevertToPBS' is incompatible - clearing previous reset
Realm.Sync.Client.Reset - Connection[2]: Session[12]: New client reset error: WrongSyncType: Client connected using flexible sync when app has been reverted back to partition-based sync

@michael-wb michael-wb marked this pull request as ready for review April 19, 2024 15:47
@jbreams
Copy link
Contributor

jbreams commented Apr 29, 2024

Can we break this PR up into the three separate PRs? 1) adding the extra info to the client reset tracker to show what the original error was 2) changing it so that if a new client reset of a different type starts it will be allowed to continue 3) any changes to fix up the test ?

@michael-wb
Copy link
Contributor Author

michael-wb commented Apr 29, 2024

Can we break this PR up into the three separate PRs? 1) adding the extra info to the client reset tracker to show what the original error was 2) changing it so that if a new client reset of a different type starts it will be allowed to continue 3) any changes to fix up the test ?

Sure - I can do that, although it may be better to create two PRs - one for the additions and another that allows a different type to continue + updates to migration test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants