Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCORE-2099 Restore progress notifier behavior when sync session is already caught up #7681

Merged
merged 14 commits into from May 20, 2024

Conversation

jbreams
Copy link
Contributor

@jbreams jbreams commented May 7, 2024

What, How & Why?

This is split up into a few commits.

  1. Contains the functional changes - basically reverts the object-store behavior of progress notifications back to before the latest changes and reverts the tests back to before the latest changes. Not sure the best way to review this in github, but the tests should look like they did in 255cb33, only with the query_version threaded through for FLX and checks progress estimates for PBS are what we expect them to be based on the old behavior.
  2. Adds a bunch of unit-tests for FLX sync where we test the new progress estimates for FLX sync under various circumstances.
  3. Adds some integration tests for PBS/FLX where we make sure the progress notifications API works as expected when talking to baas and via AsyncOpen.
  4. fixes a compile failure i didn't catch while constructing this PR initially. whoops.

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed
  • bindgen/spec.yml, if public C++ API changed

@cla-bot cla-bot bot added the cla: yes label May 7, 2024
@jbreams jbreams changed the title Restore progress notifier behavior when sync session is already caught up RCORE-2099 Restore progress notifier behavior when sync session is already caught up May 7, 2024
Copy link

coveralls-official bot commented May 7, 2024

Pull Request Test Coverage Report for Build jonathan.reams_3219

Details

  • 742 of 768 (96.61%) changed or added relevant lines in 4 files are covered.
  • 78 unchanged lines in 18 files lost coverage.
  • Overall coverage decreased (-0.009%) to 90.806%

Changes Missing Coverage Covered Lines Changed/Added Lines %
test/object-store/sync/session/progress_notifications.cpp 692 718 96.38%
Files with Coverage Reduction New Missed Lines %
src/realm/query_engine.hpp 1 94.11%
src/realm/sort_descriptor.cpp 1 94.06%
src/realm/sync/noinst/server/server_history.cpp 1 63.7%
src/realm/util/file.cpp 1 78.73%
test/fuzz_tester.hpp 1 57.73%
test/test_index_string.cpp 1 93.48%
src/realm/mixed.cpp 2 86.46%
src/realm/sync/network/http.hpp 2 82.27%
test/object-store/sync/session/progress_notifications.cpp 2 97.13%
src/realm/object-store/sync/sync_session.cpp 3 91.67%
Totals Coverage Status
Change from base Build 2318: -0.009%
Covered Lines: 214758
Relevant Lines: 236501

💛 - Coveralls

@jbreams jbreams marked this pull request as ready for review May 7, 2024 19:24
Copy link
Member

@tgoyne tgoyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable at first glance, but I think I should take a proper pass over the tests and make sure that the expected behavior seems correct.

@@ -1267,7 +1267,12 @@ void SyncSession::wait_for_download_completion(util::UniqueFunction<void(Status)
uint64_t SyncSession::register_progress_notifier(std::function<ProgressNotifierCallback>&& notifier,
ProgressDirection direction, bool is_streaming)
{
return m_progress_notifier.register_callback(std::move(notifier), direction, is_streaming);
int64_t pending_query_version = 0;
assert_mutex_unlocked();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

register_progress_notifier() needs to be annotated as REQUIRES(!m_state_mutex) rather than this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


started_notifying = true;
// If this is a non-streaming download progress update and this notifier was
// created for a later query version (i.e. we're currently downloading
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g., not i.e.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +1610 to +1612
// The initial download size we get from the server is the uncompacted
// size, and so the download may complete before we actually receive
// that much data. When that happens, transferrable will drop and we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the mention of compaction here is out of date. transferable can still sometimes drop with BaaS, but it no longer due to compaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i dunno. it can also drop due to compaction with pbs. tbh, i don't know the full set of circumstances when this can happen, but i wanted to preserve the behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this actually also used for upload as a way to save the uploadable bytes at the time of registration (or when update is called first)? The change below also suggests this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, yeah, but why we do that exactly is not clear anymore.

CHECK(!callback_was_called);
}

// The functionality of this test was moved out of the object-store and into the sync client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this comment means?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means that we've removed the fake "server_version" parameter to the object-store progress callback and now the sync::Session itself ignores progress updates before it receives its first download message

. we can probably remove this test and/or replace it with an integration test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've verified that we updated the sync::Session upload/download progress tests to so we don't get any progress updates until we've received at least one download message, and removed this test in a later commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not getting any upload progress updates until we've received at least one download message is itself a breaking change that should be reverted. I think all of the changes to test_sync.cpp in #7124 that aren't just adapting to the changed signature of the progress notification callback are bogus and are resulting in some tests (such as Sync_UploadDownloadProgress_3) not testing the scenario they were specifically written to test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe that's actually wrong and we never were sending upload notifications before the first download. The sync progress notification tests are definitely nonsense now, but it's sort of hard to tell if there's also functional changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is something I went back and forth in the PR about during the initial implementation. I think there is a functional change in the sync::Session behavior, but it is just behavior that used to be in object-store. You used to get an initial upload notification when you first registered your callback, but we ignored progress updates from the sync::Session until after you'd received the first download.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take a look at Sync_UploadDownloadProgress_3 and any other tests to try to make sure they're doing what they say they do, but probably not before this PR is merged.

Copy link
Collaborator

@danieltabacaru danieltabacaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the functional changes. Tests are next (and last stop).


started_notifying = true;
// If this is a non-streaming download progress update and this notifier was
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this check prevents reporting any progress until the latest query at the time the user registered the notifier starts bootstrapping right? I guess this is nice because there is no 0->1->0->....->1 anymore, but it may take a long time until the user gets any progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for the most common case this means we skip query version zero and you get progress notifications for version 1. But yeah, in the worst case if you had a whole bunch of outstanding subscription sets you could end up having to wait for them all to sync.

src/realm/object-store/sync/sync_session.cpp Show resolved Hide resolved

// A notifier is expired if at least as many bytes have been transferred
// as were originally considered transferrable.
is_expired =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, this also fires when there is a fresh realm with no subscriptions and the schema is downloaded (as opposed to previous implementation) right? If that's the case it could be worth mentioning it in the description. Edit: the check at line 1605 actually prevents this.

callback_was_called = false;
progress.update_upload(2, 2, 1);
CHECK_FALSE(callback_was_called);
progress.register_callback(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a check after registration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

bool is_streaming = GENERATE(false, true);
progress.update(0, 0, 0, 0, 1);
SECTION("callback is invoked after each update for streaming notifiers") {
progress.update(0, 0, 0, 0, 1, 0, 0, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it helps readability when using double values (eg, 0.0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

if (!is_streaming)
register_default_upload_callback(is_streaming);

double current_estimate = current_transferred / double(current_transferrable);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could use an std::vectorstd::pair> and iterate through it updating the progress and do the checks. same for a few more tests below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. i think there's real improvements to be done in the readability/maintainability of these tests, but for now I want to get them back to a known good state before doing refactoring.

callback_was_called = true;
},
NotifierType::upload, false, 0);
// Wait for the initial callback.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't really wait for anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the comment - it was just copied over from the old tests.


// Second callback
callback_was_called = false;
current_transferred = 79;
current_transferrable = 1021;
progress.update_upload(current_transferred, current_transferrable, 1);
current_estimate = current_transferred / double(current_transferrable);
progress.update(68, 191, current_transferred, current_transferrable, 1, 68 / double(191),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could increase the snapshot version in these subsequent calls. something like local_version++ would do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests were copied from before the latest changes, and i'd prefer to keep them as close as possible to that in this PR and improve them when we're satisfied there aren't further regressions.

progress.update(current_downloaded, current_downloadable, current_uploaded, current_uploadable, 1);
current_estimate = current_downloaded / double(current_downloadable);
progress.update(current_downloaded, current_downloadable, current_uploaded, current_uploadable, 1,
current_estimate, 1.0, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can the upload estimate be 1.0? i think it should be 31/329

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// the total number of uploadable bytes available rather than the number of
// bytes this NotifierPackage was waiting to upload.
if (!is_download) {
estimate = std::min(transfered / double(transferable), 1.0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch! but you need to check transferable is greater than 0. Running for upload notifications, with no data transfer ongoing test and checking the estimate reveals it as NaN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. okay.

Comment on lines +1610 to +1612
// The initial download size we get from the server is the uncompacted
// size, and so the download may complete before we actually receive
// that much data. When that happens, transferrable will drop and we
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this actually also used for upload as a way to save the uploadable bytes at the time of registration (or when update is called first)? The change below also suggests this.

register_default_upload_callback();
progress.register_callback(
[&](auto, auto, double) {
callback_was_called = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned above, the estimate here is NaN

CHECK(callback_was_called);
CHECK(transferred == current_transferred);
CHECK(transferrable == original_transferrable);
CHECK(upload_estimate == std::min(1.0, current_transferred / double(original_transferrable)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use the minimum? since current_transferred > original_transferrable, the estimate is 1.0. If anything, i'd add a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've updated this to just CHECK(upload_estimate == 1.0)since the sync session notification won't let you have an estimate greater than 1.0 for non-streaming upload notifications

@jbreams jbreams merged commit c280bdb into master May 20, 2024
38 checks passed
@jbreams jbreams deleted the jbr/progress_regression branch May 20, 2024 20:01
@fealebenpae fealebenpae mentioned this pull request May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants