fix(extract): start resource progress timer at pipe loop start (#3518) #4039
Open
ernestprovo23 wants to merge 1 commit into
Collaborator
@ernestprovo23 there are 200+ files in your PR. did you resolve the merge conflicts? best if you start from a fresh devel and cherry-pick your work
Author
ah my bad, that's a base-branch artifact, not the actual change. i opened it against
fix: per-resource progress rate shows astronomical value for slow resources (#3518)
the bug
for a rest_api (or any) resource whose request is slow or non-paginated, the
per-resource line in LogCollector reports a rate of millions/s. the total extract time is
correct; only the per-resource rate is wrong.
cause: LogCollector creates a counter lazily on the first update for a given key and
sets its start_time at that moment. the per-resource counter is keyed on the table name
and is only updated when the first rows arrive (extractors.py _write_item / _import_item).
for a slow single response, that first update happens after the whole wait, so start_time
is set late, elapsed time is ~0, and count/elapsed blows up. the aggregate "Resources"
counter is registered at pipe loop start, so its timer is correct.
the fix
register each selected resource's progress counter with inc=0 at pipe loop start, right
where the "Resources" counter is registered. this sets start_time before the request goes
out, so elapsed time includes the wait and the rate is sane.
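a minimal sketch of the effect described above, assuming a toy collector that creates counters lazily and stamps start_time on first update. ToyCollector, FakeClock and simulate are hypothetical names for illustration only; this is not dlt's LogCollector code, and the 30 s wait and 1000 rows are made-up numbers.

```python
from typing import Dict, List


class FakeClock:
    """Manually advanced clock so the example needs no real sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class ToyCollector:
    """Hypothetical stand-in: a counter is created on its first update."""

    def __init__(self, clock: FakeClock) -> None:
        self.clock = clock
        # name -> [start_time, count]
        self.counters: Dict[str, List[float]] = {}

    def update(self, name: str, inc: int = 1) -> None:
        if name not in self.counters:
            # start_time is fixed at the moment of the first update
            self.counters[name] = [self.clock(), 0]
        self.counters[name][1] += inc

    def rate(self, name: str) -> float:
        start, count = self.counters[name]
        elapsed = max(self.clock() - start, 1e-6)  # avoid division by zero
        return count / elapsed


def simulate(pre_register: bool) -> float:
    """Return the reported rows/s for one slow, single-response resource."""
    clock = FakeClock()
    collector = ToyCollector(clock)
    if pre_register:
        # the fix: register with inc=0 before the request goes out
        collector.update("issues", inc=0)
    clock.advance(30.0)                   # waiting on the slow request
    collector.update("issues", inc=1000)  # first rows arrive
    clock.advance(0.001)                  # progress line is printed shortly after
    return collector.rate("issues")
```

without pre-registration the elapsed time is only the last millisecond, so the rate lands near a million rows/s; with pre-registration the 30 s wait is counted and the rate is roughly 33 rows/s.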
the table vs resource keying
the per-resource line is keyed on the normalized table name, not the resource name. three
cases:
front and equals where rows land, so pre-registration uses the same key as the row
update. fixed.
not pre-registered. nothing changes for these.
so they resolve to their static default name when pre-registered, but rows actually go
elsewhere. to avoid a phantom "name: 0" line that never advances, the stale
pre-registered counter is dropped on the first write that targets a different table.
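the phantom-line cleanup can be sketched roughly as follows. ToyProgress, pre_register and write are hypothetical names, and the "only drop if still at zero" guard is an assumption of this sketch rather than something the PR states; the real change lives in the extract code and may differ.

```python
from typing import Dict


class ToyProgress:
    """Hypothetical per-table progress store keyed on table name."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        # resource name -> table key that was pre-registered and has no rows yet
        self.pending: Dict[str, str] = {}

    def pre_register(self, resource: str, default_table: str) -> None:
        # create the counter up front with zero rows (the inc=0 registration)
        self.counts.setdefault(default_table, 0)
        self.pending[resource] = default_table

    def write(self, resource: str, table: str, rows: int) -> None:
        stale = self.pending.pop(resource, None)
        # first write for this resource: if rows go to another table,
        # drop the pre-registered counter so no "name: 0" line lingers
        if stale is not None and stale != table and self.counts.get(stale) == 0:
            del self.counts[stale]
        self.counts[table] = self.counts.get(table, 0) + rows


progress = ToyProgress()
progress.pre_register("events", "events")      # static default name
progress.write("events", "events_2024", 10)    # rows redirected elsewhere
progress.pre_register("users", "users")
progress.write("users", "users", 5)            # rows land on the pre-registered key
```

after these calls only the tables that actually received rows remain in progress.counts; the redirected resource leaves no zero-valued entry behind.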
tests
later clock jump (and a control showing the buggy late start_time without registration).
start_time is set before a simulated wait, and that neither the with_table_name redirect
case nor the table_name callable case leaves a phantom counter line.
closes #3518