Skip to content

fix(extract): start resource progress timer at pipe loop start (#3518)#4039

Open
ernestprovo23 wants to merge 1 commit into
dlt-hub:develfrom
ernestprovo23:fix/3518-resource-progress-timing
Open

fix(extract): start resource progress timer at pipe loop start (#3518)#4039
ernestprovo23 wants to merge 1 commit into
dlt-hub:develfrom
ernestprovo23:fix/3518-resource-progress-timing

Conversation

@ernestprovo23

Copy link
Copy Markdown

fix: per resource progress rate shows astronomical value for slow resources (#3518)

the bug

for a rest_api (or any) resource whose request is slow or non paginated, the per
resource line in LogCollector reports a rate of millions/s. the total extract time is
correct, only the per resource rate is wrong.

cause: LogCollector creates a counter lazily on the first update for a given key and
sets its start_time at that moment. the per resource counter is keyed on the table name
and is only updated when the first rows arrive (extractors.py _write_item / _import_item).
for a slow single response that first update happens after the whole wait, so start_time
is set late, elapsed time is ~0, and count/elapsed blows up. the aggregate "Resources"
counter is registered at pipe loop start so its timer is correct.

the fix

register each selected resource's progress counter with inc=0 at pipe loop start, right
where the "Resources" counter is registered. this sets start_time before the request goes
out, so elapsed time includes the wait and the rate is sane.

the table vs resource keying

the per resource line is keyed on the normalized table name, not the resource name. three
cases:

  • static single table resource (the reported case): the static table name is known up
    front and equals where rows land, so pre-registration uses the same key as the row
    update. fixed.
  • dynamic table name (table_name callable): table is unknown until items arrive, so it is
    not pre-registered. nothing changes for these.
  • per item redirect via dlt.mark.with_table_name: these have no dynamic table name flag,
    so they resolve to their static default name when pre-registered, but rows actually go
    elsewhere. to avoid a phantom "name: 0" line that never advances, the stale
    pre-registered counter is dropped on the first write that targets a different table.

tests

  • collector unit tests assert start_time is fixed at registration and elapsed reflects a
    later clock jump (and a control showing the buggy late start_time without registration).
  • extract integration tests with an injected clock assert the static resource counter
    start_time is set before a simulated wait, and that neither the with_table_name redirect
    case nor the table_name callable case leaves a phantom counter line.

closes #3518

@rudolfix

Copy link
Copy Markdown
Collaborator

@ernestprovo23 there are 200+ files in your PR. did you resolve merge conflict? best if you start from fresh devel and cherry pick your work

@ernestprovo23 ernestprovo23 changed the base branch from master to devel June 11, 2026 12:26
@ernestprovo23

Copy link
Copy Markdown
Author

ah my bad, that's a base-branch artifact not the actual change. i opened it against master instead of devel, so it was diffing ~15 merges of drift. just retargeted base to devel and it collapses to the real diff: 1 commit, 6 files, the #3518 fix. no fresh branch or cherry-pick needed, it's already clean against devel (shows mergeable now). lmk if CI needs a workflow approval on your end to kick it off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RestAPI incorrectly reports resource runtime for non-paginated requests

2 participants