dlt version
1.18.2
Describe the problem
When using the RestAPI source to pull resources that are each fetched in a single request with a long time to first byte (TTFB), the timing report does not reflect how long the request actually took. In the following logs, the request takes about 7 minutes (423 s) to return a response, which is a large ~100 MB JSON array with ~34k records in it.
------------------------------- Extract rest_api -------------------------------
Resources: 0/1 (0.0%) | Time: 0.00s | Rate: 0.00/s
------------------------------- Extract rest_api -------------------------------
Resources: 0/1 (0.0%) | Time: 423.05s | Rate: 0.00/s
my_slow_api_resource: 34727 | Time: 0.00s | Rate: 2141994044.24/s
------------------------------- Extract rest_api -------------------------------
Resources: 1/1 (100.0%) | Time: 423.06s | Rate: 0.00/s
my_slow_api_resource: 34727 | Time: 0.01s | Rate: 2406218.01/s
These logs come from progress logging configured at info level with a 20s interval. In an isolated run the delay is easy to see, but this resource normally runs in a batch of 12, where the slow request is not obvious: none of this vendor's endpoints are paginated, so every resource reports taking <1s.
Expected behavior
The resource should start its timing before it makes its first request, not after the first response has arrived.
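To illustrate the difference, here is a minimal standalone sketch. It does not use dlt and is not how dlt's progress collector is implemented; `slow_resource`, `time_from_first_item`, and `time_from_start` are hypothetical names made up for this example. It only shows why a clock started on the first yielded item misses the whole wait when a resource does all of its waiting before yielding anything.

```python
import time
from typing import Callable, Iterator, Tuple


def slow_resource(delay_s: float = 0.2, n_items: int = 3) -> Iterator[int]:
    """Hypothetical stand-in for an unpaginated endpoint.

    All of the waiting happens before the first item is yielded.
    """
    time.sleep(delay_s)
    yield from range(n_items)


def time_from_first_item(make_iter: Callable[[], Iterator[int]]) -> Tuple[int, float]:
    """Start the clock when the first item arrives (what the logs above suggest)."""
    count = 0
    start = None
    for _ in make_iter():
        if start is None:
            start = time.monotonic()
        count += 1
    elapsed = time.monotonic() - start if start is not None else 0.0
    return count, elapsed


def time_from_start(make_iter: Callable[[], Iterator[int]]) -> Tuple[int, float]:
    """Start the clock before the iterator is consumed (the expected behavior)."""
    start = time.monotonic()
    count = sum(1 for _ in make_iter())
    return count, time.monotonic() - start
```

With the first helper the elapsed time is close to zero and the derived rate becomes very large, which matches the shape of the numbers in the log; with the second helper the elapsed time includes the wait.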
Steps to reproduce
Create a Flask endpoint that sleeps 30s before returning a response while keeping the connection open. Observe that the dlt log reports <1s for the resource.
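A minimal sketch of such an endpoint, in case it helps with reproduction. The route name `/slow`, the `delay` query parameter, and the payload shape are assumptions for this example, not the vendor's API.

```python
import time

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/slow")
def slow():
    # Hold the connection open without sending any bytes, then respond in one go.
    # The delay defaults to 30 seconds and can be overridden with ?delay=<seconds>.
    delay_s = float(request.args.get("delay", 30))
    time.sleep(delay_s)
    # A single unpaginated JSON array, similar in shape to the endpoints described above.
    return jsonify([{"id": i} for i in range(1000)])
```

Assuming the file is saved as `slow_api.py` (a hypothetical name), it can be served with `flask --app slow_api run`, and a single-page RestAPI resource can then be pointed at `/slow`.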
Operating system
macOS
Runtime environment
Virtual Machine
Python version
3.11
dlt data source
RestAPI Source
dlt destination
DuckDB
Other deployment details
No response
Additional information
No response