[source-tiktok-marketing] - Removes `stream_state` interpolation, custom cursor #53645

pnilan · 2025-02-11T23:42:43Z

What

No longer references stream_state as an interpolation variable. Now references stream_interval
Removes custom cursor for hourly report
Closes https://github.com/airbytehq/airbyte-internal-issues/issues/11602
Closes https://github.com/airbytehq/airbyte-internal-issues/issues/11663
Related: https://github.com/airbytehq/oncall/issues/7343

How

Please refer to GitHub issues above.

Review guide

manifest.yaml
unit_tests/integration/test_reports_hourly.py

User Impact

Enables concurrency for hourly and daily report streams.

Can this PR be safely reverted and rolled back?

YES 💚
NO ❌

…ng-report-records

vercel · 2025-02-11T23:42:49Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
airbyte-docs	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Feb 26, 2025 7:53pm

…ng-report-records

pnilan · 2025-02-21T01:33:55Z

airbyte-integrations/connectors/source-tiktok-marketing/source_tiktok_marketing/manifest.yaml

    end_datetime:
      type: MinMaxDatetime
-      datetime: "{{ config.get('end_date', today_utc()) }}"
-      datetime_format: "%Y-%m-%d"
+      datetime: "{{ format_datetime((str_to_datetime(config.get('end_date')) if config.get('end_date') else now_utc()) + duration('PT23H'), '%Y-%m-%d %H:%M:%S') }}"


@maxi297 @brianjlai This duration (PT23H) added differs from the GH issue (P1D) but I think this solved the issue we ran into with mock server tests.

I noticed the partitions that were generated looked like this, assuming start date of 2024-01-01 and end date of 2024-01-02:

{'advertiser_id': '872746382648', 'parent_slice': {}, 'start_time': '2024-01-01 00:00:00', 'end_time': '2024-01-01 23:00:00'} {'advertiser_id': '872746382648', 'parent_slice': {}, 'start_time': '2024-01-02 00:00:00', 'end_time': '2024-01-03 00:00:00'}

When changing it to "PT23H", the partitions changed to the following, and the mock server tests' second request was mocked and matched correctly:

{'advertiser_id': '872746382648', 'parent_slice': {}, 'start_time': '2024-01-01 00:00:00', 'end_time': '2024-01-01 23:00:00'} {'advertiser_id': '872746382648', 'parent_slice': {}, 'start_time': '2024-01-02 00:00:00', 'end_time': '2024-01-02 23:00:00'}

I'll need to dive a bit deeper into the concurrent code but I think it's because the end date of the last partition was out of range relative to the actual end date and therefore the last request was not made. That's why the mock server test was previously trying to match the same request to two separate mocked requests.

The cursor granularity is not applied on the last slice. I'm not sure why exactly. I would have assumed that it was to keep the same behavior as the datetime based cursor but this is not what I see here

Note: This is outdated and irrelevant. Turns out I was experiencing this issue only in mock server tests due to failing to pass state when initializing the source, which is the only way to generate slices w/ a concurrent streams for a concurrent source. Once I correctly passed state, the stream slices were created as expected w/ "P1D" duration.

maxi297

The overall direction looks good but I would like to confirm the change on the tests, see the CATs pass and have regression test results as well

maxi297 · 2025-02-21T15:03:10Z

airbyte-integrations/connectors/source-tiktok-marketing/source_tiktok_marketing/manifest.yaml

@@ -4314,7 +4319,7 @@ spec:
 # If time allows it, we can definitely try to scope this number a bit better empirically.
 concurrency_level:
  type: ConcurrencyLevel
-  default_concurrency: 3
+  default_concurrency: 1


default_concurrency == 1 is dangerous because threads are shared between partition generator and partition reader with a limit in the number of partition generator so that we can read those partitions and they don't just grow over time. With only one thread, there would be no partition reader and this might lead to deadlock where we can't enqueue partitions anymore but nobody reads to clean the queue. Can we put 2 there?

Thanks for the catch -- I set this just for testing purposes (when I hypothesizes that concurrency was causing rate limit issues). Will revert!

@maxi297 I've updated this to be

default_concurrency: "{{ config.get('concurrency_level', 3) }}"

This is because the mocks server tests are failing only when concurrency level > 1. I've set the mock server config to concurrency_level = 1.

maxi297 · 2025-02-21T15:03:30Z

...ntegrations/connectors/source-tiktok-marketing/unit_tests/integration/test_reports_hourly.py

@@ -1,6 +1,8 @@
 # Copyright (c) 2024 Airbyte, Inc., all rights reserved.

+import asyncio


nit: what are those imports? They don't seem used

Ah some failed experimentation on my part. Removed.

maxi297 · 2025-02-21T15:04:30Z

...ntegrations/connectors/source-tiktok-marketing/unit_tests/integration/test_reports_hourly.py

@@ -167,7 +169,7 @@ def test_read_with_state(self, http_mocker: HttpMocker):
            state=self.state(),
        )

-        assert len(output.records) == 1
+        assert len(output.records) == 2


Why do we have more records being returned? It would seem to indicate that the filtering is not working fine or that the setup/given part of this test is not valid anymore

Great catch. I didn't initialize the source w/ state in the mock server tests.

maxi297 · 2025-02-21T15:16:20Z

airbyte-integrations/connectors/source-tiktok-marketing/source_tiktok_marketing/manifest.yaml

    end_datetime:
      type: MinMaxDatetime
-      datetime: "{{ config.get('end_date', today_utc()) }}"
-      datetime_format: "%Y-%m-%d"
+      datetime: "{{ format_datetime((str_to_datetime(config.get('end_date')) if config.get('end_date') else now_utc()) + duration('PT23H'), '%Y-%m-%d %H:%M:%S') }}"


The cursor granularity is not applied on the last slice. I'm not sure why exactly. I would have assumed that it was to keep the same behavior as the datetime based cursor but this is not what I see here

…ng-report-records

...ntegrations/connectors/source-tiktok-marketing/unit_tests/integration/test_reports_hourly.py

tolik0 · 2025-02-26T11:37:30Z

airbyte-integrations/connectors/source-tiktok-marketing/source_tiktok_marketing/manifest.yaml

-      condition: "{{ record['dimensions']['stat_time_hour'] >= stream_state.get('stat_time_hour', config.get('start_date', '')) }}"
-      $parameters:
-        partition_field: advertiser_id
+      type: RecordFilter


Can we use is_client_side_incremental instead?

This filter actually only applies to the very "first" slice. We request a day at a time and 24 reports are returned. (one for each hour). If our state value was 01-01-2025 @ 10AM then we would request 01-01-2025, but only return the reports from 10AM onward. For every requested day, we would return all 24 reports or as many reports as there have been hours in the day. Therefore I don't actually think client side incremental is necessary.

jaskaAd · 2025-02-28T12:40:36Z

Hey @pnilan , there issues are referenced here which aren't accessible
Closes https://github.com/airbytehq/airbyte-internal-issues/issues/11602 Closes https://github.com/airbytehq/airbyte-internal-issues/issues/11663 Related: https://github.com/airbytehq/oncall/issues/7343

We've been encountering problems with this connector where daily report and daily country report are constantly missing data, so we resorted to using hourly report which seemed to be reliable instead.

I am curious whether this PR resolves issue with daily reports missing data, thanks!

sheinbergon · 2025-03-02T20:34:48Z

@pnilan To whom it may concerns. I've created a variation of this connector that solves the daily report issues 100%, unaware you were already working on fixing it.

Should this not work, I'll gladly contribute my fix

jaskaAd · 2025-03-03T07:06:45Z

Hey @sheinbergon , it looks like your changes are for apple search source, not Tiktok master...sheinbergon:airbyte:sheinbergon/apple-search-ads-add-missing-paginators

Or am I misunderstanding something here?

sheinbergon · 2025-03-03T09:12:18Z

Hey @sheinbergon , it looks like your changes are for apple search source, not Tiktok master...sheinbergon:airbyte:sheinbergon/apple-search-ads-add-missing-paginators

Or am I misunderstanding something here?

@jaskaAd

This was 1 of 3 fixes meant to make the Apple Search Ads connector function as expected (and it now does, also being used in production). I'm talking about a fix I applied locally to the TikTok Marketing connector in order to get valid daily sync results. I also came across issues with the current version(s) of the connector, and was able to solve them by tampering with the incremental stream / end date settings, up to to the point my results are now on par with what the platform reports.

I'm using a locally built version of the connector in a local K8s Airbyte deployment

jaskaAd · 2025-03-03T09:58:01Z

Understood, could you share those local changes by any chance or submit a PR to this repo? 👀
@sheinbergon

sheinbergon · 2025-03-03T09:59:15Z

Understood, could you share those local changes by any chance or submit a PR to this repo? 👀 @sheinbergon

@jaskaAd Of course. Are you an Airbyte maintainer, BTW?

jaskaAd · 2025-03-03T11:55:16Z

Understood, could you share those local changes by any chance or submit a PR to this repo? 👀 @sheinbergon

@jaskaAd Of course. Are you an Airbyte maintainer, BTW?

Nope, just a mere spectator who hit into the issue of missing data from daily tiktok reports.

sheinbergon · 2025-03-03T13:04:22Z

Understood, could you share those local changes by any chance or submit a PR to this repo? 👀 @sheinbergon

@jaskaAd Of course. Are you an Airbyte maintainer, BTW?

Nope, just a mere spectator who hit into the issue of missing data from daily tiktok reports.

Here you go (changes in the manifest.yaml file):

You'll need to:

Replace base_report_retriever and report_daily_incremental_sync with the provided versions.
Build a docker image locally, using the airbyte ci tools.
Push to your private registry (we are using ECR).
Have your local/k8s airbyte access the customized source from using the full registry URL.
Rejoice

  base_report_retriever:
    type: SimpleRetriever
    requester:
      $ref: "#/definitions/requester"
      request_parameters:
        service_type: "AUCTION"
        report_type: "BASIC"
        data_level: '{{ parameters["data_level"] }}'
        dimensions: '{{ parameters["dimensions"] | string }}'
        metrics: '{{ (parameters.get("report_metrics", []) + ["spend", "cpc", "cpm", "impressions", "clicks", "ctr", "reach", "cost_per_1000_reached", "frequency", "video_play_actions", "video_watched_2s", "video_watched_6s", "average_video_play", "average_video_play_per_user", "video_views_p25", "video_views_p50", "video_views_p75", "video_views_p100", "profile_visits", "likes", "comments", "shares", "follows", "clicks_on_music_disc", "real_time_app_install", "real_time_app_install_cost", "app_install"]) | string }}'
        start_date: "{{ stream_interval['start_time'] }}"
        end_date: "{{ stream_interval['start_time'] }}"
        filtering: '{{ parameters["filtering"] | string if parameters.get("filtering") and config.get("include_deleted", False)}}'
      authenticator:
        $ref: "#/definitions/authenticator"
      request_body_json: {}
    record_selector:
      $ref: "#/definitions/record_selector_for_daily_reports_streams"
    paginator:
      $ref: "#/definitions/paginator_page_increment"
      pagination_strategy:
        type: "PageIncrement"
        page_size: 1000
        start_from_page: 1
    partition_router:
      $ref: "#/definitions/single_id_partition_router"

  report_daily_incremental_sync:
    type: DatetimeBasedCursor
    cursor_field: "stat_time_day"
    lookback_window: "P{{ config.get('attribution_window', 0) }}D"
    cursor_granularity: "P1D"
    step: P1D
    cursor_datetime_formats:
      - "%Y-%m-%d %H:%M:%S"
    datetime_format: "%Y-%m-%d"
    start_datetime:
      type: MinMaxDatetime
      datetime: "{{ config.get('start_date', '2016-09-01') }}"
      datetime_format: "%Y-%m-%d"
    end_datetime:
      type: MinMaxDatetime
      datetime: "{{ config.get('end_date', day_delta(-1,format='%Y-%m-%d')) }}"
      datetime_format: "%Y-%m-%d"

…tom cursor (#53645)

pnilan added 2 commits February 11, 2025 15:41

removes custom record filter

4346772

Merge branch 'master' into pnilan/source-tiktok-marketing/p1-oc-missi…

0c39679

…ng-report-records

octavia-squidington-iii added area/connectors Connector related issues connectors/source/tiktok-marketing labels Feb 11, 2025

vercel bot deployed to Preview February 11, 2025 23:55 View deployment

bump version

f1c5641

octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Feb 12, 2025

vercel bot deployed to Preview February 12, 2025 00:13 View deployment

pnilan added 2 commits February 11, 2025 16:19

update for rc

3842c6b

chore: format code

c7d56f2

vercel bot deployed to Preview February 12, 2025 00:32 View deployment

pnilan added 2 commits February 12, 2025 12:05

remove outdated test

edf96a9

remove stream_state interpolation

4e2a60a

pnilan changed the title ~~[source-tiktok-marketing] - P1~~ [source-tiktok-marketing] - Removes stream_state interpolation, custom cursor Feb 14, 2025

update to rc

aa222a6

vercel bot deployed to Preview February 14, 2025 00:52 View deployment

pnilan added 4 commits February 19, 2025 09:56

Merge branch 'master' into pnilan/source-tiktok-marketing/p1-oc-missi…

4d201a5

…ng-report-records

updates date formats

0f5018e

update cdk

bc6a3d5

update cursor format for daily reports

533415e

vercel bot deployed to Preview February 20, 2025 19:43 View deployment

pnilan added 2 commits February 20, 2025 15:46

update end date duration add

82d1e87

remove custom cursor test

5779ca0

vercel bot deployed to Preview February 20, 2025 23:53 View deployment

update version #

078c7c3

vercel bot deployed to Preview February 21, 2025 00:04 View deployment

fix version number, again

3a99105

vercel bot deployed to Preview February 21, 2025 00:19 View deployment

pnilan requested a review from maxi297 February 21, 2025 00:31

pnilan requested a review from brianjlai February 21, 2025 00:31

pnilan commented Feb 21, 2025

View reviewed changes

maxi297 reviewed Feb 21, 2025

View reviewed changes

pnilan added 2 commits February 24, 2025 09:58

Merge branch 'master' into pnilan/source-tiktok-marketing/p1-oc-missi…

021a87f

…ng-report-records

update airbyte-cdk to latest

6efba15

vercel bot deployed to Preview February 24, 2025 18:09 View deployment

pnilan mentioned this pull request Feb 24, 2025

feat(source-tiktok-marketing): Update CDK version #54655

Closed

2 tasks

update mock server tests

dcf96b3

vercel bot deployed to Preview February 24, 2025 21:57 View deployment

pnilan added 2 commits February 24, 2025 14:47

updated error message for 429 responses

4dc4be4

update requests-mock library to handle concurrency

48762e1

vercel bot deployed to Preview February 25, 2025 20:53 View deployment

pnilan requested a review from tolik0 February 25, 2025 21:54

tolik0 reviewed Feb 26, 2025

View reviewed changes

...ntegrations/connectors/source-tiktok-marketing/unit_tests/integration/test_reports_hourly.py Outdated Show resolved Hide resolved

tolik0 reviewed Feb 26, 2025

View reviewed changes

tolik0 approved these changes Feb 26, 2025

View reviewed changes

remove errant print

82b5aab

vercel bot deployed to Preview February 26, 2025 19:53 View deployment

pnilan enabled auto-merge (squash) February 26, 2025 20:28

pnilan merged commit ea3110d into master Feb 26, 2025
28 checks passed

pnilan deleted the pnilan/source-tiktok-marketing/p1-oc-missing-report-records branch February 26, 2025 20:42

btkcodedev pushed a commit that referenced this pull request Mar 4, 2025

[source-tiktok-marketing] - Removes stream_state interpolation, cus…

92e6b48

…tom cursor (#53645)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[source-tiktok-marketing] - Removes `stream_state` interpolation, custom cursor #53645

[source-tiktok-marketing] - Removes `stream_state` interpolation, custom cursor #53645

pnilan commented Feb 11, 2025 •

edited

Loading

vercel bot commented Feb 11, 2025 •

edited

Loading

pnilan Feb 21, 2025 •

edited

Loading

maxi297 Feb 21, 2025

pnilan Feb 24, 2025

maxi297 left a comment

maxi297 Feb 21, 2025

pnilan Feb 24, 2025

pnilan Feb 24, 2025

maxi297 Feb 21, 2025

pnilan Feb 24, 2025

maxi297 Feb 21, 2025

pnilan Feb 24, 2025

maxi297 Feb 21, 2025

tolik0 Feb 26, 2025

pnilan Feb 26, 2025

jaskaAd commented Feb 28, 2025

sheinbergon commented Mar 2, 2025

jaskaAd commented Mar 3, 2025

sheinbergon commented Mar 3, 2025 •

edited

Loading

jaskaAd commented Mar 3, 2025

sheinbergon commented Mar 3, 2025

jaskaAd commented Mar 3, 2025

sheinbergon commented Mar 3, 2025

		@@ -1,6 +1,8 @@
		# Copyright (c) 2024 Airbyte, Inc., all rights reserved.

		import asyncio

[source-tiktok-marketing] - Removes stream_state interpolation, custom cursor #53645

[source-tiktok-marketing] - Removes stream_state interpolation, custom cursor #53645

Conversation

pnilan commented Feb 11, 2025 • edited Loading

What

How

Review guide

User Impact

Can this PR be safely reverted and rolled back?

vercel bot commented Feb 11, 2025 • edited Loading

pnilan Feb 21, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maxi297 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jaskaAd commented Feb 28, 2025

sheinbergon commented Mar 2, 2025

jaskaAd commented Mar 3, 2025

sheinbergon commented Mar 3, 2025 • edited Loading

jaskaAd commented Mar 3, 2025

sheinbergon commented Mar 3, 2025

jaskaAd commented Mar 3, 2025

sheinbergon commented Mar 3, 2025

[source-tiktok-marketing] - Removes `stream_state` interpolation, custom cursor #53645

[source-tiktok-marketing] - Removes `stream_state` interpolation, custom cursor #53645

pnilan commented Feb 11, 2025 •

edited

Loading

vercel bot commented Feb 11, 2025 •

edited

Loading

pnilan Feb 21, 2025 •

edited

Loading

sheinbergon commented Mar 3, 2025 •

edited

Loading