Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] --empty flag not working on Pseudo-columns #1243

Open
2 tasks done
christopherekfeldt opened this issue May 16, 2024 · 2 comments
Open
2 tasks done

[Bug] --empty flag not working on Pseudo-columns #1243

christopherekfeldt opened this issue May 16, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@christopherekfeldt
Copy link

christopherekfeldt commented May 16, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When trying out the empty flag on my models I get failures on all models that uses the pseudo-column "_PARTITIONTIME" in their logic. Here is my query, it has worked perfectly fine prior.

{% set src_cpc_raw = source('customer_preference_center', 'customer_preference') -%}

select
    customerId,
    customerId_token,
    preference,
    preferenceInd,
    createTS,
    updateTS,
    operator,
    ingstn_ts,
    ingestion_dt
from (
    select
        customerId,
        customerId_token,
        centralPreferences.preference,
        centralPreferences.preferenceInd,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", createTS) as createTS,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", updateTS) as updateTS,
        centralPreferences.operator,
        ingstn_ts,
        _PARTITIONTIME as ingestion_dt
    from
        {{ src_cpc_raw }},
        unnest(centralPreferences) as centralPreferences
)
{% if is_incremental() %}
    where date(ingestion_dt) >= date_sub("{{ latest_partition_filter(src_cpc_raw) }}", interval 1 day)
{% endif %}
qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1

But now it has swapped out the logic with a subquery that doesn't take the pseudo column into consideration:

/* {"app": "dbt", "dbt_version": "1.8.0", "profile_name": "etlapp", "target_name": "lab", "node_id": "model.batch_framework_module.harm_customer_preference_center__centralpreference"} */   

    create or replace table `ad25-p-datalab-fg2h`.`dbt_christopher`.`harm_customer_preference_center__centralpreference`
      
    
    cluster by ingstn_ts

    OPTIONS(
      description="""Incremental model for central preferences""",
    
      labels=[('batchfw_status', 'managed')]
    )
    as (
      select
    customerId,
    customerId_token,
    preference,
    preferenceInd,
    createTS,
    updateTS,
    operator,
    ingstn_ts,
    ingestion_dt
from (
    select
        customerId,
        customerId_token,
        centralPreferences.preference,
        centralPreferences.preferenceInd,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", createTS) as createTS,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", updateTS) as updateTS,
        centralPreferences.operator,
        ingstn_ts,
        _PARTITIONTIME as ingestion_dt
    from
        (select * from `ab73-np-rawlay-dev-3324`.`customer_preference_center`.`customer_preference` where false limit 0),
        unnest(centralPreferences) as centralPreferences
)

qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1
    );

Giving the error in BigQuery: "Unrecognized name: _PARTITIONTIME at [37:9]"

Expected Behavior

I expect the subquery to work with pseudo-columns as well.

Steps To Reproduce

  1. Use similiar SQL logic.
  2. Run dbt build -s model_name --empty

Relevant log output

No response

Environment

- OS: ubuntu:rolling
- Python: 3.9.11
- dbt-core: 1.8.0
- dbt-bigquery: 1.8.0

Additional Context

No response

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 16, 2024

@christopherekfeldt Thanks for the report!

The mechanism we're using for --empty is to wrap the source() and ref() calls in a subquery with select * ... where false limit 0. This * doesn't pass along pseudo-columns.

The first idea that came to mind:

  • we first access BQ metadata to figure out if the source/ref relation is an ingestion-time partitioned table
  • if it is, we include the pseudo-column — but even then, it must be aliased, so your subsequent query (referencing it as _PARTITIONTIME) will still fail
select *, _PARTITIONTIME as partition_time
from dbt_jcohen.myingestiontable
where false limit 0

Other ideas:

  • Append where false limit 0 without wrapping in a subquery (but this won't play nice with other where statements, unnest, etc)
  • Allow you to opt out this particular source() from the default --empty subquery, but access flags.EMPTY to apply your own conditional filter

@jtcohen6 jtcohen6 removed the triage label May 16, 2024
@jtcohen6
Copy link
Contributor

jtcohen6 commented May 16, 2024

In the meantime, you can at least avoid the error by specifying .render() on any refs/sources that you don't want dbt to turn into where false limit 0 subqueries.

If we added support for flags.EMPTY, then you could write something like:

{% set src_cpc_raw = source('customer_preference_center', 'customer_preference') -%}

    select
        ...,
        _PARTITIONTIME as ingestion_dt
    from
        {{ src_cpc_raw.render() }},   -- this will be rendered simply into `project.dataset.identifier` (no subquery)
        unnest(centralPreferences) as centralPreferences
     where 1=1
{% if flags.EMPTY %}
    and false limit 0                 -- instead, I manually add the "empty limit" here
{% endif %}
{% if is_incremental() %}
    and date(ingestion_dt) >= date_sub("{{ latest_partition_filter(src_cpc_raw) }}", interval 1 day)
{% endif %}
qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants