
SPIKE: Validate acceptable performance for switching show terse objects -> show objects to enable dynamic table detection #1038

Closed
jtcohen6 opened this issue May 14, 2024 · 3 comments · May be fixed by #1046
Labels: bug (Something isn't working), dynamic_tables, High Severity (bug with significant impact that should be resolved in a reasonable timeframe)

Comments

@jtcohen6 (Contributor)

Summarizing from a conversation with @colin-rogers-dbt @amychen1776 @dataders and the Snowflake streaming team (thanks folks!)

Problem

In the Snowflake 2024_03 behavior change bundle:

  • show [terse] objects stopped reporting DYNAMIC_TABLE as the kind for dynamic tables
  • show objects added a new column, is_dynamic, which says whether a TABLE is a dynamic table. There are likely to be similar columns in the future for is_hybrid, is_iceberg, etc. The Snowflake team is standardizing on all of these being attributes of TABLE, rather than different objects.

Within dbt, we rely on:

  • Knowing confidently whether a relational object is a Table or a DynamicTable at the beginning of the 'table' and 'dynamic_table' materialization flows.
  • Running a cache population query at the start of the invocation to find out which objects exist in relevant schemas, and what their various types are. Currently, we use show terse objects, because it's the fastest way to get at all the information we need. This changes with the 2024_03 BCB.
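The effect of the behavior change on relation-type detection can be sketched as follows. This is a hypothetical illustration, not dbt-snowflake's actual code: the function names are invented, and it assumes Snowflake's SHOW OBJECTS reports is_dynamic as 'Y'/'N'.

```python
# Hypothetical sketch: classifying rows returned by Snowflake's SHOW commands.

def relation_type_pre_2024_03(row: dict) -> str:
    # Before the 2024_03 bundle, `show terse objects` reported
    # kind = 'DYNAMIC_TABLE' for dynamic tables.
    kinds = {"TABLE": "table", "VIEW": "view", "DYNAMIC_TABLE": "dynamic_table"}
    return kinds.get(row["kind"], "unknown")

def relation_type_post_2024_03(row: dict) -> str:
    # After the bundle, dynamic tables surface as kind = 'TABLE', and only the
    # non-terse `show objects` exposes the new is_dynamic column.
    if row["kind"] == "TABLE" and row.get("is_dynamic") == "Y":
        return "dynamic_table"
    return {"TABLE": "table", "VIEW": "view"}.get(row["kind"], "unknown")

print(relation_type_post_2024_03({"kind": "TABLE", "is_dynamic": "Y"}))  # -> dynamic_table
```

With `show terse objects` the is_dynamic column is absent, which is why the terse query can no longer distinguish dynamic tables at cache-population time.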

Proposed resolution

We believe there are two viable options:

  1. Switch from show terse objects to show objects
    • Spike: jerco/spike-show-objects-dts
    • Pro: This restores the previous behavior in the most straightforward manner.
    • Pro(?): This might be the route we'd need to go down in order to better support hybrid / iceberg / etc
    • Con: Definitionally slower to run than show terse objects (~300ms → ~500ms in a schema with 1k objects).
    • Con: Returning a larger data structure to dbt means it requires more memory to process.
    • (We could look into running terse conditionally, based on whether there are models materialized as dynamic_table defined in the project, but this feels like another invitation for edge cases.)
  2. Start treating dynamic tables as RelationType.Table, with the understanding that we need to run an additional describe query at the start of the dynamic_table materialization to figure out if it's actually a dynamic or a static table (among other configs).
    • Similar to the workaround described in #1016 (comment)
    • Pro: Runs an additional query, but only while materializing dynamic tables. For DTs that already exist, we already need to run this query to check and see if any configs have changed.
    • Pro: Longer-term, this additional query would no longer be necessary if Snowflake added support for create or replace when switching between tables and dynamic tables. (The Snowflake team is looking into this; it may or may not be feasible.)
    • Con: Shorter-term, this would require one additional describe statement per DT, until we can re-plumb the materialization logic (or start caching more relation attributes) to avoid rerunning.
    • Con: Until the Snowflake team can support create or replace table for DT → table, this will have a known edge case where switching materialized: 'dynamic_table' to materialized: table does not work.
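Option (2) could be sketched roughly as below. This is an illustrative mock, not dbt-snowflake's actual implementation: the helper name and the fake cursor are invented so the sketch runs without a warehouse, and it assumes `show dynamic tables like '<name>'` returns a row only for genuine dynamic tables.

```python
# Hypothetical sketch of option (2): cache every TABLE-kind object as a plain
# table, then run one extra metadata query at the start of the dynamic_table
# materialization to learn what the existing object really is.

class FakeCursor:
    """Stand-in for a Snowflake cursor so the sketch is runnable offline."""
    def __init__(self, rows):
        self._rows = rows
    def execute(self, sql):
        self.last_sql = sql  # a real cursor would send this to Snowflake
    def fetchone(self):
        return self._rows[0] if self._rows else None

def is_existing_dynamic_table(cursor, database, schema, name):
    # An empty result means the existing object is a static table, so the
    # materialization cannot simply `create or replace dynamic table` over it.
    cursor.execute(f"show dynamic tables like '{name}' in schema {database}.{schema}")
    return cursor.fetchone() is not None

print(is_existing_dynamic_table(FakeCursor([("MY_DT",)]), "analytics", "prod", "my_dt"))  # -> True
```

The per-DT cost is one extra metadata query, which matches the con noted above: it runs once per dynamic table until the materialization logic (or the relation cache) is re-plumbed to avoid the repeat lookup.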

Given that (2) requires a rework to our data model for dynamic tables — no longer a SnowflakeRelationType.DynamicTable, really just a SnowflakeRelationType.Table with an additional attribute — it feels like (1) is our more straightforward resolution in the near term. But we need to first validate that the performance degradation is acceptable (<1s of additional time in large projects/schemas), because we know it will be slightly slower. If it's dramatically slower, then we need to pursue a cleverer (trickier) approach.

Acceptance criteria

  • Validate that show objects (instead of show terse objects) has acceptable performance for multiple databases/schemas containing thousands of objects
  • If so, move forward with that approach
  • If not, pursue approach (2) above
@jtcohen6 jtcohen6 added bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe labels May 14, 2024
@dataders dataders changed the title Validate acceptable performance for switching show terse objects -> show objects to enable dynamic table detection SPIKE: Validate acceptable performance for switching show terse objects -> show objects to enable dynamic table detection May 14, 2024
@dataders (Contributor)

I agree that option 1 is our best short-term choice

But we need to first validate that the performance degradation is acceptable (<1s of additional time in large projects/schemas), because we know it will be slightly slower.

I just tried the spike-show-objects-dts branch and it works a treat. Though perhaps it is more useful to benchmark dbt's clock time for a model materialization (or an entire dbt run) rather than just the SnowflakeAdapter.list_relations_without_caching() method.

I had some thoughts about option 2, but opted to open #1040 and continue discussion there.

@colin-rogers-dbt (Contributor)

Stress test scenarios:

  1. Not using DTs
  2. Using some DTs

@jtcohen6 (Contributor, Author)

Spike: #1046
Resolution: #1049
