[Bug] The key has to be less than 250 characters #1440

ms32035 · 2025-01-06T16:11:23Z

Astronomer Cosmos Version

1.8.1

dbt-core version

1.9.0

Versions of dbt adapters

No response

LoadMode

DBT_LS_MANIFEST

ExecutionMode

KUBERNETES

InvocationMode

None

airflow version

2.10.4

Operating System

Debian GNU/Linux 12 (bookworm)

If you think it's a UI issue, what browsers are you seeing the problem on?

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened?

It looks like, with #1433, the task names generated for some tests provided by packages such as dbt_utils or dbt_expectations are absurdly long
and result in:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 968, in __init__
    validate_key(task_id)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/helpers.py", line 55, in validate_key
    raise AirflowException(f"The key has to be less than {max_length} characters")
airflow.exceptions.AirflowException: The key has to be less than 250 characters

Additionally, it seems that tests are generated as their own tasks regardless of the test_behaviour setting. Specifically, this happens when the value is AFTER_ALL.

Relevant log output

How to reproduce

Create a test using dbt_utils.relationships_where where table and column names are long
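To check which generated names would trip the limit before the DAG is parsed, a small length check along these lines may help. This is only a sketch: the 250 limit is taken from the exception message above, the check is assumed to be a plain length comparison, and `find_overlong_ids` and the sample IDs are hypothetical, not part of Airflow or Cosmos.

```python
# Limit quoted in the AirflowException message above.
MAX_KEY_LENGTH = 250


def find_overlong_ids(task_ids, max_length=MAX_KEY_LENGTH):
    """Return the IDs whose length exceeds max_length (hypothetical helper)."""
    return [task_id for task_id in task_ids if len(task_id) > max_length]


# Made-up IDs: one short model task and one long generic-test name.
candidate_ids = [
    "stg_orders_run",
    "relationships_where_" + "very_long_table_and_column_name_" * 8 + "test",
]

overlong = find_overlong_ids(candidate_ids)
for task_id in overlong:
    print(f"{len(task_id)} characters: {task_id[:60]}...")
```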

Anything else :)?

No response

Are you willing to submit PR?

Yes I am willing to submit a PR!

Contact Details

No response


dosubot · 2025-01-06T16:12:30Z

To address the issue of long task names generated by dbt_utils or dbt_expectations tests in Airflow, which exceed the 250 character limit, you can customize the task ID generation logic. Specifically, you can modify the _get_task_id_and_args function to create shorter task IDs. This can be achieved by adjusting the task_display_name construction or implementing a custom normalize_task_id function to generate concise task IDs ^[1].

Regarding the test_behaviour setting, when set to AFTER_ALL, tests are generated as their own task because this configuration ensures that tests run only after all models have been successfully executed. This behavior is intentional to allow tests to execute collectively at the end of the DAG execution, rather than after each model ^[2].


jtrobles-cdd · 2025-01-06T23:04:14Z

After upgrading Cosmos from 1.7.1 to 1.8.1, I also started experiencing problems caused by excessively long task names, though in my case it's another part of Airflow that's complaining about the 250 character limit:

[2025-01-06, 19:49:30 -03] {validators.py:135} ERROR - Invalid stat name: task.mem_usage.cordada.airflow_data_pipelines.dbt_data_pipelines.refresh_******_******_*****.run_dbt.relationships_stg_**_***_******_******_business_ops__**********_*****_****_********_*******_id_*****__id_*****__ref_stg_**_***_******_******_business_ops__**********_*****_index__test.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/airflow/metrics/validators.py", line 132, in wrapper
    stat = handler_stat_name_func(stat)
  File "/usr/local/lib/python3.10/site-packages/airflow/metrics/validators.py", line 216, in stat_name_default_handler
    raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat_name (task.mem_usage.cordada.airflow_data_pipelines.dbt_data_pipelines.refresh_******_******_*****.run_dbt.relationships_stg_**_***_******_******_business_ops__**********_*****_****_********_*******_id_*****__id_*****__ref_stg_**_***_******_******_business_ops__**********_*****_index__test) has to be less than 250 characters.

pankajkoti · 2025-01-07T11:22:47Z

Thank you for reporting the issue.

@ms32035 @jtrobles-cdd, I’m curious—did you only upgrade Cosmos, or was there also an upgrade to Airflow?

Additionally, since Airflow limits task keys to fewer than 250 characters, do you have any suggestions on how we could shorten the task keys while still making them identifiable in terms of the node they represent? Any ideas here would help us address the issue in a more user-friendly way.

@ms32035, regarding your comment:

It seems that tests are generated as their own task regardless of the test_behaviour setting, especially when the value is AFTER_ALL.

This is expected behavior, as outlined in the documentation: https://github.com/astronomer/astronomer-cosmos/blob/main/docs/configuration/testing-behavior.rst. You can refer to the section titled Example when changing the behavior to use TestBehavior.AFTER_ALL.

ms32035 · 2025-01-07T12:20:30Z

@pankajkoti it's just a Cosmos upgrade.

One workaround idea I have, which is not exactly user-friendly, is to use the hash of the test that dbt generates.
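A minimal sketch of a hash-based shortening, under stated assumptions: it computes its own SHA-256 digest of the full name rather than reusing the hash that dbt generates, and `shorten_task_id`, `max_length`, and `hash_length` are hypothetical names. How such a function would be wired into Cosmos (for example through the normalize_task_id hook mentioned earlier in the thread) is not shown here.

```python
import hashlib


def shorten_task_id(name: str, max_length: int = 100, hash_length: int = 10) -> str:
    """Return name unchanged if it fits, else a truncated prefix plus a short digest.

    Hypothetical helper: keeps the start of the name readable and appends
    a deterministic hex suffix so that distinct long names stay distinct.
    """
    if len(name) <= max_length:
        return name
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:hash_length]
    # Reserve room for the underscore separator and the digest.
    prefix = name[: max_length - hash_length - 1]
    return f"{prefix}_{digest}"


long_name = "relationships_where_" + "very_long_table_and_column_name_" * 8 + "test"
short_id = shorten_task_id(long_name)
print(len(long_name), "->", len(short_id))
```

The trade-off is the one noted above: the suffix keeps IDs unique and stable across runs, but the truncated name is harder to map back to the test it represents.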

As for AFTER_ALL, this isn't the expected behaviour: all tests already run as part of a single dbt test operator, yet these tests also get their own nodes, so they would probably run twice.

jtrobles-cdd · 2025-01-07T12:35:25Z

did you only upgrade Cosmos, or was there also an upgrade to Airflow?

Only Cosmos. I've been using Airflow 2.10.4 (with Astro Runtime 12.6.0) for many weeks without issues.

internetcoffeephone · 2025-01-07T13:49:51Z

Running into this too since cosmos==1.8.1.

@pankajkoti We could group these tests by joint parents. E.g. for model_a and model_b, we could create a new Airflow task named model_a-model_b.test, which runs all of the relevant tests. Still verbose, but less so than using the entire test configuration as a unique_id / task_id. I'm not sure about the Cosmos internals here re: failures on tests, so we might have to make model_a-model_b an Airflow TaskGroup and put the tests from individual parents inside of that TaskGroup separately.
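The grouping idea could be sketched roughly as follows. This is an illustration only: the input mapping from test name to parent model names, the test names, and `group_tests_by_parents` are hypothetical and do not reflect Cosmos internals.

```python
from collections import defaultdict


def group_tests_by_parents(test_parents):
    """Group tests under one task ID per distinct set of parent models.

    test_parents maps a test name to the list of models it depends on
    (hypothetical structure). Parents are sorted so that the same set of
    models always yields the same ID, e.g. 'model_a-model_b.test'.
    """
    groups = defaultdict(list)
    for test_name, parents in test_parents.items():
        group_id = "-".join(sorted(set(parents))) + ".test"
        groups[group_id].append(test_name)
    return dict(groups)


tests = {
    "relationships_a_b": ["model_b", "model_a"],
    "equal_rowcount_a_b": ["model_a", "model_b"],
    "not_null_a_id": ["model_a"],
}

grouped = group_tests_by_parents(tests)
for task_id, members in grouped.items():
    print(task_id, members)
```

A group ID built this way can still grow long when a test has many parents with long names, so it may need to be combined with some form of truncation.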

ms32035 added the bug and triage-needed labels Jan 6, 2025

dosubot bot added the area:rendering, dbt:test, and execution:kubernetes labels Jan 6, 2025

ms32035 changed the title ~~[Bug]~~ [Bug] The key has to be less than 250 characters Jan 7, 2025
