Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] The key has to be less than 250 characters #1440

Open
1 task
ms32035 opened this issue Jan 6, 2025 · 6 comments
Open
1 task

[Bug] The key has to be less than 250 characters #1440

ms32035 opened this issue Jan 6, 2025 · 6 comments
Labels
area:rendering Related to rendering, like Jinja, Airflow tasks, etc bug Something isn't working dbt:test Primarily related to dbt test command or functionality execution:kubernetes Related to Kubernetes execution environment triage-needed Items need to be reviewed / assigned to milestone

Comments

@ms32035
Copy link
Contributor

ms32035 commented Jan 6, 2025

Astronomer Cosmos Version

1.8.1

dbt-core version

1.9.0

Versions of dbt adapters

No response

LoadMode

DBT_LS_MANIFEST

ExecutionMode

KUBERNETES

InvocationMode

None

airflow version

2.10.4

Operating System

Debian GNU/Linux 12 (bookworm)

If a you think it's an UI issue, what browsers are you seeing the problem on?

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened?

Looks like for #1433 in case of some tests provided by packages like dbt_utils or dbt_expectations the names generated are absurdly long
and result in:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 968, in __init__
    validate_key(task_id)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/helpers.py", line 55, in validate_key
    raise AirflowException(f"The key has to be less than {max_length} characters")
airflow.exceptions.AirflowException: The key has to be less than 250 characters

Additionally, it seems that tests are generated as their own task regardless of the test_behaviour setting. Specifically, these are generated when the value is AFTER_ALL

Relevant log output

How to reproduce

Create a test using dbt_utils.relationships_where where table and column names are long

Anything else :)?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Contact Details

No response

@ms32035 ms32035 added bug Something isn't working triage-needed Items need to be reviewed / assigned to milestone labels Jan 6, 2025
Copy link

dosubot bot commented Jan 6, 2025

To address the issue of long task names generated by dbt_utils or dbt_expectations tests in Airflow, which exceed the 250 character limit, you can customize the task ID generation logic. Specifically, you can modify the _get_task_id_and_args function to create shorter task IDs. This can be achieved by adjusting the task_display_name construction or implementing a custom normalize_task_id function to generate concise task IDs [1].

Regarding the test_behaviour setting, when set to AFTER_ALL, tests are generated as their own task because this configuration ensures that tests run only after all models have been successfully executed. This behavior is intentional to allow tests to execute collectively at the end of the DAG execution, rather than after each model [2].

To continue talking to Dosu, mention @dosu.


Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other

@dosubot dosubot bot added area:rendering Related to rendering, like Jinja, Airflow tasks, etc dbt:test Primarily related to dbt test command or functionality execution:kubernetes Related to Kubernetes execution environment labels Jan 6, 2025
@jtrobles-cdd
Copy link

After upgrading Cosmos from 1.7.1 to 1.8.1, I also started experiencing problems caused by excessively long task names, though in my case it's another part of Airflow that's complaining about the 250 character limit:

[2025-01-06, 19:49:30 -03] {validators.py:135} ERROR - Invalid stat name: task.mem_usage.cordada.airflow_data_pipelines.dbt_data_pipelines.refresh_******_******_*****.run_dbt.relationships_stg_**_***_******_******_business_ops__**********_*****_****_********_*******_id_*****__id_*****__ref_stg_**_***_******_******_business_ops__**********_*****_index__test.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/airflow/metrics/validators.py", line 132, in wrapper
    stat = handler_stat_name_func(stat)
  File "/usr/local/lib/python3.10/site-packages/airflow/metrics/validators.py", line 216, in stat_name_default_handler
    raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat_name (task.mem_usage.cordada.airflow_data_pipelines.dbt_data_pipelines.refresh_******_******_*****.run_dbt.relationships_stg_**_***_******_******_business_ops__**********_*****_****_********_*******_id_*****__id_*****__ref_stg_**_***_******_******_business_ops__**********_*****_index__test) has to be less than 250 characters.

@ms32035 ms32035 changed the title [Bug] [Bug] The key has to be less than 250 characters Jan 7, 2025
@pankajkoti
Copy link
Contributor

pankajkoti commented Jan 7, 2025

Thank you for reporting the issue.

@ms32035 @jtrobles-cdd, I’m curious—did you only upgrade Cosmos, or was there also an upgrade to Airflow?

Additionally, since Airflow limits task keys to fewer than 250 characters, do you have any suggestions on how we could shorten the task keys while still making them identifiable in terms of the node they represent? Any ideas here would help us address the issue in a more user-friendly way.

@ms32035, regarding your comment:

It seems that tests are generated as their own task regardless of the test_behaviour setting, especially when the value is AFTER_ALL.

This is expected behavior, as outlined in the documentation: https://github.com/astronomer/astronomer-cosmos/blob/main/docs/configuration/testing-behavior.rst. You can refer to the section titled Example when changing the behavior to use TestBehavior.AFTER_ALL.

@ms32035
Copy link
Contributor Author

ms32035 commented Jan 7, 2025

@pankajkoti it's just Cosmos upgrade

One workaround idea I have, which is not exactly user friendly is to use the hash of the test that dbt generates

On the AFTER_ALL this isn't the expected behaviour as all tests run as a part of a single dbt test operator, and these have their own node and would probably run twice

@jtrobles-cdd
Copy link

did you only upgrade Cosmos, or was there also an upgrade to Airflow?

Only Cosmos. I've been using Airflow 2.10.4 (with Astro Runtime 12.6.0) for many weeks without issues.

@internetcoffeephone
Copy link
Contributor

Running into this too since cosmos==1.8.1.

@pankajkoti We could group these tests by joint parents. E.g. for model_a and model_b, we could create a new Airflow task named model_a-model_b.test, which runs all of the relevant tests. Still verbose, but less so than using the entire test configuration as a unique_id / task_id. I'm not sure about the Cosmos internals here re: failures on tests, so we might have to make model_a-model_b an Airflow TaskGroup and put the tests from individual parents inside of that TaskGroup separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rendering Related to rendering, like Jinja, Airflow tasks, etc bug Something isn't working dbt:test Primarily related to dbt test command or functionality execution:kubernetes Related to Kubernetes execution environment triage-needed Items need to be reviewed / assigned to milestone
Projects
None yet
Development

No branches or pull requests

4 participants