
Exceptions Dagster produces in regular operations should be definition-focused ("managed") #27828

Open
dkarlovi opened this issue Feb 13, 2025 · 0 comments
Labels
type: bug Something isn't working


What's the issue?

I rebased our Dagster repo onto our main branch, which moved some things around, and my local dev instance started crashing and producing this type of error:

2025-02-13 10:32:28 +0000 - dagster-webserver - INFO - Received LocationStateChangeEventType.LOCATION_ERROR event for location redacted, refreshing
2025-02-13 10:32:28 +0000 - dagster - WARNING - /usr/local/lib/python3.12/site-packages/dagster/_core/workspace/context.py:825: UserWarning: Error loading repository location redacted:dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNAVAILABLE

Stack Trace:
  File "/usr/local/lib/python3.12/site-packages/dagster/_core/workspace/context.py", line 820, in _load_location
    else origin.create_location(self.instance)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dagster/_core/remote_representation/origin.py", line 371, in create_location
    return GrpcServerCodeLocation(self, instance=instance)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dagster/_core/remote_representation/code_location.py", line 698, in __init__
    list_repositories_response = sync_list_repositories_grpc(self.client)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dagster/_api/list_repositories.py", line 20, in sync_list_repositories_grpc
    api_client.list_repositories(),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dagster/_grpc/client.py", line 328, in list_repositories
    res = self._query("ListRepositories", dagster_api_pb2.ListRepositoriesRequest)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dagster/_grpc/client.py", line 205, in _query
    self._raise_grpc_exception(
  File "/usr/local/lib/python3.12/site-packages/dagster/_grpc/client.py", line 188, in _raise_grpc_exception
    raise DagsterUserCodeUnreachableError(

The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmpd1f8a502: connect: No such file or directory (2)"
	debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/tmpd1f8a502: connect: No such file or directory (2)", grpc_status:14, created_time:"2025-02-13T10:32:28.925684473+00:00"}"
>

Stack Trace:
  File "/usr/local/lib/python3.12/site-packages/dagster/_grpc/client.py", line 203, in _query
    return self._get_response(method, request=request_type(**kwargs), timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/dagster/_grpc/client.py", line 163, in _get_response
    return getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/grpc/_channel.py", line 1181, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The above exception occurred during handling of the following exception:
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNAVAILABLE

over and over, with a bunch of details related to gRPC.

But after examining the logs, I found the culprit much earlier:

/usr/local/lib/python3.12/site-packages/dagster/_core/workspace/context.py:825: UserWarning: Error loading repository location redacted:dagster._core.errors.DagsterInvalidDefinitionError: io manager with key 'bigquery_io' required by output 'result' of op 'shopify_collections'' was not provided. Please provide a IOManagerDefinition to key 'bigquery_io', or change the required key to one of the following keys which points to an IOManagerDefinition: ['io_manager', 'fs_io_manager']

The cause was that the main branch had removed the IO managers my local branch was using; the fix was to redefine the IO manager in use, which is simple. The problem is that Dagster's default logging mostly told me about Dagster's internals, not about what was wrong with my definitions, and it took me a while to notice this error at all.

Dagster could improve its error and exception handling by raising more domain-focused exceptions and putting them front and center: place greater emphasis on which part of the user's code is at fault and less on what the error did to Dagster's internals, since those are (most of the time) not the user's focus.

The current level of detail could be kept for running Dagster in a debug mode, where the user is trying to find and fix issues in Dagster's internals, but the regular UX should hide those details so users can find and fix issues in their own code. In my example, I don't use gRPC, so 90% of the errors I'm seeing shouldn't be about it; they should focus on the part I messed up and need to fix.
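To make the idea concrete, here is a rough sketch in plain Python of what a "managed" rendering could look like. It is not Dagster's API: `DefinitionError`, `InfrastructureError`, `iter_chain`, and `render` are all hypothetical names invented for illustration, and it assumes the definition error is reachable through the exception chain, which the logs above do not confirm.

```python
import traceback


class DefinitionError(Exception):
    """Hypothetical stand-in for an error caused by the user's definitions."""


class InfrastructureError(Exception):
    """Hypothetical stand-in for an internal or transport failure."""


def iter_chain(exc):
    """Yield exc and every exception linked via __cause__ or __context__."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        yield exc
        exc = exc.__cause__ or exc.__context__


def render(exc, debug=False):
    """Return a short user-facing message, or the full traceback in debug mode."""
    user_errors = [e for e in iter_chain(exc) if isinstance(e, DefinitionError)]
    if debug or not user_errors:
        # No definition-level error found (or debug requested): show everything.
        return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    # Surface the definition error closest to the root of the chain.
    return f"Definition error: {user_errors[-1]}\n(re-run in debug mode for internal details)"


# Usage: an internal failure that was ultimately caused by a definition problem.
try:
    try:
        raise DefinitionError("io manager with key 'bigquery_io' was not provided")
    except DefinitionError as err:
        raise InfrastructureError("Could not reach user code server") from err
except InfrastructureError as exc:
    print(render(exc))
```

With `debug=True` the same call would return the full chained traceback, so the current level of detail stays available when someone needs it.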

What did you expect to happen?

Focus on user code and user errors, making the exceptions "managed" for the user (hiding Dagster internals by default)

How to reproduce?

Create a new asset using an undefined IO manager and watch the error logs produced by dagster dev -h 0.0.0.0

Dagster version

dagster, version 1.9.13

Deployment type

Other Docker-based deployment

Deployment details

Running dagster dev -h 0.0.0.0 in a Docker container based on mcr.microsoft.com/devcontainers/python:3.12

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

@dkarlovi dkarlovi added the type: bug Something isn't working label Feb 13, 2025