Skip to content

Conversation

fregataa
Copy link
Member

@fregataa fregataa commented Sep 14, 2025

resolves #5867 (BA-2370)

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

@fregataa fregataa added this to the 25.14 milestone Sep 14, 2025
@fregataa fregataa self-assigned this Sep 14, 2025
@Copilot Copilot AI review requested due to automatic review settings September 14, 2025 23:16
@github-actions github-actions bot added size:M 30~100 LoC comp:manager Related to Manager component labels Sep 14, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds error handling for storage proxy connection failures by introducing a new exception class and catching connection errors during volume retrieval operations.

  • Introduces StorageProxyConnectionError exception class for handling storage proxy connectivity issues
  • Updates the storage proxy manager client to catch and re-raise connection errors as the new exception type
  • Modifies the session manager to gracefully handle connection failures by logging warnings and returning empty results

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/ai/backend/manager/errors/storage.py Adds new StorageProxyConnectionError exception class
src/ai/backend/manager/clients/storage_proxy/session_manager.py Imports new exception and handles connection errors in volume fetching
src/ai/backend/manager/clients/storage_proxy/manager_facing_client.py Catches aiohttp.ClientConnectionError and re-raises as StorageProxyConnectionError

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

)


class StorageProxyConnectionError(BackendAIError, web.HTTPClientError):
Copy link
Preview

Copilot AI Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTTP status code should be 503 Service Unavailable (web.HTTPServiceUnavailable) instead of 400 Client Error (web.HTTPClientError) since connection failures indicate the service is temporarily unavailable, not a client request error.

Suggested change
class StorageProxyConnectionError(BackendAIError, web.HTTPClientError):
class StorageProxyConnectionError(BackendAIError, web.HTTPServiceUnavailable):

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it is not a ClientError because it is a Server Connection Error.

)


class StorageProxyConnectionError(BackendAIError, web.HTTPClientError):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it is not a ClientError because it is a Server Connection Error.

Comment on lines +181 to +185
try:
reply = await client.get_volumes()
except StorageProxyConnectionError:
log.warning("Failed to connect to storage proxy (name: {})", proxy_name)
return []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like this is the way to consume the error, is there a reason you worked this way?

@fregataa fregataa marked this pull request as draft September 15, 2025 02:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:manager Related to Manager component size:M 30~100 LoC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle Storage Proxy connection error
2 participants