Skip to content

Conversation

lizable
Copy link
Contributor

@lizable lizable commented Aug 25, 2025

resolves #5633 (BA-MMM)

The fix addresses an issue in src/ai/backend/manager/sokovan/deployment/deployment_controller.py:517 where model service creation was failing due to improper image handling:

  • Added image resolution logic: Uses ImageRow.resolve() to convert image string to proper ImageRef object
  • Fixed type mismatch: kernel_config.image_ref now receives the correctly typed ImageRef instead of a raw string
  • Maintained consistency: Follows the same image resolution pattern used in regular session creation
  • Preserved JSON serialization: Kept endpoint_resource_opts.image_ref as string for proper serialization

The change ensures model services can be created successfully by providing the kernel configuration with the expected ImageRef object type while maintaining backward compatibility for stored endpoint resource options.

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

@lizable lizable requested review from Copilot and HyeockJinKim and removed request for Copilot August 25, 2025 09:19
@lizable lizable requested a review from achimnol August 25, 2025 09:20
@github-actions github-actions bot added size:S 10~30 LoC comp:manager Related to Manager component labels Aug 25, 2025
@Copilot Copilot AI review requested due to automatic review settings August 25, 2025 15:23
@lizable lizable marked this pull request as draft August 25, 2025 15:24
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a type mismatch issue in model service creation where a string image identifier was being passed instead of the expected ImageRef object. The fix implements proper image resolution to convert the string to an ImageRef object while maintaining backward compatibility for JSON serialization.

Key changes:

  • Added image resolution logic using ImageRow.resolve() to convert image string to ImageRef object
  • Updated kernel_config.image_ref to use the resolved ImageRef instead of raw string
  • Maintained string format for endpoint_resource_opts.image_ref to preserve JSON serialization compatibility

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +522 to +541
image_row = await ImageRow.resolve(
session,
[
ImageIdentifier(model_spec.image, model_spec.architecture),
ImageAlias(model_spec.image),
],
)
Copy link
Preview

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ImageRow.resolve() call lacks error handling. If image resolution fails (e.g., image not found), this could raise an unhandled exception. Consider wrapping this in a try-catch block and providing a meaningful error message for debugging model service creation failures.

Suggested change
image_row = await ImageRow.resolve(
session,
[
ImageIdentifier(model_spec.image, model_spec.architecture),
ImageAlias(model_spec.image),
],
)
try:
image_row = await ImageRow.resolve(
session,
[
ImageIdentifier(model_spec.image, model_spec.architecture),
ImageAlias(model_spec.image),
],
)
except Exception as exc:
log.error(
"Failed to resolve image for model service creation: image='{}', architecture='{}'. Exception: {}",
model_spec.image,
model_spec.architecture,
exc,
)
raise ServiceInfoRetrievalFailed(
f"Failed to resolve image '{model_spec.image}' (architecture: '{model_spec.architecture}') for model service creation."
) from exc

Copilot uses AI. Check for mistakes.

@lizable lizable force-pushed the fix/cannot-create-model-service branch from 8ec0d91 to 8b02f20 Compare August 28, 2025 13:58
@github-actions github-actions bot added size:M 30~100 LoC and removed size:S 10~30 LoC labels Aug 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:manager Related to Manager component size:M 30~100 LoC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant