Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/PROTO: Handle request status correctly and refactor #10473

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ofirfarjun7
Copy link
Contributor

What?

  • Handle request progress status properly.
  • Refactor Code.

Why?

In some flows of send operations we can reach ucs_fatal because status is not handled properly.

How?

Instead of returning unexpected status code we should abort operation and return UCS_OK

@@ -134,8 +134,8 @@ static UCS_F_ALWAYS_INLINE ucs_status_t ucp_am_eager_multi_bcopy_send_func(
packed_size = uct_ep_am_bcopy(uct_ep, UCP_AM_ID_AM_FIRST,
ucp_am_eager_multi_bcopy_pack_args_first,
&pack_ctx, 0);
status = ucp_proto_bcopy_send_func_status(packed_size);
status = ucp_proto_am_handle_user_header_send_status(req, status);
status = ucp_am_handle_user_header_send_status_common(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to reproduce the original failure with unit test?
If so it would be helpful to add it

Copy link
Contributor Author

@ofirfarjun7 ofirfarjun7 Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea but not for this PR because for now it's not practical.
To do this we need to add some hint to UCX mpool object that will cause allocation to fail, I don't think this is something this PR suppose to do but we can consider it for the future to test this case and other use cases that involve memory allocation.

Comment on lines 137 to 139
ucp_am_handle_user_header_send_status_common(ucp_request_t *req,
ucs_status_t status, int abort,
int release)
Copy link
Contributor

@evgeny-leksikov evgeny-leksikov Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#10452 (comment) reference to initial conversation

@ofirfarjun7 ofirfarjun7 force-pushed the topic/fix-copy-header-status-handling branch from 2ef4c2c to 506195b Compare February 18, 2025 11:26
@@ -136,7 +136,7 @@ static UCS_F_ALWAYS_INLINE ucs_status_t ucp_am_eager_multi_bcopy_send_func(
ucp_am_eager_multi_bcopy_pack_args_first,
&pack_ctx, 0);
status = ucp_proto_bcopy_send_func_status(packed_size);
status = ucp_proto_am_handle_user_header_send_status(req, status);
status = ucp_am_handle_user_header_send_status_or_release(req, status);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= align

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is after formatting with clang...
I can force alignment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The = are clearly not aligned. Why clang suggests this?
Anyway, better force IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Code style - "Up to 80 columns"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let's keep.

@gleon99
Copy link
Contributor

gleon99 commented Feb 19, 2025

@ofirfarjun7 please squash

@ofirfarjun7 ofirfarjun7 force-pushed the topic/fix-copy-header-status-handling branch from 506195b to e379499 Compare February 20, 2025 08:12
@ofirfarjun7 ofirfarjun7 force-pushed the topic/fix-copy-header-status-handling branch from e379499 to a16c4dd Compare February 20, 2025 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants