[nvbug 5333996 ][fix] Unload XQA cubins early to avoid static lifetime #5133

Merged
lowsfer merged 2 commits into NVIDIA:main from fix-cubin-unload on Jun 13, 2025

Conversation

lowsfer
Member

@lowsfer lowsfer commented Jun 11, 2025

Previously, the XQA cubins were incorrectly managed by a static variable, so unloading could happen after the CUDA context was destroyed, which can cause problems. This change uses an observer to manage them, so the unload happens when the last user is destroyed.

Also switch to context-less loading APIs, so it is safe for users to switch CUDA contexts.
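
For readers unfamiliar with the pattern, here is a minimal sketch of "unload when the last user is destroyed". It is not the code from this PR. It assumes the observer is a `std::weak_ptr` held by a registry, while each user holds a `std::shared_ptr`. `FakeCubin`, `CubinRegistry`, `KernelRunner`, and `gLiveCubins` are hypothetical names. The load and unload steps are simulated with a counter so the sketch runs without a GPU. In real code they would presumably map to context-independent driver calls such as `cuLibraryLoadData` and `cuLibraryUnload`, which is an assumption, since the description does not name the APIs.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Number of simulated cubins currently loaded (hypothetical, for illustration).
int gLiveCubins = 0;

// Hypothetical stand-in for a loaded cubin. A real implementation would load
// in the constructor and unload in the destructor (assumed, not shown here).
struct FakeCubin
{
    FakeCubin() { ++gLiveCubins; }
    ~FakeCubin() { --gLiveCubins; }
    FakeCubin(FakeCubin const&) = delete;
    FakeCubin& operator=(FakeCubin const&) = delete;
};

// The registry only observes the cubin through a weak_ptr, so it never keeps
// the cubin alive on its own. Even a registry with static lifetime therefore
// cannot delay the unload until static destruction.
class CubinRegistry
{
public:
    std::shared_ptr<FakeCubin> acquire()
    {
        std::lock_guard<std::mutex> lock{mMutex};
        auto cubin = mObserver.lock(); // reuse the cubin if a user still holds it
        if (!cubin)
        {
            cubin = std::make_shared<FakeCubin>(); // first user triggers the load
            mObserver = cubin;
        }
        return cubin;
    }

private:
    std::mutex mMutex;
    std::weak_ptr<FakeCubin> mObserver;
};

// Hypothetical user: owns a share of the cubin for as long as it lives.
class KernelRunner
{
public:
    explicit KernelRunner(CubinRegistry& registry)
        : mCubin{registry.acquire()}
    {
    }

private:
    std::shared_ptr<FakeCubin> mCubin;
};

// Walks through the lifetime: two users share one cubin, and the cubin is
// unloaded as soon as the last user goes out of scope.
void demonstrateLifetime()
{
    CubinRegistry registry;
    assert(gLiveCubins == 0); // nothing is loaded until a user asks for it
    {
        KernelRunner first{registry};
        KernelRunner second{registry};
        assert(gLiveCubins == 1); // both users share a single loaded cubin
    }
    assert(gLiveCubins == 0); // last user destroyed, so the cubin is unloaded
}
```

With this shape, the unload is tied to the lifetime of the users rather than to static destruction order, which is the property the description asks for.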

Signed-off-by: Yao Yao <[email protected]>
@lowsfer lowsfer requested a review from ming-wei June 11, 2025 15:53
@lowsfer
Member Author

lowsfer commented Jun 11, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8515 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8515 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6174 completed with status: 'FAILURE'

@lowsfer
Member Author

lowsfer commented Jun 12, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8604 [ run ] triggered by Bot

@lowsfer lowsfer enabled auto-merge (squash) June 12, 2025 10:25
@lowsfer lowsfer disabled auto-merge June 12, 2025 10:27
@lowsfer
Member Author

lowsfer commented Jun 12, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #8667 [ run ] triggered by Bot

@lowsfer lowsfer enabled auto-merge (squash) June 13, 2025 06:33
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jun 13, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jun 13, 2025
@tensorrt-cicd
Collaborator

PR_Github #8667 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6286 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@lowsfer lowsfer merged commit 12e075e into NVIDIA:main Jun 13, 2025
3 checks passed
@lowsfer lowsfer deleted the fix-cubin-unload branch June 13, 2025 10:56
3 participants