Set build user's uid when creating Migraphx/ROCM docker images #23657

Open. Wants to merge 7 commits into base: main.

snnn (Member) commented Feb 12, 2025

Description

Set build user's uid when creating Migraphx/ROCM docker images

Motivation and Context

The two pipelines have a serious issue: the Docker image used for building and testing the code could come from any branch. The machine has 8 GPUs, and to avoid wasting them we run multiple ADO agents on it so that build pipelines run in parallel. However, there is only one Docker daemon, which all the agents share.
As a result, the last time I changed the ROCM pipeline's Docker image I introduced an error, yet the pipeline still passed. Only later, after I checked in the change, did the pipeline start failing. It is chaos.
This PR does not address the fundamental issue. It only fixes the mistake I introduced.

@snnn snnn marked this pull request as draft February 12, 2025 01:10
snnn (Member, Author) commented Feb 12, 2025

/azp run ONNX Runtime Web CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).

@snnn snnn force-pushed the snnn/ci branch 2 times, most recently from ac77f27 to 9333122 Compare February 12, 2025 04:02
@snnn snnn changed the title Set build user's uid when creating Migraphx/ROCM docker images Revert the change that added vcpkg to ROCM CI pipeline Feb 14, 2025
snnn (Member, Author) commented Feb 14, 2025

I bet that if I submit this PR at a time when nobody else is using this pipeline, everything will pass.

@snnn snnn changed the title Revert the change that added vcpkg to ROCM CI pipeline Set build user's uid when creating Migraphx/ROCM docker images Feb 18, 2025
@snnn snnn requested a review from tianleiwu February 18, 2025 04:20
@snnn snnn marked this pull request as ready for review February 18, 2025 04:21
jingyanwangms (Contributor) previously approved these changes Feb 18, 2025 and left a comment:

Looks good

snnn (Member, Author) commented Feb 18, 2025

/azp run Linux GPU CI Pipeline, Linux MIGraphX CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).

tianleiwu (Contributor) commented

The pipeline uses the current user ID to run Docker (and that user ID might not exist in the Docker image):

How does BUILD_UID=1004 help?

snnn (Member, Author) commented Feb 18, 2025

> How does BUILD_UID=1004 help?

I bet the UID is always 1004. I added two commands at:

python --version; id ; ls -lha /home ; \

In the log it shows the uid is 1004. And the /home/onnxruntimedev folder is owned by the onnxruntimedev user, so, in theory it should work. But it is not working.

Here is a log:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1620518&view=logs&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91&t=7d92004f-43ca-5c62-0a5c-970cbb2d24e8

When we build the image, we run:

adduser --gecos 'onnxruntime Build User' --disabled-password onnxruntimedev --uid 1004
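
The check described above (print the UID, then see who owns /home/onnxruntimedev) can be complemented by asking whether the running UID has a passwd entry at all. The following is only a minimal sketch, assuming a POSIX shell and awk are available in the container; the helper name `uid_in_passwd` is hypothetical and is not part of this PR.

```shell
# Hypothetical helper: succeed if a numeric UID has an entry in a
# passwd-format file (defaults to /etc/passwd).
uid_in_passwd() {
  awk -F: -v uid="$1" '$3 == uid { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

# Report whether the UID this shell runs as is known to the system.
current_uid="$(id -u)"
if uid_in_passwd "$current_uid"; then
  echo "uid ${current_uid} has a passwd entry"
else
  echo "uid ${current_uid} has no passwd entry"
fi
```

Run inside the container as the same user the pipeline uses, this would show whether the UID resolves to a named user. With the adduser line above, UID 1004 would be expected to resolve to onnxruntimedev.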

tianleiwu (Contributor) commented Feb 18, 2025

> In the log it shows the uid is 1004. And the /home/onnxruntimedev folder is owned by the onnxruntimedev user, so, in theory it should work. But it is not working.

Shall we also add a group ID using addgroup?
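
One way the addgroup suggestion could look is sketched below. It assumes Debian/Ubuntu-style `addgroup` and `adduser` (the flavor whose `--gecos` and `--disabled-password` options appear in the adduser line quoted earlier) and assumes the group would reuse the name onnxruntimedev and the same numeric ID as the user. The function name `print_build_user_cmds` is hypothetical, and it only prints the commands instead of running them, since creating users requires root. Whether a fixed group ID is actually needed is left open in this thread.

```shell
# Hypothetical dry-run helper: print the group and user creation commands
# for a given uid (first argument) and gid (second argument, defaults to the uid).
print_build_user_cmds() {
  build_uid="$1"
  build_gid="${2:-$1}"
  echo "addgroup --gid ${build_gid} onnxruntimedev"
  echo "adduser --gecos 'onnxruntime Build User' --disabled-password --uid ${build_uid} --gid ${build_gid} onnxruntimedev"
}

print_build_user_cmds 1004
```

The printed lines could then be placed in the image build step in place of the single adduser call, if pinning the group ID turns out to matter.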

tianleiwu previously approved these changes Feb 18, 2025
amarin16 (Collaborator) commented

/azp run Linux GPU CI Pipeline, Linux MIGraphX CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).

@snnn snnn dismissed stale reviews from tianleiwu and jingyanwangms via fe148ce February 20, 2025 01:09
amarin16 previously approved these changes Feb 20, 2025
4 participants