Set build user's uid when creating Migraphx/ROCM docker images #23657
base: main
/azp run ONNX Runtime Web CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
ac77f27 to 9333122
I bet that if I submit this PR at a time when nobody else is using this pipeline, everything will pass.
Looks good
/azp run Linux GPU CI Pipeline, Linux MIGraphX CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
The pipeline uses the current user ID to run docker (and that user ID might not exist in the docker image):
How does BUILD_UID=1004 help?
I bet the UID is always 1004. I added two commands at:
The log shows that the uid is 1004, and the /home/onnxruntimedev folder is owned by the onnxruntimedev user, so in theory it should work. But it is not working. When we built the image, we have:
Shall we also add group id using
/azp run Linux GPU CI Pipeline, Linux MIGraphX CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
Description
Set build user's uid when creating Migraphx/ROCM docker images
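As an illustration only, the idea of pinning the image's build user to the UID of the user that later runs the container can be sketched as below. This is not the change made in this PR: the helper name `uid_build_args` is hypothetical, and the sketch assumes the Dockerfile declares `ARG BUILD_UID` and creates the build user with that UID. `BUILD_UID` is the argument name mentioned in the conversation; the image name and Dockerfile path in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): prints the docker build
# argument that passes the invoking user's numeric UID into the image build.
# Assumes the Dockerfile declares `ARG BUILD_UID` and uses it when creating
# the build user, so files under that user's home stay writable when the
# container is started with the same UID.
uid_build_args() {
    printf '%s\n' "--build-arg BUILD_UID=$(id -u)"
}

# Usage sketch, shown as a comment so nothing is built here
# (image name and Dockerfile path are placeholders):
#   docker build $(uid_build_args) -t example-image -f Dockerfile .
```

If the agent user on the build machine always has the same UID, as the discussion suggests for 1004, the value could also be passed as a constant. Deriving it from `id -u` avoids hard-coding that assumption.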
Motivation and Context
The two pipelines have a serious issue: the docker image used for building and testing the code could come from any branch. The machine has 8 GPUs, and we don't want to waste them, so we run multiple ADO agents on it to run build pipelines in parallel. However, there is only one docker daemon, which all of the agents share.
As a result, the last time I changed the ROCM pipeline's docker image I made an error, yet the pipeline still passed. Only later, after I checked in the change, did the pipeline start failing. It is chaos.
This PR does not address the fundamental issue. It only fixes the mistake I introduced.