Ascend for Volcano get "Fail to load custom plugins: plugin: not implemented" #3959

Closed
glmapper opened this issue Jan 8, 2025 · 6 comments · May be fixed by #4042
Labels
kind/question Categorizes issue related to a new question

Comments

glmapper commented Jan 8, 2025

Please describe your problem in detail

I am using Volcano to schedule Huawei 910B series NPU cards, extending the open-source Volcano with the Ascend plugins. Reference documentation:
https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/clusterschedulingig/clusterschedulingig/dlug_installation_023.html

Basic information:

  • k8s version 1.22.17
  • volcano version 1.9.0
  • Ubuntu 20.04 jammy
  • aarch64

Then I executed the following command:

kubectl apply -f installer/volcano-development.yaml 

The volcano-scheduler pod ended up in CrashLoopBackOff status. The pod's logs:

W0107 20:46:31.387250       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
E0107 20:46:31.387420       1 server.go:78] Fail to load custom plugins: plugin: not implemented

Pod events:

Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  45s               default-scheduler  Successfully assigned volcano-system/volcano-scheduler-7b9fb44659-m2scs to master-node-210-75
  Normal   Pulled     2s (x4 over 41s)  kubelet            Container image "volcanosh/vc-scheduler:v1.9.0" already present on machine
  Normal   Created    2s (x4 over 41s)  kubelet            Created container volcano-scheduler
  Normal   Started    2s (x4 over 41s)  kubelet            Started container volcano-scheduler
  Warning  BackOff    2s (x5 over 39s)  kubelet            Back-off restarting failed container

The referenced document includes building an NPU scheduling plugin. No error was reported when I followed its steps, so I tentatively assume the plugin was built correctly. While troubleshooting, I read the code in Volcano 1.9.0 that loads plugins when the scheduler starts. Reading it as ordinary sequential code, "plugin: not implemented" looks like an error that is bound to occur (I'm not very familiar with Go's dynamic plugin mechanism).
If the community has experience with this problem or ideas for solving it, I hope they can be discussed and shared in this issue.

glmapper added the kind/question label Jan 8, 2025
@JesseStutler
Member

Hi, thank you for the issue. Please translate it into English first so that users around the world can understand it, thanks :)

@hwdef
Member

hwdef commented Jan 8, 2025

Please ask in https://gitee.com/ascend/ascend-for-volcano

@JesseStutler
Member

Hi, did you delete the arg -buildmode=plugin in the ascend-volcano build script? It's necessary for building a Go .so plugin. I could successfully build the NPU.so and run the vc-scheduler.
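As a rough illustration of what `-buildmode=plugin` expects, here is a minimal Go plugin source. It is only a sketch: `Greeter`, `demoPlugin`, and the `New(name string)` signature are hypothetical stand-ins, and a real Volcano plugin exports an entry point that uses Volcano's own framework types, which are not reproduced here.

```go
// Build as a plugin:        go build -buildmode=plugin -o demo.so .
// Build as a plain program: go build .   (runs the small self-check in main)
package main

import "fmt"

// Greeter is a hypothetical stand-in for the interface a host expects
// a plugin to implement.
type Greeter interface {
	Name() string
}

// demoPlugin is a trivial implementation of Greeter.
type demoPlugin struct{ name string }

func (d *demoPlugin) Name() string { return d.name }

// New is the exported symbol a host could locate after plugin.Open,
// for example with Lookup("New"), and then type-assert to a function type.
func New(name string) Greeter { return &demoPlugin{name: name} }

// main is not called when the package is loaded as a plugin; it is here
// only so the file also compiles and runs as an ordinary program.
func main() {
	fmt.Println(New("demo").Name())
}
```

Note that the flag only affects how the .so is produced. The host binary that loads it must itself be built with cgo enabled and, in general, with the same Go toolchain and dependency versions as the plugin.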

@glmapper
Author

glmapper commented Jan 8, 2025

> Hi, did you delete the arg -buildmode=plugin in ascend-volcano build script? It's necessary to build a go .so plugin. I could successfully build the NPU.so and run the vc-scheduler

https://gitee.com/ascend/mind-cluster/blob/master/component/ascend-for-volcano/build/build.sh

image

I made no changes to the build script.

@JesseStutler
Member

JesseStutler commented Jan 8, 2025

I have run into the same problem :(
I don't think it is a problem in the Volcano scheduler itself; it comes from the build script. You may need to open an issue in their community: https://gitee.com/ascend/mind-cluster/tree/master/component/ascend-for-volcano

@glmapper
Author

glmapper commented Jan 9, 2025

> I have met the same problem :( I think it's not related to volcano scheduler's problem, it's because the build script, may need to post issue in their community: https://gitee.com/ascend/mind-cluster/tree/master/component/ascend-for-volcano

I resolved this problem by using the official Ascend Volcano images.
