
NPU version support #901

Closed
jhjiang10 opened this issue May 10, 2024 · 6 comments

Comments

@jhjiang10

Which training methods and which models are currently supported on NPU? I keep getting errors when fine-tuning the 72b_chat model on NPU.

@yanshui177

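Same question here. I got qwen7b training working on a single machine with one card and with multiple cards, but distributed multi-node training requires HCCL configuration, and there is currently no documentation describing how to set it up.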
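With plain PyTorch, one common way to run a multi-node job is to launch the same torchrun command on every node, changing only `--node_rank`. The sketch below assumes that pattern; the helper `build_torchrun_cmd`, the script name `train.py`, and the address `192.168.1.10` are hypothetical, and swift's own launcher may wire multi-node runs differently. On Ascend hardware, the training process typically also needs to `import torch_npu` before `torch.distributed` can use the `hccl` backend.

```python
from typing import List


def build_torchrun_cmd(script: str, nnodes: int, node_rank: int,
                       nproc_per_node: int, master_addr: str,
                       master_port: int = 29500) -> List[str]:
    """Return the torchrun argv for one node of a multi-node job (hypothetical helper).

    Every node runs the same command; only node_rank differs. The trailing
    '--ddp_backend hccl' is forwarded to the (hypothetical) training script,
    mirroring the flag discussed in this thread.
    """
    if not 0 <= node_rank < nnodes:
        raise ValueError(f'node_rank must be in [0, {nnodes}), got {node_rank}')
    return [
        'torchrun',
        f'--nnodes={nnodes}',
        f'--node_rank={node_rank}',
        f'--nproc_per_node={nproc_per_node}',
        # Rendezvous point: address of node 0, reachable from every node.
        f'--master_addr={master_addr}',
        f'--master_port={master_port}',
        script,
        '--ddp_backend', 'hccl',
    ]


if __name__ == '__main__':
    # Example: 2 nodes x 8 NPUs; each node runs the line with its own rank.
    for rank in range(2):
        print(' '.join(build_torchrun_cmd('train.py', nnodes=2, node_rank=rank,
                                          nproc_per_node=8,
                                          master_addr='192.168.1.10')))
```

Keeping the command identical across nodes apart from the rank makes it easy to script the launch from a single template.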

@jhjiang10
Author

https://github.com/modelscope/swift/blob/main/docs/source/LLM/NPU%E6%8E%A8%E7%90%86%E4%B8%8E%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md Have you gone through this document?

Yes, I made my changes mostly by following that document, but the run still crashes for no apparent reason.

@jhjiang10
Author

I set ddp_backend to nccl, but it still switches to mpi. Where in the code can I find this logic?

@jiaozhentian


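Hi, I modified the source code and got it running. First, pass the argument --ddp_backend hccl when training. Then edit the source: you only need to change the part of the code that validates the --ddp_backend argument, adding hccl as an allowed option after ccl. Once the argument passes validation, the rest of the run works normally.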
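The fix described above amounts to extending an allow-list of backend names. A minimal sketch of that kind of check, assuming the validation is a simple membership test; `TrainArgs` and `SUPPORTED_DDP_BACKENDS` are hypothetical names, not swift's actual code, so the real check should be located in the installed swift version before editing it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list: the comment above describes appending 'hccl' after
# 'ccl' in the existing --ddp_backend check; the other entries are illustrative.
SUPPORTED_DDP_BACKENDS = ('nccl', 'gloo', 'mpi', 'ccl', 'hccl')


@dataclass
class TrainArgs:
    """Hypothetical stand-in for the training arguments (not swift's real class)."""
    ddp_backend: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject unknown backends; with 'hccl' listed, NPU runs pass validation.
        if self.ddp_backend is not None and self.ddp_backend not in SUPPORTED_DDP_BACKENDS:
            raise ValueError(
                f'ddp_backend must be one of {SUPPORTED_DDP_BACKENDS}, '
                f'got {self.ddp_backend!r}')


if __name__ == '__main__':
    print(TrainArgs(ddp_backend='hccl'))
```

Validating in `__post_init__` makes a bad backend name fail at argument-parsing time rather than later, when the process group is created.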

@jhjiang10
Author

Thanks a lot!


4 participants