add --device #6


@shink commented Sep 20, 2024

  • dcgan
  • gat
  • gcn
  • language_translation
BACKEND_DEVICE=npu ./run_python_examples.sh "run_all"
Finished run_all, status 0
Some python examples failed:
saved models not found
mnist hogwild failed
graph convolutional network failed
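
For context, a minimal sketch of how a `--device` flag like the one in this PR could be wired into an example script. This assumes each example parses it with `argparse` and that `BACKEND_DEVICE` supplies the default; neither is confirmed by the text above, and `build_parser` and `resolve_device` are hypothetical names.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Parser with a --device flag (hypothetical wiring, not the PR's actual code)."""
    parser = argparse.ArgumentParser(description="example with a selectable backend")
    parser.add_argument(
        "--device",
        type=str,
        # Assumption: fall back to the BACKEND_DEVICE variable, then to cpu.
        default=os.environ.get("BACKEND_DEVICE", "cpu"),
        help="backend device name, e.g. cpu, cuda, or npu",
    )
    return parser


def resolve_device(argv=None) -> str:
    """Return the normalized device name chosen on the command line."""
    args = build_parser().parse_args(argv)
    return args.device.strip().lower()
```

A script would then call `resolve_device()` once at startup and pass the result to whatever creates tensors and models, e.g. `resolve_device(["--device", "npu"])` yields `"npu"`.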

Errors

1. gcn

root@3dfeb58e2e6e:~/pytorch-examples/gcn# python main.py --device npu
[W920 02:46:29.176903976 OperatorEntry.cpp:155] Warning: Warning only once for all operators,  other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::empty.memory_format(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
    registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: CPU
  previous kernel: registered at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:497
       new kernel: registered at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:100 (function operator())
Using npu device
Downloading dataset...
Loading dataset...
Traceback (most recent call last):
  File "/root/pytorch-examples/gcn/main.py", line 260, in <module>
    train_iter(epoch + 1, gcn, optimizer, criterion, (features, adj_mat), labels, idx_train, idx_val, args.val_every)
  File "/root/pytorch-examples/gcn/main.py", line 175, in train_iter
    output = model(*input)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/pytorch-examples/gcn/main.py", line 104, in forward
    x = self.gc1(input_tensor, adj_mat)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/pytorch-examples/gcn/main.py", line 58, in forward
    support = torch.mm(input_tensor, self.kernel) # Matrix multiplication between input and weight matrix
RuntimeError: CAUTION: The operator 'aten::addmm' is not currently supported on the NPU backend.
[ERROR] 2024-09-20-02:46:48 (PID:146560, Device:0, RankID:-1) ERR01007 OPS feature not supported
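
One possible workaround, sketched under assumptions: catch the "not currently supported" `RuntimeError`, run the operator on CPU, and move the result back. `run_with_cpu_fallback` is a hypothetical helper, not part of this PR or of torch_npu, and it is duck-typed over anything exposing `.device`, `.cpu()` and `.to(device)` (as torch tensors do).

```python
def run_with_cpu_fallback(op, *tensors):
    """Run op(*tensors); if the backend rejects the operator, retry on CPU.

    Hypothetical helper. Inputs only need .device, .cpu() and .to(device).
    """
    try:
        return op(*tensors)
    except RuntimeError as err:
        # Only swallow the "operator not supported" case seen in the log above.
        if "not currently supported" not in str(err):
            raise
        original_device = tensors[0].device
        result = op(*(t.cpu() for t in tensors))
        # Move the CPU result back so downstream code sees the original device.
        return result.to(original_device)
```

In `gcn/main.py` this would wrap the failing call, e.g. `support = run_with_cpu_fallback(torch.mm, input_tensor, self.kernel)`. The extra host/device copies cost time, so this is a stopgap rather than a fix.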

2. language_translation

The upstream CI no longer runs this example.

no arm64:

[image]

3. mnist_hogwild

root@3dfeb58e2e6e:~/pytorch-examples/mnist_hogwild# python main.py --device npu
[W920 06:22:43.455693712 OperatorEntry.cpp:155] Warning: Warning only once for all operators,  other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::empty.memory_format(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
    registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: CPU
  previous kernel: registered at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:497
       new kernel: registered at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:100 (function operator())
Traceback (most recent call last):
  File "/root/pytorch-examples/mnist_hogwild/main.py", line 91, in <module>
    model.share_memory() # gradients are allocated lazily, so they are not shared here
  File "/usr/local/python3.9/lib/python3.9/site-packages/torch_npu/utils/npu_intercept.py", line 78, in wrapper
    raise RuntimeError(f"{str(func)} is not supported in npu." + pta_error(ErrCode.NOT_SUPPORT))
RuntimeError: <function Module.share_memory at 0xfffdff9013a0> is not supported in npu.
[ERROR] 2024-09-20-06:22:53 (PID:150151, Device:0, RankID:-1) ERR00007 PTA feature not supported
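
A sketch of how the script could detect this case instead of crashing, assuming the backend keeps raising a `RuntimeError` containing "not supported". `share_memory_if_supported` is a hypothetical helper; whether Hogwild training is still meaningful without shared parameters is a separate question that this does not answer.

```python
def share_memory_if_supported(model) -> bool:
    """Call model.share_memory(); return False if the backend refuses it.

    Hypothetical helper. Any other RuntimeError is re-raised unchanged.
    """
    try:
        model.share_memory()
    except RuntimeError as err:
        if "not supported" not in str(err):
            raise
        return False
    return True
```

The caller can use the return value to skip the example or to fall back to CPU with a clear message.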

@shink shink self-assigned this Sep 20, 2024