
Locating the cause of excessive paraformer onnx-gpu inference time #1793

Open
willnufe opened this issue Jun 7, 2024 · 1 comment
Labels
bug Something isn't working

Comments


willnufe commented Jun 7, 2024

I ran some experiments and have preliminarily located the causes of the excessive paraformer onnx-gpu inference time:

1. The CIF part of the predictor

It can be replaced with https://github.com/George0828Zhang/torch_cif (a fast, parallel implementation of CIF; I have not confirmed whether it is equivalent to the implementation inside paraformer). A minimal sketch of the CIF firing loop is included at the end of this comment for reference.

2. CUDA settings in onnxruntime:

  1. The default value of cudnn_conv_algo_search is EXHAUSTIVE, which is relatively expensive and mainly affects convolution operations (the profiling log shows that the slow parts are all concentrated in the Conv_kernel_time entries of the decoder):

    "dur" :52419,"ts" :4481356,"ph" : "X","name" :"/decoder/decoders.X/self_attn/fsmn_block/Conv_kernel_time"


  2. The session therefore needs to be created with the providers configuration below. I have also confirmed that tritonserver exposes the same setting, so deploying paraformer onnx-gpu with triton should be feasible:
    providers = [
        (
            "CUDAExecutionProvider",
            {"cudnn_conv_algo_search": "DEFAULT"}
        ),
        "CPUExecutionProvider",
    ]
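
For reference, here is a minimal usage sketch (not from the original report) of how this providers list is passed to an onnxruntime.InferenceSession, with profiling enabled so that trace entries like the Conv_kernel_time line above can be inspected; the model path "model.onnx" is a placeholder.

    import onnxruntime as ort

    # Providers as suggested above: skip the EXHAUSTIVE cuDNN conv algorithm search.
    providers = [
        ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"}),
        "CPUExecutionProvider",
    ]

    # Enable profiling so per-kernel timings (e.g. Conv_kernel_time) are written to a JSON trace.
    so = ort.SessionOptions()
    so.enable_profiling = True

    # "model.onnx" is a placeholder for the exported paraformer model file.
    sess = ort.InferenceSession("model.onnx", sess_options=so, providers=providers)

    # ... call sess.run(None, inputs) on real feature inputs here ...

    # end_profiling() returns the path of the JSON trace containing entries like
    # {"dur": ..., "name": ".../fsmn_block/Conv_kernel_time"}.
    print(sess.end_profiling())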
    

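For reference on point 1, here is a minimal, sequential sketch of the CIF (continuous integrate-and-fire) firing loop as it is commonly described: per-frame weights alpha are accumulated until they reach a threshold of 1.0, at which point one integrated output frame is emitted. This only illustrates the mechanism; it has not been checked against paraformer's predictor, and I have not verified the exact API of torch_cif.

    import torch

    def cif_sequential(hidden: torch.Tensor, alpha: torch.Tensor, threshold: float = 1.0):
        """Naive sequential CIF: hidden is (T, D), alpha is (T,) with non-negative weights.

        Frames are integrated with weight alpha[t]; every time the running weight
        reaches `threshold`, one output frame is fired. This loop is what a parallel
        implementation such as torch_cif is meant to replace.
        """
        T, D = hidden.shape
        fired = []
        acc_weight = hidden.new_zeros(())   # running sum of alpha
        acc_state = hidden.new_zeros(D)     # weighted sum of frames
        for t in range(T):
            a = alpha[t]
            if acc_weight + a < threshold:
                # Not enough weight yet: keep integrating.
                acc_weight = acc_weight + a
                acc_state = acc_state + a * hidden[t]
            else:
                # Fire: use just enough of alpha[t] to reach the threshold,
                # then carry the remainder over to the next output frame.
                used = threshold - acc_weight
                fired.append(acc_state + used * hidden[t])
                remainder = a - used
                acc_weight = remainder
                acc_state = remainder * hidden[t]
        return torch.stack(fired) if fired else hidden.new_zeros(0, D)

    # Example: 10 random frames whose weights sum to 3.0 fire 3 output frames.
    h = torch.randn(10, 256)
    w = torch.full((10,), 0.3)
    print(cif_sequential(h, w).shape)  # torch.Size([3, 256])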

@willnufe willnufe added the bug Something isn't working label Jun 7, 2024
@dtlzhuangz (Contributor)

I implemented a version of this just a couple of days ago, and it looks fine on my side. If it's convenient, please help test it as well: #1791
