
nbest C++ API uses too much VRAM #1281

Open
binhtranmcs opened this issue Apr 22, 2024 · 5 comments

binhtranmcs commented Apr 22, 2024

I tried HLG decoding using both hlg_decode.cu and hlg_decode.py, modifying them a bit to get the n-best paths from the lattice. The Python API seems fine, but I get OOM when using the C++ API. As far as I know, the Python API is C++ under the hood, so I wonder whether you all see the same issue, that the C++ API uses a lot more memory than the Python API, or whether I installed k2 incorrectly in some way. Please help me with this.

Model: LibriSpeech conformer CTC
Code:
python: hlg_decode.py.txt
c++: hlg_decode.cu.txt
Audio (change ext to .wav): testvram.txt

Thanks in advance!

@binhtranmcs (Author)

Any updates on this?

@binhtranmcs (Author)

@pkufool @csukuangfj, do you have any insight into this?


danpovey commented May 2, 2024 via email


pkufool commented May 16, 2024

@binhtranmcs Have you figured out the reason? I think it is because the Python code breaks a large batch into smaller sub-batches when doing the intersection; see `_intersect_device` in the Python code.
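For reference, the sub-batching idea looks roughly like the sketch below. This is not the actual `_intersect_device` code; the helper name `intersect_in_chunks`, the default `batch_size`, and the exact splitting are illustrative assumptions, shown only to make the memory argument concrete.

```python
# Illustrative sketch only; not the real _intersect_device implementation.
import torch
import k2


def intersect_in_chunks(a_fsas: k2.Fsa,
                        b_fsas: k2.Fsa,
                        b_to_a_map: torch.Tensor,
                        batch_size: int = 50) -> k2.Fsa:
    """Intersect b_fsas with a_fsas in sub-batches to bound peak VRAM."""
    num_fsas = b_fsas.shape[0]
    if num_fsas <= batch_size:
        # Small batch: intersect everything at once.
        return k2.intersect_device(
            a_fsas, b_fsas, b_to_a_map=b_to_a_map, sorted_match_a=True)

    chunks = []
    for start in range(0, num_fsas, batch_size):
        end = min(start + batch_size, num_fsas)
        indexes = torch.arange(
            start, end, dtype=torch.int32, device=b_to_a_map.device)
        # Intersect only this slice, so the intermediate composition
        # never covers the whole batch at once.
        fsas = k2.index_fsa(b_fsas, indexes)
        chunks.append(
            k2.intersect_device(
                a_fsas, fsas,
                b_to_a_map=b_to_a_map[start:end],
                sorted_match_a=True))
    # Concatenate the per-chunk results back into one FsaVec.
    return k2.cat(chunks)
```

A C++ caller that passes the full batch to the intersection in one go would allocate the whole intermediate result at once, which matches the reported difference in peak VRAM.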

@binhtranmcs (Author)

> @binhtranmcs Have you figured out the reason? I think it is because the Python code breaks a large batch into smaller sub-batches when doing the intersection; see `_intersect_device` in the Python code.

Thanks @pkufool, I will have a look at this.
