30x slowdown in regression #146
Hi @conjam, first, just in case, have you made sure that you are linking against OpenBLAS or MKL? xtensor-blas contains a C++ implementation (called FLENS) of most BLAS routines, but they are a lot less optimized than actual BLAS. Also, if you could give us a hint on what exactly you're doing with xtensor / xtensor-blas, we might be able to help better. One problem could be that we sometimes need to convert row-major matrices to column-major for some LAPACK operations — that could eat performance.
First off: thanks for the quick response! I've checked across platforms (I develop on mac, regression runs on CentOS) and libopenblas is linked into both binaries; in case that isn't enough, I have […]. Currently in regression I only use […].
Hi @conjam, can you give me some more context on the slowdown, and especially your matrix / vector sizes? You can get some speedup by using xtensor_fixed as a container; however, the BLAS implementation is still "dynamic" and doesn't statically know about the size of your matrices. If you want to achieve the best performance for dot products of small matrices, I would encourage you to write them by hand and use the xtensor_fixed container. If you have a problem with large matrices, I would appreciate it if you could give me more context so I can check what the problem might be — e.g. sizes of the matrices, some code snippets, your hand-written implementation, etc.
xt::linalg::tensordot seems to execute very slowly here; not sure if this is related. However, I found this may be due to the preparatory mathematical & view operations I'm doing — I can influence it by using xt::eval.
Hey all,
I've found great success using xtensor (and xtensor-blas); in development I've seen ~15x speedups compared to the handwritten code I had prior.
Regression is another story, though: in jobs that use xtensor-blas, I've seen slowdowns of as much as 30x compared to original performance. The slowdown is most prominent in smaller unit tests that used to pass in under 500ms and now take about ~17 seconds. Larger tests (10+ second run time) saw slowdowns of 5x–10x.
I suspect that the problem lies in OpenBLAS as a backend. I have tried limiting the number of threads it spawns by setting OPENBLAS_NUM_THREADS=1, and that did help — before I did, my system would crash during regression with pthread resource errors. Before I spend cycles profiling too deeply, I figured I'd ask: has anyone seen anything similar to this?
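For anyone hitting the same pthread resource errors, a minimal sketch of the workaround described above — pinning OpenBLAS to one thread via its environment variable before running the test job (the test-runner command itself is a placeholder):

```shell
# Limit OpenBLAS's internal thread pool to a single thread; useful when many
# test processes run in parallel and each would otherwise spawn its own pool.
export OPENBLAS_NUM_THREADS=1
echo "$OPENBLAS_NUM_THREADS"   # prints: 1
# ./run_regression_tests   # placeholder for your actual test command
```

Note that MKL has an analogous variable, MKL_NUM_THREADS, if you switch backends.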