30x slowdown in regression #146

Open
conjam opened this issue Dec 6, 2019 · 4 comments

conjam commented Dec 6, 2019

Hey all,

I've found great success using xtensor (and xtensor-blas); during development I've seen ~15x speedups compared to the handwritten code I had before.

Regression testing is another story, though. In jobs that use xtensor-blas, I've seen slowdowns of as much as 30x compared to the original performance. The slowdown is most prominent in smaller unit tests that used to pass in under 500 ms and now take ~17 seconds; larger tests (10+ seconds of runtime) slowed down by 5x-10x.

I suspect the problem lies in OpenBLAS as a backend. I tried limiting the number of threads it spawns by setting OPENBLAS_NUM_THREADS=1, and that did help: before I did that, my system would crash during regression with pthread resource errors.
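(For what it's worth, the same cap can also be set programmatically; a minimal sketch, assuming OpenBLAS is linked directly and exposes its usual C API:)

```cpp
// Sketch: pin OpenBLAS to a single thread from code instead of via the
// environment. openblas_set_num_threads is OpenBLAS's own C entry point.
extern "C" void openblas_set_num_threads(int num_threads);

int main()
{
    openblas_set_num_threads(1);  // same effect as OPENBLAS_NUM_THREADS=1
    // ... run the xtensor-blas workload here ...
}
```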

Before I spend cycles profiling too deeply, I figured I'd ask: has anyone seen anything similar to this?

wolfv (Member) commented Dec 6, 2019

Hi @conjam, first, just in case, have you made sure that you are linking against OpenBLAS or MKL? xtensor-blas contains a C++ implementation (called FLENS) of most BLAS routines, but they are a lot less optimized than actual BLAS.

Also, if you could give us a hint about what exactly you're doing with xtensor / xtensor-blas, we might be able to help better. One problem could be that we sometimes need to convert row-major matrices to column-major for some LAPACK operations, and that conversion can eat performance.

conjam (Author) commented Dec 6, 2019

First off: thanks for the quick response!

I've checked across platforms (I develop on macOS and run regression on CentOS), and libopenblas is linked into both binaries. In case that isn't enough: I have add_definitions(-DHAVE_CBLAS=1) and set(XTENSOR_USE_XSIMD 1) in my CMakeLists.txt (I followed the CMake guide y'all put out verbatim).

Currently, in regression I only use xt::linalg::dot to compute the matrix product of 2-D arrays.
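For reference, the calls look roughly like this (the shapes below are placeholders, not my real sizes):

```cpp
#include <xtensor/xtensor.hpp>
#include <xtensor/xrandom.hpp>
#include <xtensor-blas/xlinalg.hpp>

int main()
{
    // Two 2-D arrays; shapes are made up for illustration.
    xt::xtensor<double, 2> a = xt::random::rand<double>({64, 128});
    xt::xtensor<double, 2> b = xt::random::rand<double>({128, 32});

    // Dispatches to the BLAS gemm when a real BLAS (e.g. OpenBLAS) is linked.
    auto c = xt::linalg::dot(a, b);  // shape (64, 32)
}
```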

wolfv (Member) commented Dec 18, 2019

Hi @conjam,

can you give me some more context on the slowdown, and especially your matrix / vector sizes?
If you have small matrices, it's very possible that hand-written code outperforms BLAS (e.g. for a 3x3 matrix-matrix or matrix-vector product).

You can get some speedup by using xtensor_fixed as a container; however, the BLAS implementation is still "dynamic" and doesn't statically know the size of your matrices.

If you want to achieve the best performance for dot products for small matrices, I would encourage you to write them by hand and use the xtensor_fixed container.
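Roughly along these lines; just a sketch (an untuned 3x3 matrix-vector product), not a definitive implementation:

```cpp
#include <cstddef>
#include <xtensor/xfixed.hpp>

using mat3 = xt::xtensor_fixed<double, xt::xshape<3, 3>>;
using vec3 = xt::xtensor_fixed<double, xt::xshape<3>>;

// Hand-written 3x3 matrix-vector product: for sizes this small, the
// unrolled inner products typically beat a call into a dynamic BLAS.
inline vec3 matvec(const mat3& m, const vec3& v)
{
    vec3 r;
    for (std::size_t i = 0; i < 3; ++i)
    {
        r(i) = m(i, 0) * v(0) + m(i, 1) * v(1) + m(i, 2) * v(2);
    }
    return r;
}
```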

If you have a problem with large matrices, I would appreciate it if you could give me more context so I can check what the problem might be: e.g. the sizes of the matrices, some code snippets, your hand-written implementation, etc.


pdumon commented Mar 13, 2020

xt::linalg::tensordot seems to execute very slowly here; I'm not sure if this is related. However, I found this may be due to the preparatory math and view operations I'm doing: I can influence it by using xt::eval.
Nevertheless, I have two identical algorithms in Python/NumPy and in C++ (using xtensor-blas), and the C++ version is 50-100x slower than the Python/NumPy version. The result of the calculation is identical.
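For context, this is roughly what I mean by using xt::eval (a minimal sketch; the expressions and the contraction axis are made up for illustration):

```cpp
#include <xtensor/xtensor.hpp>
#include <xtensor/xeval.hpp>
#include <xtensor-blas/xlinalg.hpp>

// Materialize lazy expressions into concrete containers before the
// contraction, so the preparatory math/view operations are evaluated
// once up front instead of element by element inside tensordot.
template <class A, class B>
auto contracted(const A& a_expr, const B& b_expr)
{
    auto a = xt::eval(a_expr);
    auto b = xt::eval(b_expr);
    return xt::linalg::tensordot(a, b, 1);  // contract over one axis
}
```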
