Speedup by removing loops from Chebyshev polynomial computation #10
Comments
I agree that this would most likely boost the performance quite a bit. However, I am not sure what would happen to the memory usage. If I understand you correctly, then the [...]. I am not entirely sure, however, so I guess one could benchmark it.
I think the even bigger problem is that this goes directly against the recent pull request https://github.com/deepsphere/deepsphere-cosmo-tf2/pull/8: for the ~5000 square degrees and multiple input channels of DES, I ran into the int32 addressing limitation of [...]. It's a pity that [...].
Hey Janis! What I have in mind is that the block sparse Laplacian would be a matrix containing N blocks on the diagonal, each corresponding to the Laplacian for the polynomial of order 1..N. We would build it with the explicit (not recursive) definition of Chebyshev polynomials (https://en.wikipedia.org/wiki/Chebyshev_polynomials#First_kind). So the total number of elements for this matrix would be just N * count_nonzero(L). The sparse-dense matmul would be launched just once.

Where do you expect the multiplication of the Laplacian with itself? Is it in subsequent conv layers? Or am I missing something?

Arne: The 2^31 is indeed the problem; in that regime the block-sparse L indeed won't help. But I think downstream in the network the feature lengths are much smaller, which is where the speed-up may come.
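A minimal sketch of the pre-computation described above, assuming the block-diagonal matrix is built offline with scipy.sparse; the function name `block_chebyshev_laplacian`, the zero-based term indexing, and the scipy-based approach are illustrative assumptions, not existing project code:

```python
import scipy.sparse as sp

def block_chebyshev_laplacian(L, K):
    """Return blockdiag(T_0(L), ..., T_{K-1}(L)) as one sparse matrix.

    L: scipy.sparse matrix, rescaled graph Laplacian of shape (M, M)
    K: number of polynomial terms (diagonal blocks)
    """
    blocks = [sp.identity(L.shape[0], format="csr")]  # T_0(L) = I
    if K > 1:
        blocks.append(L.tocsr())                      # T_1(L) = L
    for _ in range(2, K):
        # The recurrence is only used here, offline; the explicit closed-form
        # definition from the linked Wikipedia page would work equally well.
        blocks.append(2.0 * (L @ blocks[-1]) - blocks[-2])
    # One (K*M, K*M) block-diagonal sparse matrix; its nnz is the sum over the K blocks.
    return sp.block_diag(blocks, format="csr")
```

Since this would happen once at layer construction, the cost of building the block matrix should not affect the per-step runtime.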
It's been a while, but as far as I understand it, the Chebyshev polynomials are evaluated with the graph Laplacian and then applied on the graph, so you would evaluate [...]. If the graph does not have too many nodes, it would probably be feasible even if the blocks are dense, so one could implement it and only trigger it for certain resolutions. I also believe that sparse matrices with dense blocks now have optimized instructions and should be even more efficient than a sparse-dense multiplication.
The polynomial values are computed in a loop:
deepsphere-cosmo-tf2/deepsphere/gnn_layers.py, line 127 at commit 3facedf
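For context, a hedged sketch of what such a loop-based evaluation typically looks like; variable and function names are illustrative, not the actual gnn_layers.py code:

```python
import tensorflow as tf

def chebyshev_loop(L_sparse, x, K):
    """Evaluate T_0(L)x, ..., T_{K-1}(L)x with the standard recurrence.

    L_sparse: tf.sparse.SparseTensor, rescaled Laplacian of shape (M, M)
    x:        dense tensor of shape (M, F); batch and channel dims omitted
    K:        polynomial degree (number of terms)
    """
    outputs = [x]                                                    # T_0(L) x = x
    if K > 1:
        outputs.append(tf.sparse.sparse_dense_matmul(L_sparse, x))   # T_1(L) x = L x
    for _ in range(2, K):
        # T_k(L) x = 2 L T_{k-1}(L) x - T_{k-2}(L) x
        t_k = 2.0 * tf.sparse.sparse_dense_matmul(L_sparse, outputs[-1]) - outputs[-2]
        outputs.append(t_k)
    return tf.stack(outputs, axis=0)  # (K, M, F): K sequential sparse matmuls
```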
For the commonly used degree of 5, this requires 5 consecutive steps. It may be faster to execute this with a single sparse-dense matmul.
That would require building a sparse Laplacian for each polynomial component, which can be pre-computed.
The single sparse-dense matmul can then be launched once as
sparse-dense-matmul(block sparse-L, tiled input)
followed by a reshape and a sum.
For N = polynomial degree and M = map length, ignoring channels and batch dimensions, the matrix multiplication could look like this:

blockdiag(T_1(L), ..., T_N(L)) · [x; x; ...; x] = [T_1(L) x; ...; T_N(L) x]

which gives N feature maps.
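A minimal sketch of that single launch in TF2, assuming the block-diagonal matrix has already been pre-computed and converted to a tf.sparse.SparseTensor; function and variable names are illustrative:

```python
import tensorflow as tf

def chebyshev_single_matmul(L_block_sparse, x, N):
    """One sparse-dense matmul instead of N sequential ones.

    L_block_sparse: tf.sparse.SparseTensor of shape (N*M, N*M),
                    blockdiag of the pre-computed polynomial Laplacians
    x:              dense tensor of shape (M, F); static shapes assumed
    N:              polynomial degree (number of diagonal blocks)
    """
    M, F = x.shape
    x_tiled = tf.tile(x, [N, 1])                                 # (N*M, F)
    y = tf.sparse.sparse_dense_matmul(L_block_sparse, x_tiled)   # (N*M, F), one launch
    y = tf.reshape(y, [N, M, F])                                 # N feature maps
    # The layer would then combine the N maps with its trainable weights,
    # e.g. a weighted sum over the polynomial axis:
    #   out = tf.einsum('nmf,nfg->mg', y, weights)  # weights: (N, F, F_out)
    return y
```

As noted in the comments above, the (N·M)-sized operands make the 2^31 addressing limit from the PR #8 discussion more of a concern at high resolutions, so the benefit would presumably be in the downstream layers where the feature lengths are smaller.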
Any chance that would make sense?
Also tagging @Arne-Thomsen