Support for lazy indexing / lookup? #367
Hi,
first of all I want to say thanks for this great library. I have a use case that corresponds essentially to a lazy lookup: given a matrix A of shape P x Q and integer vectors x, y of lengths N and M, with values in [0, P) and [0, Q) respectively, I would like to compute the N x M matrix

C[i, j] = A[x[i], y[j]]

lazily, since N and M are large, so materializing the matrix is costly memory-wise.
I would like to sum C[i, j] with other LazyTensors before eventually performing reductions.
I was wondering whether you think this could in principle be supported by the library, and whether you have any pointers on how it might be implemented.
Thanks a lot for your help!
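For concreteness, here is a minimal dense PyTorch sketch of the lookup in question (toy sizes chosen purely for illustration; materializing C is exactly what becomes too costly when N and M are large):

import torch

P, Q, N, M = 12, 5, 7, 9  # toy sizes so the dense matrix easily fits in memory
A = torch.randn(P, Q)
x = torch.randint(P, (N,))
y = torch.randint(Q, (M,))

# advanced indexing broadcasts the index tensors, giving C of shape (N, M)
# with C[i, j] = A[x[i], y[j]]
C = A[x[:, None], y[None, :]]
assert C.shape == (N, M)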
Hi @jaschau, is this snippet helpful?

import torch
from pykeops.torch import LazyTensor, Genred
device = "cuda" if torch.cuda.is_available() else "cpu"
M, N = 1000, 1001
# A standard LazyTensor
X = LazyTensor(torch.rand(M, 1, 1, device=device))
Y = LazyTensor(torch.rand(1, N, 1, device=device))
DX = (X - Y).square()
# An indices based LazyTensor
I = LazyTensor(torch.arange(M, device=device).type(torch.float32).reshape(M, 1, 1))
J = LazyTensor(torch.arange(N, device=device).type(torch.float32).reshape(1, N, 1))
C_IJ = (I - J).abs()
# reduce the LazyTensors
print((DX * C_IJ).sum(1))
Hi @bcharlier, thanks for your message! I don't think the snippet you suggested achieves what I am looking for. Written in components, your snippet computes

res_i = sum_j (x_i - y_j)^2 * |i - j|.

What I am looking for is the possibility to compute something like

res_i = sum_j (x_i - y_j)^2 * A[I_i, J_j],

i.e., use the LazyTensors I and J as lookup indices into a full matrix A.
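In dense PyTorch, the requested computation could be sketched as follows (a minimal sketch; toy sizes and variable names are chosen for illustration, not taken from the thread):

import torch

M, N, P, Q = 5, 6, 3, 4  # toy sizes
x = torch.rand(M)
y = torch.rand(N)
A = torch.randn(P, Q)
I = torch.randint(P, (M,))
J = torch.randint(Q, (N,))

# res[i] = sum_j (x[i] - y[j])**2 * A[I[i], J[j]]
res = ((x[:, None] - y[None, :]) ** 2 * A[I[:, None], J[None, :]]).sum(dim=1)
assert res.shape == (M,)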
Hello @jaschau,

import torch
from pykeops.torch import LazyTensor
def fun_torch(A, I, J):
    # dense reference: advanced indexing materializes the full (M, N) lookup matrix
    return A[I, J].sum(axis=1)

def fun_keops(A, I, J):
    ncol = A.shape[1]
    # flatten A so that the 2D lookup A[i, j] becomes the 1D lookup A[i * ncol + j]
    A = LazyTensor(A.flatten())
    # cast the integer indices to float, since KeOps variables are floating point
    I = LazyTensor((I + 0.0)[..., None])
    J = LazyTensor((J + 0.0)[..., None])
    K = A[I * ncol + J]
    return K.sum(axis=1).flatten()
P, Q = 12, 5
M, N = 3000, 2000
device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.randn((P, Q), requires_grad=True, device=device)
I = torch.randint(P, (M, 1), device=device)
J = torch.randint(Q, (1, N), device=device)
res_torch = fun_torch(A, I, J)
print(res_torch)
res_keops = fun_keops(A, I, J)
print(res_keops)
# testing gradients
loss_torch = (res_torch**2).sum()
print(torch.autograd.grad(loss_torch, [A]))
loss_keops = (res_keops**2).sum()
print(torch.autograd.grad(loss_keops, [A]))

Let us know if you can test this branch and see if it works as expected.
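As a quick sanity check when testing the branch (a minimal sketch, assuming the snippet above has just been run), the two results should agree up to floating-point error:

import torch

# both results have shape (M,); the sums run over N float32 terms,
# so allow a small tolerance
assert torch.allclose(res_torch, res_keops, rtol=1e-3, atol=1e-3)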
Hi @joanglaunes, BTW, is there some documentation on how to add support for new ops in the new Python-based KeOps code? I briefly looked at it before raising the issue, but found it a bit hard to parse without a high-level overview of the main ideas.
I had a strange observation. I played around with the values of P, Q, M and N.
With larger values of P and Q, the computation no longer worked as expected.
Do you have any insights into what might be happening here?
Hello @jaschau,
Hi @joanglaunes, Edit: for my use case, I can order the indices in such a way that I access A very locally, so it would be possible to implement this in a cache-friendly way on the CPU. But I guess things might be different under the parallel CUDA programming paradigm. And in any case, the local-access property will not hold for arbitrary sets of indices I, J.
The simplest thing to try would be to add a special mode for large "parameter" variables, in which these parameters are not loaded into local memory but are kept in main global memory. I don't know how much it will decrease performance, but it will just work in all cases, and it is not very complicated to do. The other way, taking advantage of the fact that A is accessed locally, would probably be the best for efficiency, but it is a lot more challenging.
Going for the simplest solution sounds reasonable to me. Let me know if I can support this in any way!
Hello @jaschau,

import torch
from pykeops.torch import LazyTensor
from time import time
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float64
def fun_keops(A, I, J):
    ncol = A.shape[1]
    # flatten A so that the 2D lookup A[i, j] becomes the 1D lookup A[i * ncol + j]
    A = LazyTensor(A.flatten())
    # float64 indices: with P = Q = 10000 the flat index goes up to ~1e8, beyond
    # the largest integer that float32 represents exactly (2**24)
    I = LazyTensor(I.to(dtype)[..., None])
    J = LazyTensor(J.to(dtype)[..., None])
    K = A[I * ncol + J]
    return K.sum(axis=1).flatten()
P, Q = 10000, 10000
M, N = 100000, 100000
A = torch.randn((P, Q), requires_grad=True, device=device, dtype=dtype)
I = torch.randint(P, (M, 1), device=device)
J = torch.randint(Q, (1, N), device=device)
start = time()
res_keops = fun_keops(A, I, J)
end = time()
print("time for keops:", end - start) |