
Feature request: symbolic random matrices #372


Open
JonasDeSchouwer opened this issue May 28, 2024 · 2 comments


@JonasDeSchouwer

Hi, it would be very useful to have large random matrices that can be stored symbolically, e.g. with a LazyTensor equivalent of torch.rand().

My use case:
I am working with a variant of the K-nearest-neighbours algorithm. I have $N$ nodes, and for each node I want to sample $K$ neighbours (without replacement). For node $i$, the probability of selecting neighbour $j$ is proportional to the $j$-th entry of an unnormalized probability vector $\vec{p_i}$. These vectors are stored as the rows of a symbolic matrix $P = \left[ \vec{p_1} \ \cdots \ \vec{p_N} \right]^T$.
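For context, this is roughly how such a symbolic $P$ is built (a 2D sketch without the batch dimension; the Gaussian kernel on random coordinates is only a stand-in for my actual scores):

import torch
from pykeops.torch import LazyTensor

N = 10_000
x = torch.randn(N, 3)                     # placeholder node coordinates
x_i = LazyTensor(x[:, None, :])           # (N, 1, 3) symbolic "i" variable
x_j = LazyTensor(x[None, :, :])           # (1, N, 3) symbolic "j" variable
P = (-((x_i - x_j) ** 2).sum(-1)).exp()   # (N, N) symbolic unnormalized probabilities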

To sample without replacement, I want to use the Gumbel top-k trick, as in this paper. In short, this works as follows: for each $i$, sample a vector $\vec{q}\in[0,1]^N$ uniformly. Then take the indices of the $K$ largest elements of $\log(\vec{p_i}) - \log(-\log(\vec{q}))$, i.e. of the log-probabilities perturbed with standard Gumbel noise.

How I would do this in the dense case:

# P: BxNxN matrix of unnormalized probabilities
Q = torch.rand_like(P)                            # uniform noise, same shape as P
temp = torch.log(P) - torch.log(-torch.log(Q))    # log P + standard Gumbel noise
result = torch.topk(temp, K)                      # top-K along the last dim (rowwise)
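To convince myself the recipe is right, here is a small self-contained check (the numbers are arbitrary; with $K=2$ out of four items, inclusion frequencies should be ordered like the probabilities):

p = torch.tensor([0.1, 0.2, 0.3, 0.4])
counts = torch.zeros(4)
for _ in range(10_000):
    g = -torch.log(-torch.log(torch.rand(4)))        # standard Gumbel noise
    idx = torch.topk(torch.log(p) + g, k=2).indices  # one sample of K=2 without replacement
    counts[idx] += 1
print(counts / 10_000)  # inclusion frequencies, increasing with p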

Note, however, that $N$ is potentially huge (~10K-10M), so naturally I want to use symbolic matrices for both $P$ and $Q$.

How I would like to do this with symbolic matrices:

# P: BxNxN symbolic probability matrix
Q = pykeops.random.uniform(P.size())    # the requested feature: a symbolic random matrix
temp = torch.log(P) - torch.log(-torch.log(Q))    # still symbolic
result = (-temp).argKmin(K, dim=1)    # argKmin keeps the K smallest values, hence the
                                      # negation; returns an ordinary BxNxK index tensor

I am quite new to this repository, so it is possible that I am overlooking a feature / workaround. If that is the case, some pointers would be appreciated!

@MH-limarco

I have the same problem. I added a normal-distribution generator, c_random, to keopscore/utils/code_gen_utils.py, but I am not experienced enough with the C++ code-generation environment to take this further myself.

from ctypes import CDLL
import math

from keopscore.utils.code_gen_utils import c_variable

libc = CDLL("libc.so.6")  # POSIX only: we borrow libc's rand()

class c_random(c_variable):
    # A c_variable whose "name" is the string form of a Gaussian sample,
    # so the value is inlined as a literal in the generated C++ code.
    def __new__(cls, dtype, string_id=None):
        if isinstance(string_id, list):
            return list(c_random(dtype) for _ in string_id)
        return super(c_variable, cls).__new__(cls)

    def __init__(self, dtype, string_id=None):
        # Box-Muller transform; the +1 / 101 shift keeps log() away from 0.
        u1 = (libc.rand() % 100 + 1) / 101
        u2 = (libc.rand() % 100 + 1) / 101
        self.dtype = dtype if dtype != "int" else "float"
        self.id = str(math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2))
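One caveat, if I read the code-generation flow correctly (this is my understanding, not something I have verified against the generated C++): the sample is drawn once in Python and pasted into the formula as a literal, so it behaves as a compile-time constant rather than fresh noise at every evaluation:

# Each instance freezes one sample at code-generation time:
r1, r2 = c_random("float"), c_random("float")
print(r1.id, r2.id)  # two different literals, but each is inlined as a fixed
                     # constant in the generated C++ and never re-sampled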

@dhbloo

dhbloo commented Nov 19, 2024

I ran into the same problem when I wanted to sample 2D Gumbel noise with large dimensions (N, M), where N = M = 65536. I have found a simple workaround for those who want to generate a symbolic random uniform array:

import torch
from pykeops.torch import LazyTensor

def uniform_lazy_tensor_2d(dim0, dim1, device) -> LazyTensor:
    """Initialize a 2D lazy tensor with uniform random values."""
    rand_x = LazyTensor(torch.rand((dim0, 1, 1), device=device))
    rand_y = LazyTensor(torch.rand((1, dim1, 1), device=device))

    # Shader-style hash: mixes the two dense 1D seeds into a 2D pattern.
    rand_xy = (rand_x * 12.9898 + rand_y * 78.233).sin() * 43758.5453123
    rand_xy_floor = (rand_xy - 0.5).round()     # round(x - 0.5) mimics floor(x)
    rand_xy_fract = rand_xy - rand_xy_floor     # fractional part, in [0, 1)
    rand_xy_clamp = rand_xy_fract.clamp(0, 1)   # guard against rounding edge cases

    return rand_xy_clamp

It composes two dense 1D random arrays with a hash-like computation commonly used in GPU shaders. I mimic the behavior of fract() with round() here, which may lose some precision, but I expect it to work well in most cases.

Test code:

U = uniform_lazy_tensor_2d(65536, 65536, 'cuda')
mean = (U.sum(0).sum() / 65536**2).item()
variance = (((U - mean)**2).sum(0).sum() / 65536**2).item()
print(f"mean: {mean}, var: {variance}")  # expect mean ≈ 1/2 and var ≈ 1/12 ≈ 0.0833

Output:

mean: 0.49997401237487793, var: 0.08322032541036606
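For completeness, here is a sketch of how this workaround could plug into the Gumbel top-k recipe from the original post (P_sym, N and K are placeholders for your own symbolic score matrix and sizes; argKmin keeps the K smallest values, so the perturbed log-probabilities are negated):

# P_sym: (N, N) symbolic matrix of unnormalized probabilities (placeholder)
U = uniform_lazy_tensor_2d(N, N, 'cuda')
gumbel = -(-(U + 1e-20).log()).log()    # standard Gumbel noise; the epsilon guards log(0)
scores = P_sym.log() + gumbel           # perturbed log-probabilities, still symbolic
result = (-scores).argKmin(K, dim=1)    # (N, K) index tensor, sampled without replacement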
