I'm trying to write and read a ton of randomly distributed values to a large sparse array. Writing to a TileDB sparse array is awesome; it's easy and super fast. However, I have not found a way to read them back.
My use-case involves updating the values of the sparse array by summing it with another sparse array in COO format. Using multi_index results in uncaught errors. Here's an example:
import tiledb as t
import numpy as np
import random

d1 = t.Dim(name="d1", domain=(1, 3_000_000), dtype=np.int64, tile=3000)
d2 = t.Dim(name="d2", domain=(1, 3_000_000), dtype=np.int64, tile=3000)
schema = t.ArraySchema(domain=t.Domain(d1, d2), sparse=True, attrs=[t.Attr(name="a", dtype=np.int64)])
t.SparseArray.create("test_sparse", schema)

# Just for testing
row = random.choices(range(1, 3_000_000), k=10_000_000)
col = random.choices(range(1, 3_000_000), k=10_000_000)
row, col = tuple(zip(*set(zip(row, col))))
cnt = random.choices(range(1, 3_000_000), k=len(row))
row, col = np.array(row), np.array(col)
cnt = np.array(cnt)

# Write the data to the array (super fast, awesome!)
with t.SparseArray("test_sparse", "w") as A:
    A[row, col] = cnt

# Read the data just written
with t.SparseArray("test_sparse", "r") as A:
    data = A.multi_index[row.tolist(), col.tolist()]
    print(data["a"])
This may still be considered a bug, but I clearly misunderstood the functionality of multi_index. I thought it allowed for coordinate selection, similar to vindex or get_coordinate_selection in Zarr. The query I'm submitting in the code above is massive (I'm guessing it's 10,000,000^2) and cannot be allocated since it would require hundreds of TB.
Is there any way to access the elements of a SparseArray through coordinates, similar to when writing values? Right now I have resorted to splitting up my selections into multi_index queries, one for each row.
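To make the size problem concrete, here is a rough sketch in plain Python (no TileDB calls). It assumes, as guessed above, that passing one list per dimension selects the Cartesian product of the two lists rather than pointwise pairs. The helper names `cross_product_size` and `group_cols_by_row` are made up for illustration; the second one only shows the bookkeeping behind the one-query-per-row workaround, not the query itself.

```python
from collections import defaultdict


def cross_product_size(rows, cols):
    """Cells addressed if the per-dimension lists are combined as a Cartesian product."""
    return len(rows) * len(cols)


def group_cols_by_row(rows, cols):
    """Group pointwise (row, col) pairs so that each row can be queried on its own."""
    grouped = defaultdict(list)
    for r, c in zip(rows, cols):
        grouped[r].append(c)
    return dict(grouped)


# Four pointwise coordinates: (1, 7), (1, 9), (2, 7), (5, 3)
rows = [1, 1, 2, 5]
cols = [7, 9, 7, 3]

print(cross_product_size(rows, cols))  # 16 cells, although only 4 pairs were wanted
print(group_cols_by_row(rows, cols))   # {1: [7, 9], 2: [7], 5: [3]}

# Rough memory estimate for the full example, assuming 8-byte int64 values
# and up to 10,000,000 coordinates per dimension:
terabytes = 10_000_000 ** 2 * 8 / 1e12
print(terabytes)  # 800.0
```

Each per-row group could then be passed as its own selection, which keeps every query small at the cost of issuing one query per distinct row.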