
Writing large dense array gobbles memory #496

Open
chrism0dwk opened this issue Mar 10, 2021 · 2 comments

@chrism0dwk

Hi,

I'm not sure if this is a genuine bug or user error (I'm a TileDB newbie), but here goes:

What happened? I am trying to write a 7GB Numpy array to a TileDB data structure:

import numpy as np
import tiledb
arr_npy = np.random.uniform(size=[10000, 380, 84, 3])       # line 3: allocates ~7 GiB of float64 data
arr_tdb = tiledb.DenseArray.from_numpy("my_array", arr_npy)  # line 4: memory blows up here

Python memory consumption increases to the expected ~7 GB at line 3, but then climbs to consume all 32 GB available on the machine during the from_numpy call that sets up the TileDB array. The on-disc my_array folder does not appear to be populated with data files -- only metadata is laid down.
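The peak usage can be confirmed with the standard-library resource module (Linux); a scaled-down sketch, with the array reduced to a tenth of the original size so absolute numbers will differ:

import resource
import numpy as np
import tiledb

def peak_rss_mb():
    # ru_maxrss is reported in KiB on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

arr = np.random.uniform(size=[1000, 380, 84, 3])  # ~0.7 GiB, 10x smaller than the original
print(f"after allocation: {peak_rss_mb():.0f} MB peak RSS")
tiledb.DenseArray.from_numpy("repro_array", arr)
print(f"after from_numpy: {peak_rss_mb():.0f} MB peak RSS")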

Expected behaviour: Python streams the Numpy array to disc as a TileDB array, populating the underlying on-disc folder structure without exhausting RAM.

Version info:
Python: 3.7.7
Numpy: 1.20.1
TileDB: 0.8.4
Filesystem: BTRFS
OS: Linux Mint 20
Hardware: Dell XPS 7590 laptop, 32 GB RAM

@ihnorton (Member)

Hi @chrism0dwk, thanks for opening this issue -- I can reproduce and we are investigating.
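In the meantime, creating the schema up front and writing in fixed-size slabs should keep memory bounded, since only one slab is buffered per write. A rough sketch, assuming the explicit-schema write path in tiledb-py -- the URI, tile extents, and slab size below are illustrative, not the eventual fix:

import numpy as np
import tiledb

uri = "my_array_chunked"  # illustrative output path
shape = (10000, 380, 84, 3)

# Dense schema matching the source array; tile extents are a guess and can be tuned.
dims = [
    tiledb.Dim(name=f"d{i}", domain=(0, n - 1), tile=min(n, 100), dtype=np.uint64)
    for i, n in enumerate(shape)
]
schema = tiledb.ArraySchema(
    domain=tiledb.Domain(*dims),
    attrs=[tiledb.Attr(name="a", dtype=np.float64)],
    sparse=False,
)
tiledb.DenseArray.create(uri, schema)

# Write 100-row slabs (~77 MB of float64 each) so only one slab is buffered at a time.
step = 100
with tiledb.DenseArray(uri, mode="w") as A:
    for start in range(0, shape[0], step):
        stop = min(start + step, shape[0])
        A[start:stop] = np.random.uniform(size=(stop - start,) + shape[1:])

Matching the slab size to the tile extent on the first dimension (both 100 here) keeps each write tile-aligned.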

@stavrospapadopoulos (Member)

Thank you @chrism0dwk! It is being addressed here. Apologies for the delay -- that required quite some refactoring :).
