Vaex not exporting to file properly inside of a multiprocessing pool. #2401

Open
ghost opened this issue Nov 2, 2023 · 0 comments

ghost commented Nov 2, 2023

I am having trouble using vaex inside Python's multiprocessing.Pool. I expect pool.map() to iterate through the entire list supplied to it, but that does not seem to happen when the items contain vaex DataFrame objects. The code runs, but only for the first 16 items, 16 being the number of cores on my machine.

So, for code setup as follows:

def export_task(item):
    # item is a tuple: (vaex DataFrame, output path)
    subject, outputPathChunk = item
    subject.export_hdf5(outputPathChunk)

And then

import multiprocessing

pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
pool.map(export_task, subs)
pool.close()

Here subs is a list of 600 tuples. Each tuple has two items: the first is a vaex DataFrame and the second is an output path.
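To narrow down whether the remaining items are skipped, fail silently, or never return, it may help to have each task report its outcome back to the parent. Below is a minimal sketch using only the standard library. The names `run_checked`, `map_checked`, and `demo_task` are hypothetical, and `demo_task` is just a stand-in for `export_task`, not vaex code.

```python
import functools
import multiprocessing
import traceback


def run_checked(task, indexed_item):
    """Run task(item); return (index, error_text), with error_text None on success."""
    index, item = indexed_item
    try:
        task(item)
        return index, None
    except Exception:
        # Send the traceback back as text so it survives the trip to the parent.
        return index, traceback.format_exc()


def map_checked(task, items, processes=None):
    """Apply task to every item in a pool; return (number finished, failures by index)."""
    worker = functools.partial(run_checked, task)
    failures = {}
    done = 0
    with multiprocessing.Pool(processes=processes) as pool:
        for index, error in pool.imap_unordered(worker, enumerate(items)):
            done += 1
            if error is not None:
                failures[index] = error
    return done, failures


def demo_task(item):
    """Hypothetical stand-in for export_task: fails on multiples of 7."""
    if item % 7 == 0:
        raise ValueError(f"cannot export item {item}")


if __name__ == "__main__":
    done, failures = map_checked(demo_task, range(1, 22), processes=4)
    print(f"{done} tasks finished, {len(failures)} failed: {sorted(failures)}")
```

With the real task this would be `done, failures = map_checked(export_task, subs)`. If `done` equals `len(subs)` and `failures` is empty, every item went through a worker and the problem is more likely in what gets written. One caveat: if a worker process is killed outright (for example by the OOM killer), `multiprocessing.Pool` does not raise and the call can hang, whereas `concurrent.futures.ProcessPoolExecutor` raises `BrokenProcessPool` in that situation.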

Vaex emits a warning during the first 16 executions of export_task, and I am wondering if that is choking pool.map. That would be a simple issue to work around, but a direct sample_table.export_hdf5(sample_path) call as a sanity check does not produce the same warning.
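Since the direct call does not warn, one experiment is to avoid sending DataFrame objects to the workers at all. Arguments to pool.map are pickled on their way to the worker processes, so the DataFrame each worker sees is a rebuilt copy. Whether that rebuild is what triggers the warning is a guess, not something this report confirms. The sketch below assumes the 600 DataFrames are contiguous row ranges of a single file that `vaex.open` can read, which may not match the actual setup. The names `chunk_bounds`, `export_slice`, and `export_all`, and the `chunk_0000.hdf5` output naming, are hypothetical.

```python
import multiprocessing


def chunk_bounds(n_rows, n_chunks):
    """Split range(n_rows) into n_chunks contiguous (start, stop) pairs."""
    base, extra = divmod(n_rows, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def export_slice(spec):
    """Open the source file inside the worker and export one row range."""
    import vaex  # imported here so only plain tuples cross the process boundary

    source_path, start, stop, output_path = spec
    df = vaex.open(source_path)
    df[start:stop].export_hdf5(output_path)
    return output_path


def export_all(source_path, n_rows, n_chunks, processes=None):
    """Export n_chunks row ranges of source_path in parallel (assumed layout)."""
    specs = [
        (source_path, start, stop, f"chunk_{i:04d}.hdf5")  # hypothetical naming
        for i, (start, stop) in enumerate(chunk_bounds(n_rows, n_chunks))
    ]
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(export_slice, specs)
```

Any virtual columns, renames, or filters applied in the parent would have to be re-applied inside `export_slice`, because the worker starts from the file rather than from the parent's DataFrame.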

The warning from vaex is: vaex/dataframe.py:2756: UserWarning: The state wants to rename newMass to __newMass, but __newMass was not found, ignoring the rename
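A note on the count of 16: Python's default warning filter prints a given warning only once per source location in each process. With 16 worker processes, seeing the message 16 times would therefore be consistent with every task triggering it, not just the first 16. That is an inference from how the `warnings` module behaves, not something verified against this setup. A minimal sketch for checking it, with hypothetical helper names and `noisy_task` standing in for `export_task`:

```python
import warnings


def call_recording_warnings(func, *args):
    """Call func(*args); return (result, list of warning descriptions)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops the once-per-location rule from hiding repeats.
        warnings.simplefilter("always")
        result = func(*args)
    messages = [f"{w.category.__name__}: {w.message}" for w in caught]
    return result, messages


def call_strict(func, *args):
    """Call func(*args) with UserWarning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=UserWarning)
        return func(*args)


def noisy_task(item):
    """Hypothetical stand-in for export_task that emits a UserWarning."""
    warnings.warn(f"rename ignored for {item}", UserWarning)
    return item
```

Wrapping the body of `export_task` in `call_recording_warnings` and returning the messages would show which of the 600 items warn. `call_strict` turns the warning into an exception, so the traceback points at the call that triggers it.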

vaex-core                 4.14.0           py37hca0595d_0    conda-forge
vaex-hdf5                 0.14.1             pyhd8ed1ab_0    conda-forge

Vaex was installed via: mamba-forge (conda-forge channel)

OS: Amazon Linux

ghost changed the title from "[BUG-REPORT]" to "Vaex not exporting to file properly inside of a multiprocessing pool." on Nov 2, 2023