Want to support library export as Polars. #1096
Comments
I put a "thumbs up" on this. In the context of https://github.com/intake/awkward-pandas, we've been thinking about Polars, too. (Extending awkward-pandas would be a necessary step to get ragged arrays in Polars, though in principle it could be added to Uproot for flat arrays now.) Incidentally, all of the Pandas conversion happens in one file: uproot5/src/uproot/interpretation/library.py (lines 746 to 923 in e592ae3).
Some of the preparation steps are not Pandas-specific and can be reused in a new library (fourth after
Actually, now that I think of it, Polars columns are in Apache Arrow format. Maybe we could use ak.to_arrow or ak.to_arrow_table instead of expanding awkward-pandas. @Esword618, do you know enough about getting data into Polars to know if there's an easy way to do it with a pyarrow array or Table?
I'm also new to Polars and not very familiar with it, but I'm willing to learn about it and try to add this feature to Uproot.
Okay, thanks! The first question that could make short work of this is to see if Polars has any constructor that turns a pyarrow array or a pyarrow Table into a DataFrame. If this is true, then there would be almost no work on our side. Here's a way to make a pyarrow array or Table (other than using pyarrow's own constructors; I think Awkward Arrays are easier):
>>> import awkward as ak
>>> ak_array = ak.Array([
... {"col1": 1.1, "col2": [1]},
... {"col1": 2.2, "col2": [1, 2]},
... {"col1": 3.3, "col2": [1, 2, 3]},
... ])
>>> ak.to_arrow(ak_array)
<awkward._connect.pyarrow.AwkwardArrowArray object at 0x738fb207b880>
-- is_valid: all not null
-- child 0 type: extension<awkward<AwkwardArrowType>>
[
1.1,
2.2,
3.3
]
-- child 1 type: extension<awkward<AwkwardArrowType>>
[
[
1
],
[
1,
2
],
[
1,
2,
3
]
]
>>> ak.to_arrow_table(ak_array)
pyarrow.Table
col1: extension<awkward<AwkwardArrowType>> not null
col2: extension<awkward<AwkwardArrowType>> not null
----
col1: [[1.1,2.2,3.3]]
col2: [[[1],[1,2],[1,2,3]]]
I'd expect a pyarrow array to be something like a Series and a pyarrow Table to be something like a DataFrame. Arrow makes a distinction between records with named fields in an array and the top-level fields of a Table. You might try different inputs, simple ones like
>>> ak_array = ak.Array([1.1, 2.2, 3.3])
and more complex ones like
>>> ak_array = ak.Array([1.1, 2.2, 3.3, [1, 2, 3, None]])
Dear @jpivarski: |
That's great! There's an inefficiency in that pathway, though: those NumPy arrays have
Is it possible to do this?
import awkward as ak
import polars as pl
dict_of_awkward_arrays = mu_tree.arrays(..., library="ak", how=dict)
dict_of_arrow_arrays = {k: ak.to_arrow(v, extensionarray=False) for k, v in dict_of_awkward_arrays.items()}
list_of_polars_series = [pl.Series(k, v) for k, v in dict_of_arrow_arrays.items()]
polars_df = pl.DataFrame(list_of_polars_series)
polars_df
Or this?
import awkward as ak
import pyarrow as pa
import polars as pl
awkward_array = mu_tree.arrays(..., library="ak")
arrow_table = ak.to_arrow_table(awkward_array, extensionarray=False)
polars_df = pl.DataFrame(arrow_table)
polars_df
(Replace Looking at the Polars documentation, pl.Series allows pyarrow arrays ( (The
Hi all, did this issue come to a conclusion? I more or less reinvented the proposed pipeline to Polars through Arrow (with extensionarray=False) and I can get e.g. the "list[i64]" type, so the ragged data are preserved.
This issue didn't come to a conclusion: there isn't a built-in Uproot backend for Polars (e.g. |
Gotcha. It would probably be good to give the recipe somewhere (as it took me a moment to figure out the needed extra argument), but beyond that the recipe is quite straightforward.