Should vizro support polars (or other dataframes besides pandas)? #286

Open
antonymilne opened this issue Jan 25, 2024 · 3 comments

@antonymilne
Contributor

antonymilne commented Jan 25, 2024

Ty Petar, please consider supporting polars, I think it is necessary, given that the whole point of vizro is working with a dataframe in memory. Currently vizro cannot determine polars column names (detects them as [0,1,2,3,4...])

Originally posted by @vmisusu in #191 (comment)


I'm opening this issue to see whether other people have the same question so we can figure out what priority it should be. Just hit 👍 if it's something you'd like to see in vizro, and feel free to leave any comments.

The current situation (25 January 2024) is:

  • vizro currently only supports pandas DataFrames, but supporting others like polars is a great idea and something we did consider before. The main blocker previously was that plotly didn't support polars, but as of 5.15 it supports not just polars but any dataframe with a to_pandas method, and as of 5.16 it supports dataframes that follow the dataframe interchange protocol (which is now pip installable).
  • on vizro we could follow a similar sort of pattern to plotly's development [1]. Supporting the dataframe interchange protocol is ideally the "right" way to do this, but we should work out how much performance improvement polars users would actually get in practice, to see what the value of this would be over a simple to_pandas call. The biggest changes we'd need to make would be to actions code like the filtering functionality (FYI @petar-qb). I don't think it would be too hard, but it's certainly not a small task either.
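
To make the two options above concrete, here is a minimal sketch of the "simple to_pandas call" route with the interchange protocol as a fallback. The helper name `coerce_to_pandas` and the `FakeFrame` stand-in class are hypothetical and exist only for illustration; this is not vizro's or plotly's actual code. It assumes pandas 1.5 or later, where `pd.api.interchange.from_dataframe` is available.

```python
import pandas as pd


def coerce_to_pandas(data):
    """Return a pandas DataFrame for pandas input or a convertible dataframe.

    Hypothetical helper for illustration; not part of vizro or plotly.
    """
    if isinstance(data, pd.DataFrame):
        return data
    # polars.DataFrame (and some other libraries) expose a to_pandas() method.
    if hasattr(data, "to_pandas"):
        return data.to_pandas()
    # Fallback: the dataframe interchange protocol (__dataframe__).
    if hasattr(data, "__dataframe__"):
        return pd.api.interchange.from_dataframe(data)
    raise TypeError(f"Unsupported data type: {type(data).__name__}")


class FakeFrame:
    """Stand-in for a non-pandas dataframe, so the sketch runs without polars."""

    def __init__(self, columns):
        self._columns = columns

    def to_pandas(self):
        return pd.DataFrame(self._columns)


converted = coerce_to_pandas(FakeFrame({"species": ["a", "b"], "size": [1, 2]}))
print(list(converted.columns))  # ['species', 'size']
```

A real polars DataFrame would go through the same `to_pandas` branch. Converting once at the boundary like this would leave the existing pandas-based actions code unchanged, at the cost of a copy of the data.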

FYI @astrojuanlu

Footnotes

  1. https://github.com/plotly/plotly.py/pull/4244 https://github.com/plotly/plotly.py/pull/4272/files https://github.com/plotly/plotly.py/pull/3901 https://github.com/plotly/plotly.py/issues/3637

@antonymilne changed the title from "Should we support polars?" to "Should vizro support polars (or other dataframes besides pandas)?" on Jan 25, 2024
@datajoely

Maybe Ibis is a good fit here?

@astrojuanlu

I only reacted with 🚀 to this, but to make my position more clear,

> as of 5.15 it supports not just polars but actually any dataframe with a to_pandas method, and as of 5.16 it supports dataframes that follow the dataframe interchange protocol (data-apis/dataframe-api#73)

this is awesome ⭐

> but we should work out exactly how much performance improvement polars users would actually get in practice to see what the value of this would be over a simple to_pandas call.

I think it's more about DX (developer experience), not necessarily a performance improvement. If folks are using Polars for whatever reason and then have to call .to_pandas() to use Vizro, it feels a bit meh. If Vizro supports Polars natively, it's more pleasant.

@astrojuanlu

Just seen on their LinkedIn:

Check migrated all 100+ of their Airflow DAGs from pandas to Polars and saved 25% in cloud expenses.

https://pola.rs/posts/case-check-technology/
