This repository has been archived by the owner on Dec 29, 2021. It is now read-only.
I think that other than ergonomics (type-casting and APIs in general), I'm satisfied that a dataframe is possible to build.
What I'd like to focus on as early as possible is a path to lazy evaluation. A key limitation right now is that compute functions are plain `fn` items rather than general-purpose kernels. For example, implementing a cast operation is difficult because each kind of cast needs its own function.
Converting these functions to independent kernel members could allow generating a computation graph with a logical plan. After this, I would be able to leverage (hopefully) existing work that can take my logical plan and turn it into a physical plan.
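To make the kernel idea concrete, here is a minimal sketch of how a cast could become a single kernel that a logical plan refers to. Everything in it is hypothetical and for illustration only: `Column`, `DataType`, `Kernel`, `CastKernel` and `LogicalPlan` are toy stand-ins, not Arrow types or any existing API.

```rust
use std::collections::HashMap;
use std::fmt::Debug;

/// Toy column type standing in for an Arrow array (assumption, not the Arrow API).
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int64(Vec<i64>),
    Float64(Vec<f64>),
    Utf8(Vec<String>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
}

impl Column {
    pub fn data_type(&self) -> DataType {
        match self {
            Column::Int64(_) => DataType::Int64,
            Column::Float64(_) => DataType::Float64,
            Column::Utf8(_) => DataType::Utf8,
        }
    }
}

/// Hypothetical kernel trait: a named unit of computation that a plan can hold.
pub trait Kernel: Debug {
    fn name(&self) -> &str;
    fn evaluate(&self, input: &Column) -> Result<Column, String>;
}

/// One kernel covers every supported cast, instead of one function per type pair.
#[derive(Debug)]
pub struct CastKernel {
    pub to: DataType,
}

impl Kernel for CastKernel {
    fn name(&self) -> &str {
        "cast"
    }

    fn evaluate(&self, input: &Column) -> Result<Column, String> {
        match (input, self.to) {
            (Column::Int64(v), DataType::Float64) => {
                Ok(Column::Float64(v.iter().map(|x| *x as f64).collect()))
            }
            (Column::Int64(v), DataType::Utf8) => {
                Ok(Column::Utf8(v.iter().map(|x| x.to_string()).collect()))
            }
            // Casting to the same type is a no-op copy.
            (col, to) if col.data_type() == to => Ok(col.clone()),
            (col, to) => Err(format!("unsupported cast {:?} -> {:?}", col.data_type(), to)),
        }
    }
}

/// A logical plan only describes the work; nothing runs until `execute` is called.
#[derive(Debug)]
pub enum LogicalPlan {
    Scan { column: String },
    Apply { kernel: Box<dyn Kernel>, input: Box<LogicalPlan> },
}

impl LogicalPlan {
    pub fn describe(&self) -> String {
        match self {
            LogicalPlan::Scan { column } => format!("scan({})", column),
            LogicalPlan::Apply { kernel, input } => {
                format!("{}({})", kernel.name(), input.describe())
            }
        }
    }

    /// Naive stand-in for a physical plan: walk the tree and evaluate each kernel.
    pub fn execute(&self, source: &HashMap<String, Column>) -> Result<Column, String> {
        match self {
            LogicalPlan::Scan { column } => source
                .get(column)
                .cloned()
                .ok_or_else(|| format!("unknown column {}", column)),
            LogicalPlan::Apply { kernel, input } => kernel.evaluate(&input.execute(source)?),
        }
    }
}
```

Building a `LogicalPlan::Apply` around a `CastKernel` does no work by itself; the plan can be inspected with `describe` or handed to a real planner, and only `execute` touches data.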
Having recently worked on a JSON reader for Arrow Rust, I can see a way of supporting pushdown predicates to CSV and JSON data sources. This would be implemented as part of the physical plan to read the data source.
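As a rough illustration of what a pushed-down predicate could look like for CSV, the sketch below applies the filter while rows are being parsed, so rejected rows are never materialized. It uses only the standard library; `Predicate` and `scan_csv` are hypothetical names, the parsing is deliberately naive (no quoting, integer comparisons only), and none of it reflects the Arrow Rust readers.

```rust
/// Hypothetical predicate on one integer column, identified by position.
#[derive(Debug, Clone, PartialEq)]
pub enum Predicate {
    Gt { column: usize, value: i64 },
    Eq { column: usize, value: i64 },
}

impl Predicate {
    /// Returns true if the row satisfies the predicate.
    /// Rows with a missing or non-integer field are rejected.
    fn matches(&self, fields: &[&str]) -> bool {
        let (column, value, want_gt) = match self {
            Predicate::Gt { column, value } => (*column, *value, true),
            Predicate::Eq { column, value } => (*column, *value, false),
        };
        match fields.get(column).and_then(|f| f.trim().parse::<i64>().ok()) {
            Some(v) if want_gt => v > value,
            Some(v) => v == value,
            None => false,
        }
    }
}

/// Reads CSV text with a header row, applying the predicate during the scan
/// rather than after loading everything into memory.
pub fn scan_csv(data: &str, predicate: Option<&Predicate>) -> Vec<Vec<String>> {
    data.lines()
        .skip(1) // skip the header row
        .map(|line| line.split(',').collect::<Vec<&str>>())
        .filter(|fields| predicate.map_or(true, |p| p.matches(fields)))
        .map(|fields| {
            fields
                .iter()
                .map(|f| f.trim().to_string())
                .collect::<Vec<String>>()
        })
        .collect()
}
```

For example, calling `scan_csv` with `Predicate::Gt { column: 1, value: 50 }` on a file whose second column is a score keeps only the rows scoring above 50, and passing `None` reads every row.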
Definition of done
Ideally, I'd like to have a rough outline in a Markdown document, so this will take a while to complete.
The approach that I'm currently taking is to keep the functions separate and instead create a way of expressing operations on data.
An initial implementation can work on scalar operations, but it still needs more work to complete. I'd also need to create some graph structure that can model dependencies between dataframes.
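One possible shape for this, sketched under heavy assumptions: operations are plain values, and a small graph records which node each operation depends on. `Operation`, `Node` and `Graph` are hypothetical names, a dataframe is reduced to a single `f64` column, and each node has at most one input, so joins or other multi-input operations would need a richer structure.

```rust
/// Hypothetical description of an operation, kept separate from the function that runs it.
#[derive(Debug, Clone, PartialEq)]
pub enum Operation {
    Source(String),
    AddScalar(f64),
    MultiplyScalar(f64),
}

/// A node records an operation and the node it depends on (None for a source).
#[derive(Debug)]
pub struct Node {
    pub op: Operation,
    pub input: Option<usize>,
}

#[derive(Debug, Default)]
pub struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    /// Registers a source dataframe and returns its node id.
    pub fn source(&mut self, name: &str) -> usize {
        self.push(Operation::Source(name.to_string()), None)
    }

    /// Records an operation on an existing node without computing anything.
    pub fn apply(&mut self, input: usize, op: Operation) -> usize {
        self.push(op, Some(input))
    }

    fn push(&mut self, op: Operation, input: Option<usize>) -> usize {
        self.nodes.push(Node { op, input });
        self.nodes.len() - 1
    }

    /// Walks back from a node to its source and returns the operations in execution order.
    pub fn lineage(&self, mut id: usize) -> Vec<&Operation> {
        let mut ops = Vec::new();
        loop {
            let node = &self.nodes[id];
            ops.push(&node.op);
            match node.input {
                Some(parent) => id = parent,
                None => break,
            }
        }
        ops.reverse();
        ops
    }

    /// Evaluates a node lazily by replaying its lineage over the source data.
    pub fn evaluate(&self, id: usize, source: &[f64]) -> Vec<f64> {
        self.lineage(id)
            .iter()
            .fold(source.to_vec(), |data, op| match op {
                Operation::Source(_) => data,
                Operation::AddScalar(k) => data.iter().map(|x| x + k).collect(),
                Operation::MultiplyScalar(k) => data.iter().map(|x| x * k).collect(),
            })
    }
}
```

Because each `apply` only appends a node, several derived dataframes can share the same parent, and the lineage of any one of them can be inspected or optimized before it is evaluated.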