Replies: 1 comment
I looked at Apache Arrow during my time at Neo4j, and I think this is quite doable.
Greetings. For renewed consideration of #2556: could we evaluate using Apache Arrow as the bridge between the JVM and the Python process in the CPython transform?
More broadly, there are probably connectivity and efficiency gains to be had in connecting to more modern data sources that support ADBC:
https://arrow.apache.org/docs/format/ADBC.html
As of Feb 2024, Snowflake, BigQuery, Postgres, SQLite, and Pandas can all transport data using Arrow (think 30x to 80x faster than the current CPython path).
https://voltrondata.com/blog/go-inside-the-arrow-database-connectivity-roadmap-background-and-community?utm_source=chatgpt.com
Perhaps boundary-level changes alone would make it easy to take incremental steps towards Arrow. Being a pragmatist, I would prioritize the enhancement for the CPython transform first, to get rid of the "server" process it spins up, which can die or hang if any data is not escaped in the dataframe or variables passed between Hop and the Python process running outside the JVM. This is a matter of stability, not just data transport.

A company that will not be named used Py4J, which came out before Arrow. It uses sockets and ports to facilitate transfers between the JVM and Python processes, but I feel Arrow will be more portable, and it is not such an outlier project like VFS etc. I sense greater longevity and progress with Apache Arrow.