Is connector-x beneficial for complex-query with lot of Joins? #194
I have a complex query with many operations on different columns ( eg: I did a benchmark with time-it on the smallest query we've got and the query performance on
We have other queries we want to try this on with Is connector-x only optimised for single-table loading?
Replies: 1 comment 1 reply
Hi @rsampaths16, thanks for bringing up this issue!

May I ask what destination dataframe you want? Also, may I ask what the raw query you refer to here is? ConnectorX converts the query result into a dataframe for further analysis, so a fair comparison would be against other tools (e.g. pandas, turbodbc) that also produce a dataframe as the result.

ConnectorX mainly targets the scenario of fetching large query results. It speeds up the process by optimizing client-side execution and saturating both network and machine resources through parallelism. When the query gets complex, there is overhead from metadata fetching. In ConnectorX, up to three pieces of information are fetched before the query is issued to the database:

Let's say we do not use partition, and we want

```python
import connectorx as cx

# Fetch the result as an Arrow table first, then convert it to pandas.
table = cx.read_sql(db_uri, query, return_type="arrow")
df = table.to_pandas(split_blocks=False, date_as_object=False)
```

For the schema fetching query, may I ask which database you are using? For now, we have optimized the schema-fetching procedure for Postgres and MySQL, so if you are using either of these databases, performance should be better than other baselines even on complex queries (even with small query results) without partitioning. We are still looking for ways to speed up other databases.
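To get a rough feel for the metadata overhead on a join-heavy query, one option is to time the query next to the kind of wrapper queries a client might issue before fetching. The sketch below uses only the standard-library `sqlite3` and `timeit` modules so it runs anywhere. The tables, the `metadata_probes` helper, and the exact wrapper SQL are hypothetical illustrations and are not taken from ConnectorX's source.

```python
import sqlite3
import timeit


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with two joinable tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        """
    )
    conn.executemany(
        "INSERT INTO customers (id, region) VALUES (?, ?)",
        [(i, f"region-{i % 5}") for i in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
        [(i, (i % 100) + 1, float(i)) for i in range(1, 2001)],
    )
    conn.commit()
    return conn


# Stand-in for a "complex" query: a join plus aggregation.
QUERY = """
SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.region
"""


def metadata_probes(query: str) -> dict:
    """Wrap the user query in probe queries.

    Assumption: these wrappers only illustrate the idea of pre-fetch
    metadata queries; they are not ConnectorX's actual SQL.
    """
    return {
        "schema": f"SELECT * FROM ({query}) AS q LIMIT 1",
        "count": f"SELECT COUNT(*) FROM ({query}) AS q",
    }


def time_query(conn: sqlite3.Connection, sql: str, repeat: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeat` single runs."""
    return min(
        timeit.repeat(lambda: conn.execute(sql).fetchall(), number=1, repeat=repeat)
    )


conn = build_demo_db()
probes = metadata_probes(QUERY)

# Time the main query and each probe separately.
timings = {"main": time_query(conn, QUERY)}
timings.update({name: time_query(conn, sql) for name, sql in probes.items()})

# Number of rows the main query would return, obtained via the count probe.
row_count = conn.execute(probes["count"]).fetchone()[0]

for name, seconds in timings.items():
    print(f"{name:>6}: {seconds * 1e3:.3f} ms")
```

If the probes cost about as much as the main query, a small result set leaves little room for client-side parallelism to pay off, which is the situation described above. Against a real database the numbers will differ, so treat this only as a measuring pattern.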