External Data support #68

Open
kdkavanagh opened this issue Aug 27, 2020 · 3 comments

Comments

@kdkavanagh

Apologies if I missed this somewhere in the docs... Are there any plans to support ClickHouse's External Data API, perhaps by accepting a data.frame as input to the query?

@tridelt
Collaborator

tridelt commented Aug 27, 2020

Hello!

Thanks for the issue. Currently there are no concrete plans for this in the very near future.
I would love to learn more about your use case. May I ask what advantages such an implementation would have for you over the following input method, which is already possible in RClickhouse?

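```r
library(RClickhouse)
library(DBI)

# Connect via ClickHouse's native TCP port (9000)
con <- dbConnect(clickhouse(), port=9000)

# Upload a small data.frame as a regular table named "dataFrameTable"
dataFrame <- data.frame(
  "Col1"=c("b","b"),
  "Col2"=1:2
)
dbWriteTable(con, "dataFrameTable", dataFrame)
```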

Yours
Tridelt

@kdkavanagh
Author

The workaround you suggest would work, though it pushes the management of that ephemeral table (creating it under a unique name and dropping it afterwards) onto the user, which I suspect would be error-prone.
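
One way to keep that bookkeeping out of user code is a small wrapper that uploads the data.frame under a random name and drops it again via `on.exit()`, even if the query fails. This is a minimal sketch using only generic DBI calls; `with_ephemeral_table()` is a hypothetical helper (not part of RClickhouse or DBI), and it assumes the backend implements `dbExistsTable()` and `dbRemoveTable()`, which is worth checking for RClickhouse:

```r
# Hypothetical helper (not part of RClickhouse): upload `df` under a random
# table name, call fn(table_name), and always drop the table afterwards.
with_ephemeral_table <- function(con, df, fn, prefix = "tmp_r_") {
  suffix <- paste(sample(c(letters, 0:9), 16, replace = TRUE), collapse = "")
  name <- paste0(prefix, suffix)
  # Registered before the upload so a half-finished write is cleaned up too;
  # on.exit() also runs when fn() throws an error.
  on.exit(if (DBI::dbExistsTable(con, name)) DBI::dbRemoveTable(con, name), add = TRUE)
  DBI::dbWriteTable(con, name, df)
  fn(name)
}

# Usage, assuming `con` from the example above and a hypothetical `events` table:
# ids <- data.frame(id = c(3L, 17L, 42L))
# res <- with_ephemeral_table(con, ids, function(tbl) {
#   DBI::dbGetQuery(con, paste0("SELECT * FROM events WHERE id IN (SELECT id FROM ", tbl, ")"))
# })
```

The random suffix is there so that concurrent sessions uploading identifiers at the same time do not collide on a shared table name.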

My main use case is that I often have a set of identifiers in R that I want to join against some data living in a ClickHouse table. Right now, I would either need to convert those identifiers into a (very long) `WHERE id IN ({x})` string for the query, or pull all the data from ClickHouse into R and do the join/lookup/merge in R, which is likely to exceed reasonable memory limits for large ClickHouse tables.
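
For comparison, the `WHERE id IN (...)` route can be sketched in base R as below. `build_in_query()` and `ch_quote()` are hypothetical helpers; string values are escaped following ClickHouse's string-literal rules (backslashes escaped, single quotes doubled), while table and column names are assumed to be trusted input. With many identifiers the SQL text grows linearly and may run into the server's `max_query_size` setting, which is part of why external data would be nicer:

```r
# Quote values as ClickHouse string literals: escape backslashes first,
# then double any single quotes ('It''s' is valid ClickHouse syntax).
ch_quote <- function(x) {
  x <- gsub("\\\\", "\\\\\\\\", x)
  x <- gsub("'", "''", x, fixed = TRUE)
  paste0("'", x, "'")
}

# Build "SELECT * FROM <table> WHERE <column> IN (...)" from an R vector.
# Numbers are written without scientific notation; strings are quoted.
build_in_query <- function(table, column, ids) {
  stopifnot(length(ids) > 0, !anyNA(ids))
  values <- if (is.character(ids)) ch_quote(ids) else format(ids, scientific = FALSE, trim = TRUE)
  sprintf("SELECT * FROM %s WHERE %s IN (%s)", table, column, paste(values, collapse = ", "))
}

# Usage, assuming `con` and a hypothetical `events` table:
# DBI::dbGetQuery(con, build_in_query("events", "id", c(3L, 17L, 42L)))
```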

One of the major Python ClickHouse drivers (clickhouse-driver) has implemented support for the external data API: https://clickhouse-driver.readthedocs.io/en/latest/features.html#external-data-for-query-processing
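
Until the native client supports it, a possible stopgap is ClickHouse's HTTP interface, which (per the ClickHouse docs on external data) accepts external tables as multipart/form-data parts plus a `<name>_structure` URL parameter describing the columns. This is a rough sketch, not RClickhouse functionality: it assumes the `httr` package, a server on the default HTTP port 8123 reachable without credentials, and only integer, double and character/factor columns without NAs. All helper names are made up for this example:

```r
# Map R column classes to ClickHouse types for <name>_structure.
ch_structure <- function(df) {
  types <- vapply(df, function(col) switch(class(col)[1],
    integer = "Int32", numeric = "Float64", character = "String", factor = "String",
    stop("unsupported column class: ", class(col)[1])), character(1))
  paste(names(df), types, collapse = ", ")
}

# TabSeparated escaping: backslash, tab and newline must be escaped.
ch_tsv_escape <- function(x) {
  x <- gsub("\\\\", "\\\\\\\\", x)
  x <- gsub("\t", "\\\\t", x)
  gsub("\n", "\\\\n", x)
}

# One TSV line per row (non-Nullable columns assumed, so NAs are rejected).
ch_tsv_lines <- function(df) {
  stopifnot(!anyNA(df))
  cols <- lapply(df, function(col) {
    if (is.character(col) || is.factor(col)) ch_tsv_escape(as.character(col))
    else format(col, scientific = FALSE, trim = TRUE, digits = 15)
  })
  do.call(paste, c(unname(cols), sep = "\t"))
}

# URL parameters: the query itself plus <name>_structure.
ch_external_params <- function(name, df, query) {
  params <- list(query = query)
  params[[paste0(name, "_structure")]] <- ch_structure(df)
  params
}

# Send the query with `df` attached as external table `name`; the form part
# name and the file name both match the table name.
query_with_external_data <- function(url, name, df, query) {
  path <- file.path(tempdir(), paste0(name, ".tsv"))
  writeLines(ch_tsv_lines(df), path, useBytes = TRUE)
  on.exit(unlink(path), add = TRUE)
  body <- list(httr::upload_file(path, type = "text/tab-separated-values"))
  names(body) <- name
  resp <- httr::POST(url, query = ch_external_params(name, df, query),
                     body = body, encode = "multipart")
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")  # TabSeparated by default
}

# Usage, assuming a hypothetical `events` table:
# ids <- data.frame(id = c(3L, 17L, 42L))
# txt <- query_with_external_data("http://localhost:8123/", "ids", ids,
#   "SELECT * FROM events WHERE id IN (SELECT id FROM ids)")
```

The identifiers only live for the duration of that one query, so there is no table to clean up afterwards, which is the main appeal over the `dbWriteTable()` approach.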

@inkrement
Member

Hi! Thanks for this nice suggestion! The external data API does indeed seem really interesting, and there are surely plenty of use cases. However, this package is basically a wrapper around the official ClickHouse C++ client plus some dplyr gimmicks. As far as I know, this feature is not yet supported by the C++ client, so it would have to be added there first. We'll discuss it internally over the next few days and reach out to the C++ client folks. Please don't expect it to happen within the next few weeks, but we'll keep this thread open and use it for updates.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants