Stream large data #65

Open
calebeaires opened this issue Aug 4, 2020 · 3 comments
Comments

@calebeaires

calebeaires commented Aug 4, 2020

Topic open to the community of this module

This topic is just for a discussion about the stream flow, so we users can better understand how this plugin handles streaming with ClickHouse.

Consider an amount of data of around 10 GB. Making a read stream and then piping it is quite easy with node-clickhouse. About that:

  1. Can I assume that this flow does not overload the ClickHouse database?
  2. Looking at the documentation, what is the difference between the two approaches below? Is there any difference in terms of performance?

A. Insert with stream

const writableStream = ch.query('INSERT INTO table FORMAT CSV', (err, result) => {})

B. Insert large data (without callback)

const clickhouseStream = ch.query('INSERT INTO table FORMAT TSV')
tsvStream.pipe(clickhouseStream)
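
For context, here is a more complete sketch of approach B, assuming the @apla/clickhouse client; the client options, table name and file path are illustrative only. As far as I understand, the callback in approach A only changes how completion and errors are reported, and both calls return the same writable stream:

// Sketch only: host/port, table name and file path are illustrative.
const fs = require('fs')
const ClickHouse = require('@apla/clickhouse')

const ch = new ClickHouse({ host: 'localhost', port: 8123 })

const tsvStream = fs.createReadStream('./big-data.tsv')
const clickhouseStream = ch.query('INSERT INTO my_table FORMAT TSV', (err) => {
  // Approach A's callback: fires once the whole insert has finished or failed.
  if (err) return console.error('insert failed:', err)
  console.log('insert finished')
})

// Approach B: pipe() moves the data and handles backpressure for you.
tsvStream.pipe(clickhouseStream)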
  3. I read the ClickHouse docs; this setting makes things go well when it is set correctly. How can I use insert_quorum to make stream writes faster on a single server (without replicas)?

  4. With the node-clickhouse write stream, does my code have to take care of garbage collection itself, i.e. must I make use of pause/resume/drain?
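
On question 4: Node's pipe() already handles backpressure for you (it pauses the readable source and resumes on 'drain'), so the manual pause/resume/drain dance is only needed when you call write() yourself in a loop. A minimal sketch, with an illustrative table name and rows:

// Sketch: respecting backpressure when calling write() directly instead of piping.
// The table name and row contents are illustrative.
const stream = ch.query('INSERT INTO my_table FORMAT TSV', (err) => {
  if (err) console.error('insert failed:', err)
})

function writeRows (rows) {
  let i = 0
  ;(function next () {
    while (i < rows.length) {
      const ok = stream.write(rows[i++] + '\n')
      if (!ok) {                     // internal buffer is full
        stream.once('drain', next)   // resume once the buffer has flushed
        return
      }
    }
    stream.end()                     // signal end of data
  })()
}

writeRows(['1\talice', '2\tbob'])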

@KrishnaPG

For large files, the streams should also support failure handling and pause/resume options in case of connection problems or other network errors. It is not clear whether this package handles those checkpoints.
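
As an illustration of the kind of recovery being asked about, here is a hedged sketch that buffers a batch and re-sends it when the insert callback reports an error; the table name, retry limit, and backoff policy are illustrative and not features this package provides out of the box:

// Sketch: re-send a buffered batch when the insert stream fails.
// Assumes `ch` is an @apla/clickhouse instance; names and policy are illustrative.
function insertBatch (ch, batch, attempt = 0) {
  return new Promise((resolve, reject) => {
    const stream = ch.query('INSERT INTO my_table FORMAT TSV', (err) => {
      if (!err) return resolve()
      if (attempt >= 5) return reject(err)
      // Re-send the whole batch: a failed insert cannot be resumed mid-stream.
      setTimeout(
        () => insertBatch(ch, batch, attempt + 1).then(resolve, reject),
        1000 * 2 ** attempt
      )
    })
    for (const line of batch) stream.write(line + '\n')
    stream.end()
  })
}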

@yi

yi commented Dec 2, 2020

In real-world production, I found it fragile to hold a write stream open for large or long-running insertions. So I wrote a wrapper based on @apla/node-clickhouse that supports: 1. failure retry, 2. restoring data segments after a process crash, and 3. a single write process in Node cluster mode. Hope it helps:
https://www.npmjs.com/package/clickhouse-cargo

@KrishnaPG

Thank you @yi. From a quick look, your package looks great. Will try to switch to it.
