Stream large data #65

Open
calebeaires opened this issue Aug 4, 2020 · 3 comments
Comments

@calebeaires

calebeaires commented Aug 4, 2020

Topic open to the community of this module

This topic is just for a discussion about the stream flow, so we users can better understand how this plugin handles streaming with ClickHouse.

Consider an amount of data of around 10 GB. Making a read stream and then piping it is quite easy with node-clickhouse. About that:

  1. Can I assume that this flow does not overload the ClickHouse database?
  2. Looking at the documentation, what is the difference between the two approaches below? Is there any difference in terms of performance?

A. Insert with stream

const writableStream = ch.query('INSERT INTO table FORMAT CSV', (err, result) => {})

B. Insert large data (without callback)

const clickhouseStream = ch.query('INSERT INTO table FORMAT TSV')
tsvStream.pipe(clickhouseStream)
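
For context, here is a more complete sketch of approach B, assuming the @apla/clickhouse client; the client options, table name and file path are illustrative only. As far as I understand, the callback in approach A only changes how completion and errors are reported, and both calls return the same writable stream:

// Sketch only: host/port, table name and file path are illustrative.
const fs = require('fs')
const ClickHouse = require('@apla/clickhouse')

const ch = new ClickHouse({ host: 'localhost', port: 8123 })

const tsvStream = fs.createReadStream('./big-data.tsv')
const clickhouseStream = ch.query('INSERT INTO my_table FORMAT TSV', (err) => {
  // Approach A's callback: fires once the whole insert has finished or failed.
  if (err) return console.error('insert failed:', err)
  console.log('insert finished')
})

// Approach B: pipe() moves the data and handles backpressure for you.
tsvStream.pipe(clickhouseStream)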
  3. I read the ClickHouse docs; this setting makes things go well when it is set correctly. How can I use insert_quorum to make stream writes faster on a single server (without replicas)?

  4. With the node-clickhouse write stream, does my code have to take care of garbage collection itself, i.e. must I make use of pause/resume/drain?
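
On question 4: Node's pipe() already handles backpressure for you (it pauses the readable source and resumes on 'drain'), so the manual pause/resume/drain dance is only needed when you call write() yourself in a loop. A minimal sketch, with an illustrative table name and rows:

// Sketch: respecting backpressure when calling write() directly instead of piping.
// The table name and row contents are illustrative.
const stream = ch.query('INSERT INTO my_table FORMAT TSV', (err) => {
  if (err) console.error('insert failed:', err)
})

function writeRows (rows) {
  let i = 0
  ;(function next () {
    while (i < rows.length) {
      const ok = stream.write(rows[i++] + '\n')
      if (!ok) {                     // internal buffer is full
        stream.once('drain', next)   // resume once the buffer has flushed
        return
      }
    }
    stream.end()                     // signal end of data
  })()
}

writeRows(['1\talice', '2\tbob'])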

@KrishnaPG

For large files, the streams should also support failure handling and pause/resume options in case of connection problems or other network errors. It is not clear whether this package handles those checkpoints.
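
As an illustration of the kind of recovery being asked about, here is a hedged sketch that buffers a batch and re-sends it when the insert callback reports an error; the table name, retry limit, and backoff policy are illustrative and not features this package provides out of the box:

// Sketch: re-send a buffered batch when the insert stream fails.
// Assumes `ch` is an @apla/clickhouse instance; names and policy are illustrative.
function insertBatch (ch, batch, attempt = 0) {
  return new Promise((resolve, reject) => {
    const stream = ch.query('INSERT INTO my_table FORMAT TSV', (err) => {
      if (!err) return resolve()
      if (attempt >= 5) return reject(err)
      // Re-send the whole batch: a failed insert cannot be resumed mid-stream.
      setTimeout(
        () => insertBatch(ch, batch, attempt + 1).then(resolve, reject),
        1000 * 2 ** attempt
      )
    })
    for (const line of batch) stream.write(line + '\n')
    stream.end()
  })
}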

@yi

yi commented Dec 2, 2020

In real-world production, I found it fragile to hold a write stream open for large or long-running insertions. So I wrote a wrapper based on @apla/node-clickhouse that supports: 1. failure retry, 2. restoring data segments after a process crash, and 3. a single write process in Node cluster mode. Hope it helps:
https://www.npmjs.com/package/clickhouse-cargo

@KrishnaPG

Thank you @yi. From a quick look, your package looks great. Will try to switch to it.
