
Helm questions #220

Open
BorisTyshkevich opened this issue Feb 18, 2025 · 15 comments

Comments

@BorisTyshkevich

I see this in the docs:

  1. SNAPSHOT_ONLY: A one-time Job for uploading a snapshot of data.
  2. INCREMENT_ONLY: A continuous data replication StatefulSet.
  3. SNAPSHOT_AND_INCREMENT: A Job for a one-time data snapshot upload followed by a continuous replication StatefulSet after the job completes.

Question to the Helm Chart maintainer:

  1. Why is a StatefulSet used? Is there any state for trcli? Where is it stored?
  2. How is SNAPSHOT_AND_INCREMENT supposed to work with a Job and a StatefulSet together? Why does trcli replicate wait for trcli activate? How do they sync?

BTW:
I'm working on a Helm chart that can run multiple Transfer tasks in different Deployments—and thinking of publishing it or incorporating it into the existing one.

values.yaml looks like:

transfers:
  - name: task1
    type: SNAPSHOT_AND_INCREMENT   # Options: SNAPSHOT_ONLY, INCREMENT_ONLY, SNAPSHOT_AND_INCREMENT (default)
    src: db1                       # ref to db-hosts and Secrets
    dst: clickhouse                # ref to db-hosts and Secrets
    coordinator: coordinator_s3    # ref to coordinator_s3

  - name: task2
    src: db1
    dst: clickhouse
    coordinator: coordinator_s3

  - name: task3
    src: db2
    dst: clickhouse

Default src/dst host settings for the different tasks are placed in the db-hosts directory. Secrets are used via ENV variables.

Each task's final transfer.yaml is merged from the following files (see the sketch after this list):

  • config/task1.yaml (mostly table and database names configured here)
  • db-hosts/db1.yaml
  • db-hosts/clickhouse.yaml
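
A minimal sketch of how that merge could be expressed with standard Helm functions (.Files.Get, fromYaml, mustMergeOverwrite); the helper name and calling convention here are my own, not the chart's actual code:

{{- /* Hypothetical helper: overlay per-task config on top of shared host defaults. */}}
{{- define "transfer.spec" -}}
{{- $src  := .root.Files.Get (printf "db-hosts/%s.yaml" .task.src) | fromYaml -}}
{{- $dst  := .root.Files.Get (printf "db-hosts/%s.yaml" .task.dst) | fromYaml -}}
{{- $task := .root.Files.Get (printf "config/%s.yaml" .task.name) | fromYaml -}}
{{- /* Rightmost wins: task-level settings override the host defaults. */}}
{{- mustMergeOverwrite (dict) $src $dst $task | toYaml -}}
{{- end -}}

{{- /* Usage inside a range over .Values.transfers: */}}
{{- range .Values.transfers }}
{{ include "transfer.spec" (dict "root" $ "task" .) }}
{{- end }}
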
@laskoviymishka
Contributor

Why is a StatefulSet used? Does trcli have any state? Where is it stored?

Partially. Transfer is an MPP system that relies on jobs knowing their position within a pool of jobs, and this position must remain constant. In many cases, a transfer job needs to be aware of its index—typically, the job at index 0 performs coordination tasks, while all other jobs handle secondary work.
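
For context: a StatefulSet names its pods <name>-0 … <name>-(N-1) and keeps those names stable across restarts, which is exactly the constant-position property described above. A minimal sketch of how a worker could recover its index from its own pod name (WORKER_INDEX is a hypothetical variable name, not a documented trcli setting):

# Container fragment from a hypothetical StatefulSet pod template.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name          # e.g. "transfer-replication-2"
command: ["/bin/sh", "-c"]
args:
  - |
    # The ordinal after the last dash is this worker's stable index.
    export WORKER_INDEX="${POD_NAME##*-}"
    exec trcli replicate ...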

How is SNAPSHOT_AND_INCREMENT supposed to work with a Job and a StatefulSet together? Why does trcli replicate wait for trcli activate? How do they sync?

The communication occurs via state stored in S3. The replication job checks whether the state already exists—if it does, the snapshot step is skipped. This mechanism is the same as how restarts are handled.

@laskoviymishka
Contributor

I'm working on a Helm chart that can run multiple Transfer tasks in different Deployments—and thinking of publishing it or incorporating it into the existing one.

If you can incorporate it into the existing one, that would be really cool <3

@BorisTyshkevich
Author

Partially

It is still not clear why a StatefulSet is used in the Helm chart. If the state is stored in S3 and no state files are persisted on the file system, a Deployment would be enough.

The communication occurs via state stored in S3.

Does this mean that even for a Postgres source (which stores its position as an LSN), we need to set up an S3 coordinator?

How is SNAPSHOT_AND_INCREMENT supposed to work with a Job and a StatefulSet together? Why does trcli replicate wait for trcli activate? How do they sync?

The communication occurs via state stored in S3. The replication job checks whether the state already exists—if it does, the snapshot step is skipped.

That part is clear. The question was not about trcli itself but about the Helm chart, where extra complexity is created by running a Job and a StatefulSet in sequence for the same task.

@laskoviymishka
Contributor

It is still not clear why a StatefulSet is used in the Helm chart. If the state is stored in S3 and no state files are persisted on the file system, a Deployment would be enough.

A Deployment does not provide sequential indexing for pods out of the box. If there were a way to assign indexes from 0 to N to Deployment replicas, it could be moved to a Deployment.
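
Worth noting as an aside: for the snapshot (Job) side, Kubernetes does provide this natively. A Job with completionMode: Indexed assigns each pod a stable index via the JOB_COMPLETION_INDEX environment variable. A generic sketch, not taken from this chart:

apiVersion: batch/v1
kind: Job
metadata:
  name: snapshot-workers            # hypothetical name
spec:
  completions: 4                    # total number of workers
  parallelism: 4                    # run them all concurrently
  completionMode: Indexed           # each pod gets JOB_COMPLETION_INDEX=0..3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: transfer:latest                 # placeholder image
          command: ["trcli", "activate"]         # reads $JOB_COMPLETION_INDEX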

Does this mean that even for a Postgres source (which stores its position as an LSN), we need to set up an S3 coordinator?

For sharded reads, yes. This is necessary to share certain information (such as the transaction for the read snapshot) between jobs.
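
For readers mapping this back to the values.yaml above, a coordinator entry might look like the following; the key names are purely illustrative, not the chart's actual schema:

coordinators:
  coordinator_s3:                   # referenced by transfers[].coordinator above
    type: s3
    bucket: transfer-state          # holds the snapshot transaction, segment list, etc.
    endpoint: https://s3.example.com
    # credentials are injected via ENV from a Secret, as with the db-hosts entries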

@BorisTyshkevich
Author

How is SNAPSHOT_AND_INCREMENT supposed to work with a Job and a StatefulSet together? Why does trcli replicate wait for trcli activate? How do they sync?

The communication occurs via state stored in S3. The replication job checks whether the state already exists—if it does, the snapshot step is skipped. This mechanism is the same as how restarts are handled.

As I understand, it's a general rule for trcli itself. However, the question is about Helm Chart:

{{- if eq .Values.transferSpec.type "SNAPSHOT_AND_INCREMENT" }}
{{ include "snapshot-job" . }}
---
{{ include "snapshot-regular-cronjob" . }}
---
{{ include "replication-statefulset" . }}
{{- end }}

Instead of creating a single StatefulSet and letting it do the snapshot and the subsequent replication, all three objects (Job/CronJob/StatefulSet) are created here. What is the idea?

@laskoviymishka
Contributor

AFAIK, inside each template there is an if that skips generating anything when the transfer type is not the one it needs.
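
That would correspond to a guard like the following inside each named template (a sketch of the pattern being described, not the chart's verbatim source):

{{- define "snapshot-job" }}
{{- if has .Values.transferSpec.type (list "SNAPSHOT_ONLY" "SNAPSHOT_AND_INCREMENT") }}
apiVersion: batch/v1
kind: Job
# ... the Job spec is rendered only for snapshot-capable transfer types ...
{{- end }}
{{- end }}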


@laskoviymishka
Contributor

Honestly - there is no big design thinking behind this Helm chart :D
So if you add if-s there, that would be nice <3

@BorisTyshkevich
Author

As of now, I don't see any reason to start many pods (e.g., via a StatefulSet and a Job) for a single transfer job, so all of this looks like unneeded complication that should be removed. Maybe I'm wrong; that is why I've opened this issue.

I probably need to discuss this with the developers of the Helm chart - who uses it and how - so I can understand the requirements and reintroduce those features in my chart.

Another reason for running several pods per transfer job could be #223, but it's a long way to go.

@laskoviymishka
Contributor

The whole idea of these jobs is to enable horizontal scaling for Transfer.
Transfer can scale horizontally by adding more pods for snapshots, and for some sources (Kafka / Kinesis / S3) also for replication.
The only requirement for those scaled pods is to know how many workers are up (the total count) and the index of the current worker.
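
A sketch of how a chart could hand both numbers to each pod; WORKER_COUNT/WORKER_INDEX and .Values.replicationWorkers are hypothetical names, and the pod-index label requires Kubernetes 1.28+:

# StatefulSet pod template fragment; spec.replicas is set from the same value.
env:
  - name: WORKER_COUNT
    value: {{ .Values.replicationWorkers | quote }}   # total number of workers
  - name: WORKER_INDEX
    valueFrom:
      fieldRef:
        # StatefulSet pods carry their ordinal in this label on K8s >= 1.28
        fieldPath: metadata.labels['apps.kubernetes.io/pod-index']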

@BorisTyshkevich
Author

OK. That is clear.

Do you also look at pod suffixes like app.kubernetes.io/name: {{ .Release.Name }}-replication (or -snapshot) to understand that they are doing shared work on the same source?

And how is the work sharded? Is it really possible to split the work for a generic MySQL/Postgres table without any partitioning or PK index?

@laskoviymishka
Contributor

laskoviymishka commented Feb 21, 2025

Nope, the code only looks at the indexes; the suffixes are for humans.

And how is the work sharded? Is it really possible to split the work for a generic MySQL/Postgres table without any partitioning or PK index?

  1. The worker with index 0 (the initializer) analyzes the database schema and determines a partitioning strategy:
    • Uses table statistics.
    • Splits data by PK ranges if possible.
    • Otherwise, applies heuristics (e.g., partitioning by id % N, so worker k processes the rows where id % N = k).
  2. Worker 0 stores the list of segments (value ranges) in the coordinator (e.g., S3).
  3. Workers 1..N request available segments from the coordinator, process them, and report completion.
sequenceDiagram
    participant Worker0 as Worker 0 (Main)
    participant Coordinator as Coordinator (S3)
    participant Workers as Workers (Secondary)
    participant DB as MySQL/PostgreSQL

    Worker0 ->> DB: Analyze schema
    Worker0 ->> Coordinator: Store data split information
    loop For each Worker
        Workers ->> Coordinator: Request segment
        Workers ->> DB: Process segment
        Workers ->> Coordinator: Report completion
    end
    Coordinator ->> Worker0: Wait All Completed

@BorisTyshkevich
Author

Thank you for the explanation.

Here is my Helm chart with multiple transfer tasks, adapted for Altinity Cloud - https://github.com/bvt123/transfer/tree/main/helm

@laskoviymishka
Contributor

This Helm chart looks legit, much nicer and cleaner. Maybe it's worth making a PR with it?

@bvt123

bvt123 commented Feb 23, 2025

Created a PR. Please approve if you like it.
