
Helm questions #220

Open
BorisTyshkevich opened this issue Feb 18, 2025 · 15 comments

Comments

@BorisTyshkevich

I see this in the docs:

  1. SNAPSHOT_ONLY: A one-time Job for uploading a snapshot of data.
  2. INCREMENT_ONLY: A continuous data replication StatefulSet.
  3. SNAPSHOT_AND_INCREMENT: A Job for a one-time data snapshot upload followed by a continuous replication StatefulSet after the job completes.

Question to the Helm Chart maintainer:

  1. Why is a StatefulSet used? Is there any state for trcli? Where is it stored?
  2. How is SNAPSHOT_AND_INCREMENT supposed to work with a Job and a StatefulSet together? Why does trcli replicate wait for trcli activate? How do they sync?

BTW:
I'm working on a Helm chart that can run multiple Transfer tasks in different Deployments—and thinking of publishing it or incorporating it into the existing one.

values.yaml looks like:

transfers:
  - name: task1
    type: SNAPSHOT_AND_INCREMENT   # Options: SNAPSHOT_ONLY, INCREMENT_ONLY, SNAPSHOT_AND_INCREMENT (default)
    src: db1                       # ref to db-hosts and Secrets
    dst: clickhouse                # ref to db-hosts and Secrets
    coordinator: coordinator_s3    # ref to coordinator_s3

  - name: task2
    src: db1
    dst: clickhouse
    coordinator: coordinator_s3

  - name: task3
    src: db2
    dst: clickhouse

Default src/dst host settings for the different tasks are placed in the db-hosts directory. Secrets are used via ENV variables.

Each task's final transfer.yaml is merged from the following files (see the sketch after this list):

  • config/task1.yaml (mostly table and database names configured here)
  • db-hosts/db1.yaml
  • db-hosts/clickhouse.yaml
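
A minimal sketch of how that merge could be expressed with standard Helm functions (.Files.Get, fromYaml, mustMergeOverwrite); the helper name and calling convention here are my own, not the chart's actual code:

{{- /* Hypothetical helper: overlay per-task config on top of shared host defaults. */}}
{{- define "transfer.spec" -}}
{{- $src  := .root.Files.Get (printf "db-hosts/%s.yaml" .task.src) | fromYaml -}}
{{- $dst  := .root.Files.Get (printf "db-hosts/%s.yaml" .task.dst) | fromYaml -}}
{{- $task := .root.Files.Get (printf "config/%s.yaml" .task.name) | fromYaml -}}
{{- /* Rightmost wins: task-level settings override the host defaults. */}}
{{- mustMergeOverwrite (dict) $src $dst $task | toYaml -}}
{{- end -}}

{{- /* Usage inside a range over .Values.transfers: */}}
{{- range .Values.transfers }}
{{ include "transfer.spec" (dict "root" $ "task" .) }}
{{- end }}
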
@laskoviymishka
Contributor

Why is a StatefulSet used? Does trcli have any state? Where is it stored?

Partially. Transfer is an MPP system that relies on jobs knowing their position within a pool of jobs, and this position must remain constant. In many cases, a transfer job needs to be aware of its index—typically, the job at index 0 performs coordination tasks, while all other jobs handle secondary work.
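
For context: a StatefulSet names its pods <name>-0 … <name>-(N-1) and keeps those names stable across restarts, which is exactly the constant-position property described above. A minimal sketch of how a worker could recover its index from its own pod name (WORKER_INDEX is a hypothetical variable name, not a documented trcli setting):

# Container fragment from a hypothetical StatefulSet pod template.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name          # e.g. "transfer-replication-2"
command: ["/bin/sh", "-c"]
args:
  - |
    # The ordinal after the last dash is this worker's stable index.
    export WORKER_INDEX="${POD_NAME##*-}"
    exec trcli replicate ...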

How is SNAPSHOT_AND_INCREMENT supposed to work with a Job and a StatefulSet together? Why does trcli replicate wait for trcli activate? How do they sync?

The communication occurs via state stored in S3. The replication job checks whether the state already exists—if it does, the snapshot step is skipped. This mechanism is the same as how restarts are handled.

@laskoviymishka
Contributor

I'm working on a Helm chart that can run multiple Transfer tasks in different Deployments—and thinking of publishing it or incorporating it into the existing one.

If you can incorporate it into the existing one, that would be really cool <3

@BorisTyshkevich
Author

Partially

It is still not clear why a StatefulSet is used in the Helm chart. If the state is stored in S3 and no state files are persisted on the file system, a Deployment would be enough.

The communication occurs via state stored in S3.

Does this mean that even for a Postgres source (which stores its position as an LSN), we need to set up an S3 coordinator?

How is SNAPSHOT_AND_INCREMENT supposed to work with a Job and a StatefulSet together? Why does trcli replicate wait for trcli activate? How do they sync?

The communication occurs via state stored in S3. The replication job checks whether the state already exists—if it does, the snapshot step is skipped.

That part is clear. The question was not about trcli itself but about the Helm chart, where extra complexity is created by running a Job and a StatefulSet in sequence for the same task.

@laskoviymishka
Contributor

It is still not clear why a StatefulSet is used in the Helm chart. If the state is stored in S3 and no state files are persisted on the file system, a Deployment would be enough.

A Deployment does not provide sequential indexing for pods out of the box. If there were a way to assign indexes from 0 to N to Deployment replicas, it could be moved to a Deployment.
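
Worth noting as an aside: for the snapshot (Job) side, Kubernetes does provide this natively. A Job with completionMode: Indexed assigns each pod a stable index via the JOB_COMPLETION_INDEX environment variable. A generic sketch, not taken from this chart:

apiVersion: batch/v1
kind: Job
metadata:
  name: snapshot-workers            # hypothetical name
spec:
  completions: 4                    # total number of workers
  parallelism: 4                    # run them all concurrently
  completionMode: Indexed           # each pod gets JOB_COMPLETION_INDEX=0..3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: transfer:latest                 # placeholder image
          command: ["trcli", "activate"]         # reads $JOB_COMPLETION_INDEX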

Does this mean that even for a Postgres source (which stores its position as an LSN), we need to set up an S3 coordinator?

For sharded reads, yes. This is necessary to share certain information (such as the transaction for the read snapshot) between jobs.
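
For readers mapping this back to the values.yaml above, a coordinator entry might look like the following; the key names are purely illustrative, not the chart's actual schema:

coordinators:
  coordinator_s3:                   # referenced by transfers[].coordinator above
    type: s3
    bucket: transfer-state          # holds the snapshot transaction, segment list, etc.
    endpoint: https://s3.example.com
    # credentials are injected via ENV from a Secret, as with the db-hosts entries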

@BorisTyshkevich
Author

How is SNAPSHOT_AND_INCREMENT supposed to work with a Job and a StatefulSet together? Why does trcli replicate wait for trcli activate? How do they sync?

The communication occurs via state stored in S3. The replication job checks whether the state already exists—if it does, the snapshot step is skipped. This mechanism is the same as how restarts are handled.

As I understand, it's a general rule for trcli itself. However, the question is about Helm Chart:

{{- if eq .Values.transferSpec.type "SNAPSHOT_AND_INCREMENT" }}
{{ include "snapshot-job" . }}
---
{{ include "snapshot-regular-cronjob" . }}
---
{{ include "replication-statefulset" . }}
{{- end }}

Instead of creating a single StatefulSet and letting it do the snapshot and the subsequent replication, all three objects (Job/CronJob/StatefulSet) are created here. What is the idea?

@laskoviymishka
Contributor

AFAIK, inside each template there is an if that skips generating anything when the transfer type is not the one it needs.
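
That would correspond to a guard like the following inside each named template (a sketch of the pattern being described, not the chart's verbatim source):

{{- define "snapshot-job" }}
{{- if has .Values.transferSpec.type (list "SNAPSHOT_ONLY" "SNAPSHOT_AND_INCREMENT") }}
apiVersion: batch/v1
kind: Job
# ... the Job spec is rendered only for snapshot-capable transfer types ...
{{- end }}
{{- end }}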


@laskoviymishka
Contributor

Honestly - there is no big design thinking behind this Helm chart :D
So if you add if-s there, that would be nice <3

@BorisTyshkevich
Author

As of now, I don't see any reason to start many pods (e.g., via a StatefulSet and a Job) for a single transfer job, so all of this looks like unneeded complication that should be removed. Maybe I'm wrong; that is why I've opened this issue.

I probably need to discuss this with the developers of the Helm chart - who uses it and how - so I can understand the requirements and reintroduce those features in my chart.

Another reason for running several pods per transfer job could be #223, but it's a long way to go.

@laskoviymishka
Contributor

The whole idea of these jobs is to enable horizontal scaling for Transfer.
Transfer can scale horizontally by adding more pods for snapshots, and for some sources (Kafka / Kinesis / S3) also for replication.
The only requirement for those scaled pods is to know how many workers are up (the total count) and the index of the current worker.
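
A sketch of how a chart could hand both numbers to each pod; WORKER_COUNT/WORKER_INDEX and .Values.replicationWorkers are hypothetical names, and the pod-index label requires Kubernetes 1.28+:

# StatefulSet pod template fragment; spec.replicas is set from the same value.
env:
  - name: WORKER_COUNT
    value: {{ .Values.replicationWorkers | quote }}   # total number of workers
  - name: WORKER_INDEX
    valueFrom:
      fieldRef:
        # StatefulSet pods carry their ordinal in this label on K8s >= 1.28
        fieldPath: metadata.labels['apps.kubernetes.io/pod-index']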

@BorisTyshkevich
Author

OK. That is clear.

Do you also look at pod suffixes like app.kubernetes.io/name: {{ .Release.Name }}-replication (or -snapshot) to understand that they are doing shared work on the same source?

And how is the work sharded? Is it really possible to split the work for a generic MySQL/Postgres table without any partitioning or PK index?

@laskoviymishka
Contributor

laskoviymishka commented Feb 21, 2025

Nope, the code only looks at the indexes; the suffixes are for humans.

And how is the work sharded? Is it really possible to split the work for a generic MySQL/Postgres table without any partitioning or PK index?

  1. The worker with index 0 (the initializer) analyzes the database schema and determines a partitioning strategy:
    • Uses table statistics.
    • Splits data by PK ranges if possible.
    • Otherwise, applies heuristics (e.g., partitioning by id % N, so worker k processes the rows where id % N = k).
  2. Worker 0 stores the list of segments (value ranges) in the coordinator (e.g., S3).
  3. Workers 1..N request available segments from the coordinator, process them, and report completion.
sequenceDiagram
    participant Worker0 as Worker 0 (Main)
    participant Coordinator as Coordinator (S3)
    participant Workers as Workers (Secondary)
    participant DB as MySQL/PostgreSQL

    Worker0 ->> DB: Analyze schema
    Worker0 ->> Coordinator: Store data split information
    loop For each Worker
        Workers ->> Coordinator: Request segment
        Workers ->> DB: Process segment
        Workers ->> Coordinator: Report completion
    end
    Coordinator ->> Worker0: Wait All Completed

@BorisTyshkevich
Author

Thank you for the explanation.

Here is my Helm chart with multiple transfer tasks, adapted for Altinity Cloud - https://github.com/bvt123/transfer/tree/main/helm

@laskoviymishka
Contributor

This Helm chart looks legit, much nicer and cleaner. Maybe it's worth making a PR with it?

@bvt123

bvt123 commented Feb 23, 2025

Created a PR. Please approve if you like it.
