forked from pingcap/docs-cn
reference/cdc: add usage document for ticdc (pingcap#2452)
Co-authored-by: TomShawn <[email protected]> Co-authored-by: TomShawn <[email protected]>
1 parent 60af82f, commit ded41f3
Showing 6 changed files with 425 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
--- | ||
title: Deploy and Use TiCDC | ||
category: reference | ||
--- | ||
|
||
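# Deploy and Use TiCDC | ||
|
||
This document describes how to deploy TiCDC and use it for incremental data replication. | ||
|
||
## Step 1: Deploy a TiCDC cluster | ||
|
||
Suppose the PD cluster has one PD node that can serve requests, with `client-url=10.0.10.25:2379`. To deploy three TiCDC nodes, start the cluster with the following commands. Every node started with the same PD address automatically joins the TiCDC cluster. | ||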
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_1.log --status-addr=127.0.0.1:8301 | ||
cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_2.log --status-addr=127.0.0.1:8302 | ||
cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_3.log --status-addr=127.0.0.1:8303 | ||
``` | ||
|
||
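The three commands above differ only in the log file name and the status port. As a minimal sketch of that pattern, the hypothetical helper `cdc_server_cmd` below (not part of TiCDC) prints the command line for node N without running it, assuming the same naming scheme as the example (`ticdc_N.log`, status port `8300 + N`):

```shell
# Hypothetical helper: print (do not run) the `cdc server` command for node N.
# $1 = PD address, $2 = node number. The log name and port scheme mirror the
# example above and are an assumption, not a TiCDC requirement.
cdc_server_cmd() {
  pd_url=$1
  node=$2
  echo "cdc server --pd=${pd_url} --log-file=ticdc_${node}.log --status-addr=127.0.0.1:$((8300 + node))"
}

# Print the commands for a three-node cluster that shares one PD address.
for n in 1 2 3; do
  cdc_server_cmd "http://10.0.10.25:2379" "$n"
done
```

Each printed line can then be started in its own terminal or under a process supervisor, because `cdc server` is a long-running process.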
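## Step 2: Create a replication task | ||
|
||
Suppose you need to replicate all upstream databases and tables (except system tables) to a downstream MySQL instance. You can create the replication task with the following command: | ||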
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235200 --sink-uri="mysql://root:<password>@127.0.0.1:3306/" | ||
``` | ||
|
||
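The options in the command above are described as follows: | ||
|
||
- `pd`: The URL of the PD client. | ||
- `start-ts`: The TSO from which replication starts. If it is not specified or is set to `0`, the current TSO is used as the starting TSO. | ||
- `sink-uri`: The sink address. Currently `mysql`/`tidb` and `kafka` are supported. For how to write a sink URI, see [sink URI configuration rules](/reference/tools/ticdc/sink.md). | ||
- `config`: The configuration of the replication task. Currently it supports black and white list filtering and skipping transactions with a specific `commit-ts`. | ||
|
||
After you run this command, TiCDC starts replicating data from the specified start-ts (`415238226621235200`) to the downstream MySQL (`127.0.0.1:3306`). | ||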
|
||
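To choose or sanity-check a `start-ts`, it can help to see which wall-clock time a TSO encodes. The sketch below assumes the usual PD TSO layout, in which the low 18 bits are a logical counter and the remaining high bits are a Unix timestamp in milliseconds. The function names are illustrative only.

```shell
# Assumption: TSO = (physical milliseconds << 18) | logical counter.
tso_physical_ms() { echo $(( $1 >> 18 )); }
tso_logical()     { echo $(( $1 & ((1 << 18) - 1) )); }
# Build a TSO with logical part 0 from a Unix timestamp in milliseconds.
tso_from_ms()     { echo $(( $1 << 18 )); }

start_ts=415238226621235200
echo "physical_ms=$(tso_physical_ms "$start_ts") logical=$(tso_logical "$start_ts")"
# prints: physical_ms=1584008127675 logical=0
```

Under this layout, the `start-ts` used above corresponds to 2020-03-12 10:15:27 UTC.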
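If you want to replicate data to a Kafka cluster, first create the topic in the Kafka cluster (the following example uses a topic named `cdc-test`) and set up its partitions. Then create the replication task to the Kafka cluster with the following command: | ||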
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235200 --sink-uri="kafka://10.0.10.30:9092/cdc-test" | ||
``` | ||
|
||
After you run the command above, TiCDC starts replicating data from the specified `start-ts` to the downstream Kafka (`10.0.10.30:9092`). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,265 @@ | ||
--- | ||
title: Manage the Cluster and Replication Tasks | ||
category: reference | ||
--- | ||
|
||
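# Manage the Cluster and Replication Tasks | ||
|
||
Currently, TiCDC provides two ways to manage the cluster and replication tasks: the command-line tool `cdc cli` and an HTTP interface. | ||
|
||
## Use `cdc cli` to manage cluster status and data replication | ||
|
||
This section describes how to use `cdc cli` to manage cluster status and data replication. | ||
|
||
### Manage the TiCDC service process (`capture`) | ||
|
||
- Query the `capture` list: | ||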
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
cdc cli capture list | ||
``` | ||
|
||
``` | ||
[ | ||
{ | ||
"id": "6d92386a-73fc-43f3-89de-4e337a42b766", | ||
"is-owner": true | ||
}, | ||
{ | ||
"id": "b293999a-4168-4988-a4f4-35d9589b226b", | ||
"is-owner": false | ||
} | ||
] | ||
``` | ||
|
||
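Some of the HTTP endpoints described later in this document only work on the owner node, so it can be handy to extract the owner's `id` from this list. The following is a minimal sketch, assuming the output keeps the layout shown above (one field per line, `id` before `is-owner`). `owner_capture_id` is a hypothetical helper, not a `cdc cli` subcommand.

```shell
# Hypothetical helper: read `cdc cli capture list` output on stdin and print
# the id of the capture whose "is-owner" field is true.
owner_capture_id() {
  awk -F'"' '
    /"id"/              { id = $4 }    # remember the most recent id
    /"is-owner": *true/ { print id }   # print it when its owner flag is true
  '
}

# Demo with the sample output from above.
owner_capture_id <<'EOF'
[
  {
    "id": "6d92386a-73fc-43f3-89de-4e337a42b766",
    "is-owner": true
  },
  {
    "id": "b293999a-4168-4988-a4f4-35d9589b226b",
    "is-owner": false
  }
]
EOF
# prints: 6d92386a-73fc-43f3-89de-4e337a42b766
```

In practice the helper would be used as `cdc cli capture list | owner_capture_id`. If the output layout differs, a JSON-aware tool is the more robust choice.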
### Manage replication tasks (`changefeed`) | ||
|
||
- Create a `changefeed`: | ||
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
cdc cli changefeed create --sink-uri="mysql://root:<password>@<host>:3306/" | ||
create changefeed ID: 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f info {"sink-uri":"mysql://root:<password>@<host>:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"config":{"filter-case-sensitive":false,"filter-rules":null,"ignore-txn-commit-ts":null}} | ||
``` | ||
|
||
- Query the `changefeed` list: | ||
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
cdc cli changefeed list | ||
``` | ||
|
||
``` | ||
[ | ||
{ | ||
"id": "28c43ffc-2316-4f4f-a70b-d1a7c59ba79f" | ||
} | ||
] | ||
``` | ||
|
||
- Query a specific `changefeed`, which returns the information and status of the corresponding replication task: | ||
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
cdc cli changefeed query --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f | ||
``` | ||
|
||
``` | ||
{ | ||
"info": { | ||
"sink-uri": "mysql://root:<password>@<host>:3306/", | ||
"opts": {}, | ||
"create-time": "2020-03-12T22:04:08.103600025+08:00", | ||
"start-ts": 415241823337054209, | ||
"target-ts": 0, | ||
"admin-job-type": 0, | ||
"config": { | ||
"filter-case-sensitive": false, | ||
"filter-rules": null, | ||
"ignore-txn-commit-ts": null | ||
} | ||
}, | ||
"status": { | ||
"resolved-ts": 415241860902289409, | ||
"checkpoint-ts": 415241860640145409, | ||
"admin-job-type": 0 | ||
} | ||
} | ||
``` | ||
|
||
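The `resolved-ts` and `checkpoint-ts` values in the `status` field are TSOs, so the distance between them can be read as a time difference. The sketch below assumes the usual PD TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). `tso_gap_ms` is an illustrative name, not a TiCDC command.

```shell
# Assumption: TSO = (physical milliseconds << 18) | logical counter.
# Print the physical-time distance, in milliseconds, between two TSOs.
tso_gap_ms() { echo $(( ($1 >> 18) - ($2 >> 18) )); }

resolved_ts=415241860902289409
checkpoint_ts=415241860640145409
echo "checkpoint-ts is $(tso_gap_ms "$resolved_ts" "$checkpoint_ts") ms behind resolved-ts"
# prints: checkpoint-ts is 1000 ms behind resolved-ts
```

The same calculation applies to the `position` values returned by `cdc cli processor query`.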
### Manage replication subtask processing units (`processor`) | ||
|
||
- Query the `processor` list: | ||
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
cdc cli processor list | ||
``` | ||
|
||
``` | ||
[ | ||
{ | ||
"id": "9f84ff74-abf9-407f-a6e2-56aa35b33888", | ||
"capture-id": "b293999a-4168-4988-a4f4-35d9589b226b", | ||
"changefeed-id": "28c43ffc-2316-4f4f-a70b-d1a7c59ba79f" | ||
} | ||
] | ||
``` | ||
|
||
- Query a specific `processor`, which returns the information and status of the replication subtask handled by a given node: | ||
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
cdc cli processor query --changefeed-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f --capture-id=b293999a-4168-4988-a4f4-35d9589b226b | ||
``` | ||
|
||
``` | ||
{ | ||
"status": { | ||
"table-infos": [ | ||
{ | ||
"id": 45, | ||
"start-ts": 415241823337054209 | ||
} | ||
], | ||
"table-p-lock": null, | ||
"table-c-lock": null, | ||
"admin-job-type": 0 | ||
}, | ||
"position": { | ||
"checkpoint-ts": 415241893447467009, | ||
"resolved-ts": 415241893971492865 | ||
} | ||
} | ||
``` | ||
|
||
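## Use the HTTP interface to manage cluster status and data replication | ||
|
||
Currently, the HTTP interface provides some basic query and maintenance functions. The following descriptions assume that the status interface of the CDC server listens on IP address `127.0.0.1` and port `8300` (the IP address and port to bind are specified with `--status-addr=ip:port` when the CDC server is started). In later versions, these functions will also be integrated into `cdc cli`. | ||
|
||
### Get the CDC server status | ||
|
||
Use the following command to get the status information of the CDC server: | ||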
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
curl http://127.0.0.1:8300/status | ||
``` | ||
|
||
``` | ||
{ | ||
"version": "0.0.1", | ||
"git_hash": "863f8ea889b144244ff53593a45c47ad22d37396", | ||
"id": "6d92386a-73fc-43f3-89de-4e337a42b766", # capture id | ||
"pid": 12102 # cdc server pid | ||
} | ||
``` | ||
|
||
### Evict the owner node | ||
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
curl -X POST http://127.0.0.1:8300/capture/owner/resign | ||
``` | ||
|
||
The command above takes effect only when the request is sent to the owner node. | ||
|
||
``` | ||
{ | ||
"status": true, | ||
"message": "" | ||
} | ||
``` | ||
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
curl -X POST http://127.0.0.1:8301/capture/owner/resign | ||
election: not leader | ||
``` | ||
|
||
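When the request is sent to a non-owner node, the command above returns an error. | ||
|
||
### Stop a replication task | ||
|
||
Use the following command to stop a replication task: | ||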
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
curl -X POST -d "admin-job=1&cf-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f" http://127.0.0.1:8301/capture/owner/admin | ||
``` | ||
|
||
``` | ||
{ | ||
"status": true, | ||
"message": "" | ||
} | ||
``` | ||
|
||
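In the command above: | ||
|
||
- `admin-job=1` means to stop the task. After the task is stopped, all of its replication `processor`s exit. The configuration and replication status of the task are retained, so the task can be resumed from `checkpoint-ts`. | ||
- `cf-id=xxx` is the ID of the `changefeed` to operate on. | ||
|
||
### Resume a replication task | ||
|
||
Use the following command to resume a replication task: | ||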
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
curl -X POST -d "admin-job=2&cf-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f" http://127.0.0.1:8301/capture/owner/admin | ||
``` | ||
|
||
``` | ||
{ | ||
"status": true, | ||
"message": "" | ||
} | ||
``` | ||
|
||
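In the command above: | ||
|
||
- `admin-job=2` means to resume the task. The replication task continues from `checkpoint-ts`. | ||
- `cf-id=xxx` is the ID of the `changefeed` to operate on. | ||
|
||
### Remove a replication task | ||
|
||
Use the following command to remove a replication task: | ||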
|
||
{{< copyable "shell-regular" >}} | ||
|
||
```shell | ||
curl -X POST -d "admin-job=3&cf-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f" http://127.0.0.1:8301/capture/owner/admin | ||
``` | ||
|
||
``` | ||
{ | ||
"status": true, | ||
"message": "" | ||
} | ||
``` | ||
|
||
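- `admin-job=3` means to remove the task. After the request, all replication `processor`s of the task are stopped and the task configuration is cleaned up. The replication status is retained for query only and serves no other purpose. | ||
- `cf-id=xxx` is the ID of the `changefeed` to operate on. | ||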
|
||
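The stop, resume, and remove requests above differ only in the `admin-job` value. As a minimal sketch, the hypothetical helper `admin_job_data` below maps a readable action name to the form data shown in the examples. The action names `stop`, `resume`, and `remove` are illustrative and are not part of the HTTP interface.

```shell
# Hypothetical helper: build the form data for /capture/owner/admin.
# $1 = stop | resume | remove, $2 = changefeed ID.
admin_job_data() {
  action=$1
  cf_id=$2
  case "$action" in
    stop)   job=1 ;;   # admin-job=1: stop the task
    resume) job=2 ;;   # admin-job=2: resume from checkpoint-ts
    remove) job=3 ;;   # admin-job=3: remove the task
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  echo "admin-job=${job}&cf-id=${cf_id}"
}

admin_job_data stop 28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
# prints: admin-job=1&cf-id=28c43ffc-2316-4f4f-a70b-d1a7c59ba79f
```

The result can be passed straight to `curl`, for example `curl -X POST -d "$(admin_job_data resume "$cf_id")" http://127.0.0.1:8301/capture/owner/admin`, where `cf_id` holds the changefeed ID.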
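## Error handling | ||
|
||
This section describes how to handle errors that occur while TiCDC replicates data. | ||
|
||
### TiCDC fails to execute statements in the downstream | ||
|
||
If TiCDC fails to execute a DDL or DML statement in the downstream, it automatically stops the replication task. | ||
|
||
- If the failure is caused by a downstream fault, network jitter, or a similar condition, you can resume the task directly to retry. | ||
- If the failure is caused by SQL that is incompatible with the downstream, retrying the task will not succeed. In this case, use the `ignore-txn-commit-ts` parameter in the replication configuration to skip the transaction with the specified `commit-ts`, and then resume the replication task. |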
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
--- | ||
title: TiCDC Overview | ||
category: reference | ||
--- | ||
|
||
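# TiCDC Overview | ||
|
||
[TiCDC](https://github.com/pingcap/ticdc) is a tool for replicating the incremental data of TiDB, implemented by pulling TiKV change logs. It can restore data to a state consistent with any upstream TSO, and it provides an open data protocol so that other systems can subscribe to data changes. | ||
|
||
## TiCDC architecture | ||
|
||
At runtime, TiCDC nodes are stateless, and high availability is achieved through the etcd inside PD. A TiCDC cluster supports creating multiple replication tasks to replicate data to multiple different downstreams. The following figure shows the system architecture of TiCDC: | ||
|
||
 | ||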
|
||
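### System roles | ||
|
||
- TiKV CDC component: Outputs only key-value (KV) change logs. | ||
|
||
- Assembles KV change logs with its internal logic. | ||
- Provides the interface for outputting KV change logs. The data sent includes both real-time change logs and change logs from incremental scans. | ||
|
||
- `capture`: The TiCDC process. Multiple `capture`s form a TiCDC cluster, which is responsible for replicating KV change logs. | ||
|
||
- Each `capture` pulls a portion of the KV change logs. | ||
- Sorts the one or more KV change logs that it has pulled. | ||
- Restores transactions to the downstream, or outputs data according to the TiCDC open protocol. | ||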
|
||
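## Replication features | ||
|
||
This section introduces the replication features of TiCDC. | ||
|
||
### Sink support | ||
|
||
Currently, the TiCDC sink module supports replicating data to the following downstreams: | ||
|
||
- Databases compatible with the MySQL protocol, with eventual consistency. | ||
- Kafka, with output in the TiCDC open protocol. Three consistency guarantees are available: row-level ordering, eventual consistency, or strict transactional consistency. | ||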
|
||
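### Black and white lists for database and table replication | ||
|
||
You can write black and white list filter rules to filter out, or to replicate only, all the changes of certain databases or tables. The filter rules are similar to MySQL `replication-rules-db` or `replication-rules-table`. | ||
|
||
## Restrictions | ||
|
||
To replicate data to TiDB or MySQL correctly, the following conditions must be met: | ||
|
||
- The table must have a primary key or a unique index. | ||
- If the table has only unique indexes, at least one of them must have every column explicitly defined as `NOT NULL` in the table schema. |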