With no cluster, after cluster_mgr and node_mgr sit idle overnight, cluster_mgr has no heartbeat and node_mgr has no log output #25

jd-zhang · 2022-05-18T11:13:00Z

Issue migrated from trac ticket # 705

component: cluster manager | priority: major

2022-05-18 11:13:00: [email protected] created the issue

1. Log of cluster_mgr:
Wed May 18 10:50:40 2022 tid:0x5e6b [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/http_server/http_server.cc:300 GenerateRequest]: Http post: {
"version":"1.0",
"job_id":"",
"job_type":"create_cluster",
"user_name":"kunlun_test",
"timestamp":"202205131532",
"paras":{
"nick_name":"rbrcluster001",
"ha_mode":"rbr",
"shards":"2",
"nodes":"3",
"comps":"1",
"max_storage_size":"20",
"max_connections":"6",
"cpu_cores":"8",
"innodb_size":"1",
"dbcfg":"1",
"machinelist": [ {"hostaddr":"192.168.0.129"} ]
}
}
2. At that time node_mgr had no log output.

jd-zhang · 2022-05-18T11:14:15Z

2022-05-18 11:14:15: [email protected] commented

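Both managers were running, the metadata tables contained no cluster, and the setup was left idle overnight.
The next morning a create-cluster command was sent. cluster_mgr received it but did not write it to the database table cluster_general_job_log.
At the time the developers suspected that a fault in the metadata tables was blocking the write, but logging in to the metadata primary showed that writes worked. When the create-cluster command was sent again, it was written normally and the cluster creation started.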

jd-zhang · 2022-05-18T11:15:36Z

2022-05-18 11:15:36: [email protected] commented

The create-cluster request data looks like this: {
"version":"1.0",
"job_id":"",
"job_type":"create_cluster",
"user_name":"kunlun_test",
"timestamp":"202205131532",
"paras":{
"nick_name":"rbrcluster001",
"ha_mode":"rbr",
"shards":"2",
"nodes":"3",
"comps":"1",
"max_storage_size":"20",
"max_connections":"6",
"cpu_cores":"8",
"innodb_size":"1",
"dbcfg":"1",
"machinelist": [ {"hostaddr":"${node_mgr.1}"} ]
}
}

jd-zhang · 2022-05-19T09:41:04Z

2022-05-19 09:41:04: [email protected] commented

The problem was reproduced on the night of the 18th.
A create-rbr-cluster request was sent and the API returned:
{"attachment":null,"error_code":"1","error_info":"execute query failed [this lead to connection closed]: , error number: 2006, sql: begin","status":"failed","version":"1.0"}

At this point cluster_mgr had only this log output:
Thu May 19 09:36:40 2022 tid:0x5e63 [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/http_server/http_server.cc:300 GenerateRequest]: Http post: {
"version":"1.0",
"job_id":"",
"job_type":"create_cluster",
"user_name":"kunlun_test",
"timestamp":"202205131532",
"paras":{
"nick_name":"rbrcluster002",
"ha_mode":"rbr",
"shards":"2",
"nodes":"3",
"comps":"1",
"max_storage_size":"20",
"max_connections":"6",
"cpu_cores":"8",
"innodb_size":"1",
"dbcfg":"1",
"machinelist": [ {"hostaddr":"192.168.0.129"} ]
}
}

At this point node_mgr had no log output.

jd-zhang · 2022-05-19T10:10:56Z

2022-05-19 10:10:56: @chaojie1979 commented

The connection used for writing to the database was probably dropped; a retry mechanism will be added later.

jd-zhang · 2022-05-20T10:16:59Z

2022-05-20 10:16:59: @chaojie1979 commented

Add a retry mechanism in zettalib; previously the interface configured the number of retries via statement_retries.
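
For illustration only, here is a minimal Python sketch of the reconnect-and-retry idea. It is not the zettalib implementation (zettalib's actual interface is not shown in this ticket). Error number 2006 is the MySQL client error "MySQL server has gone away", which is typically seen when the server has closed an idle connection, consistent with the overnight idle period reported here. `DbError`, `FakeConnection`, `execute_with_retry`, and `backoff_s` are hypothetical names invented for the sketch; `statement_retries` is borrowed from the comment above, and its exact semantics in zettalib are an assumption.

```python
import time

# MySQL client error numbers that indicate the connection itself is gone.
CR_SERVER_GONE_ERROR = 2006  # "MySQL server has gone away"
CR_SERVER_LOST = 2013        # "Lost connection to MySQL server during query"
RETRYABLE = {CR_SERVER_GONE_ERROR, CR_SERVER_LOST}


class DbError(Exception):
    """Hypothetical error type carrying a MySQL-style error number."""

    def __init__(self, errno, msg=""):
        super().__init__(f"error number: {errno} {msg}")
        self.errno = errno


class FakeConnection:
    """Hypothetical stand-in for a metadata DB connection closed while idle."""

    def __init__(self):
        self.alive = False   # simulates a connection dropped overnight
        self.executed = []   # statements that actually reached the "server"

    def reconnect(self):
        self.alive = True

    def execute(self, sql):
        if not self.alive:
            raise DbError(CR_SERVER_GONE_ERROR, f"sql: {sql}")
        self.executed.append(sql)


def execute_with_retry(conn, sql, statement_retries=3, backoff_s=0.0):
    """Run sql; on a connection-lost error, reconnect and retry.

    At most statement_retries retries are made after the first attempt.
    Any other error, or exhausting the retries, re-raises the exception.
    """
    attempt = 0
    while True:
        try:
            return conn.execute(sql)
        except DbError as e:
            if e.errno not in RETRYABLE or attempt >= statement_retries:
                raise
            attempt += 1
            time.sleep(backoff_s)
            conn.reconnect()


conn = FakeConnection()
execute_with_retry(conn, "begin")
print(conn.executed)  # ['begin']
```

Retrying is safe here because the failing statement is `begin`, the first statement of the transaction. A statement that fails mid-transaction cannot simply be replayed after a reconnect, since the server discards the open transaction when the connection is lost.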

jd-zhang · 2022-05-20T10:16:59Z

2022-05-20 10:16:59: @chaojie1979 changed owner from chaojie to snow

jd-zhang · 2022-05-20T10:19:16Z

2022-05-20 10:19:16: [email protected] commented

Third reproduction. cluster_mgr printed the following: Fri May 20 09:46:35 2022 tid:0xf1f61 [ERROR] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/http_server/http_server.cc:341 GenerateRequestUniqueId]: execute query failed [this lead to connection closed]: , error number: 2006, sql: begin
