fix: Reconstruct slave sync thread model #2638
Merged
What does this PR do:
1. Refactored the Slave-side master-slave sync thread model (fix #2637).
2. Fixed a Sync Win crash caused by the Slave sending two consecutive TrySync requests during a master-slave timeout reconnection (fix #2655):
Direct cause: when the Slave times out and reconnects, it sends two identical TrySync requests within a short window (both carrying the same BinlogOfft). The Master handles both requests the same way: each time it clears the WriteQueue and SyncWin, then resends Binlog starting from the offset carried in the request. As a result, some Binlogs around that point are sent twice, the Slave consumes them twice, and the BinlogACKs it returns are rejected by the Master as invalid.
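The failure mode above can be sketched as a small simulation. This is illustrative only: `SyncWindow`, `try_sync`, and the strict head-of-window ACK check are simplified stand-ins for Pika's actual master-side structures, under the assumption that an ACK is only valid if it matches the oldest unacknowledged offset.

```python
# Hypothetical model of the bug: names and the exact ACK rule are assumptions,
# not Pika's real implementation.

class SyncWindow:
    """Master-side window of binlog offsets sent but not yet ACKed."""
    def __init__(self):
        self.pending = []

    def reset(self):
        self.pending.clear()

    def send(self, offsets):
        self.pending.extend(offsets)

    def ack(self, offset):
        # Assumed rule: a valid ACK must match the head of the window.
        if not self.pending or self.pending[0] != offset:
            raise RuntimeError(f"invalid BinlogACK {offset}: window={self.pending}")
        self.pending.pop(0)

def try_sync(window, start_offset, count=3):
    """Master handling of TrySync: clear state, resend from start_offset."""
    window.reset()
    window.send(list(range(start_offset, start_offset + count)))

# Single TrySync: the slave's ACKs line up with the window.
win = SyncWindow()
try_sync(win, start_offset=100)
win.ack(100)  # fine

# Two identical TrySync requests in quick succession: the window is rebuilt
# and binlogs 100..102 are sent again. The slave already consumed offset 100
# from the first send, so its next ACK (101) no longer matches the head of
# the rebuilt window, and the master rejects it.
win = SyncWindow()
try_sync(win, start_offset=100)   # first TrySync
win.ack(100)                      # slave consumes offset 100
try_sync(win, start_offset=100)   # duplicate TrySync resets the window
try:
    win.ack(101)
except RuntimeError as e:
    print(e)
```

The duplicated resend is harmless on its own; it is the combination with the already-advanced slave consumption position that produces the ACK the master considers invalid.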
Why does the Slave send two consecutive TrySync requests: when the first TrySync goes out, the task queue of the Slave's Binlog-consuming worker thread still holds write-Binlog tasks backlogged from the previous master-slave connection. (When the Slave times out, disconnects, and enters the TryConnect state, it discards those backlogged tasks one by one; the problem is that this draining is too slow, or equivalently, the next TrySync is sent too soon.) When the Slave receives the response to the first TrySync, it enters the Connected state and begins consuming the backlogged write-Binlog tasks from the previous connection. Their SessionIDs no longer match, which triggers the error-handling branch and moves the Slave back to the TrySync state, so it immediately issues a second TrySync request.
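The state transitions described above can be sketched as follows. The class and method names are hypothetical; the sketch only models the sequence "backlog survives the reconnect, SessionID check fails, slave re-enters TrySync".

```python
# Illustrative sketch (not Pika's real code) of why a stale queued binlog task
# pushes the slave straight back into TrySync after the first reconnect.
from collections import deque

TRY_CONNECT, CONNECTED, TRY_SYNC = "TryConnect", "Connected", "TrySync"

class Slave:
    def __init__(self):
        self.state = CONNECTED
        self.session_id = 1
        self.queue = deque()  # worker thread's pending write-binlog tasks

    def enqueue_binlog(self, session_id, offset):
        self.queue.append((session_id, offset))

    def on_timeout(self):
        # Timeout: enter TryConnect. Stale tasks are dropped one by one,
        # but the drain may not finish before the next TrySync goes out.
        self.state = TRY_CONNECT

    def on_try_sync_resp(self, new_session_id):
        # Response to TrySync accepted: new session, Connected state.
        self.session_id = new_session_id
        self.state = CONNECTED

    def consume_one(self):
        session_id, offset = self.queue.popleft()
        if session_id != self.session_id:
            # SessionID mismatch: error branch, back to TrySync.
            self.state = TRY_SYNC
            return False
        return True

slave = Slave()
slave.enqueue_binlog(session_id=1, offset=42)  # backlog from old connection
slave.on_timeout()                             # disconnect; backlog not drained
slave.on_try_sync_resp(new_session_id=2)       # first TrySync succeeds
slave.consume_one()                            # stale task: SessionID 1 != 2
print(slave.state)                             # slave will issue a 2nd TrySync
```

The race is between draining the old queue and sending the next TrySync; this PR's thread-model rework removes that race rather than tuning the drain speed.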
3. Fixed the issue where a single Binlog task could block the Slave for a long time, causing the Master to resume Binlog transmission from the wrong starting position after a timeout reconnection (fix #2659).
4. After this PR, the Slave handles the TrySync Resp synchronously (rather than interleaved with Binlog consumption). In some extreme cases (a heavily blocked Slave), establishing the master-slave connection may therefore be delayed and the Slave will stay in the WaitReply state longer, during which master_link_status is also down. PR #2656 was therefore opened to add a finer-grained monitoring metric, repl_connect_status, so that operators can further diagnose the situation when master_link_status is down.
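To show why the finer-grained metric helps, here is a hedged sketch of how repl_connect_status (from PR #2656) could disambiguate a "down" master_link_status. The state names mirror the ones used in this PR's description; the exact set of states and the string formats are assumptions, not the metric's actual specification.

```python
# Illustrative mapping only: the real repl_connect_status values come from
# PR #2656 and may differ from this sketch.

def master_link_status(state):
    # Coarse metric: "up" only once the replication link is fully connected.
    return "up" if state == "Connected" else "down"

def repl_connect_status(state):
    # Finer metric: exposes which phase of connection setup we are in,
    # so a long stay in WaitReply is distinguishable from a dead link.
    return state.lower()

for state in ["TryConnect", "TrySync", "WaitReply", "Connected"]:
    print(f"{state:>10}: master_link_status={master_link_status(state)}, "
          f"repl_connect_status={repl_connect_status(state)}")
```

With only master_link_status, TryConnect, TrySync, and WaitReply all look identical ("down"); the second metric separates a slave that is slowly finishing its handshake from one that cannot reach the master at all.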