Description
Is your feature request related to a problem? Please describe
Currently, translog.MultiSnapshot reads the translog in reverse order. For example, during replay it processes the translog files as translog_3.tlog -> translog_2.tlog -> translog_1.tlog. The reason is to avoid replaying stale operations during recovery: in reverse order, the higher-term operation is read first and replayed, and the lower-term operation that shows up later is deduplicated.
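To illustrate the reverse-order behaviour described above, here is a minimal sketch. It is not the actual MultiSnapshot code: the Op class, the readReverse method, and the list-of-lists model of translog generations are hypothetical simplifications, and a plain HashSet stands in for whatever seen-sequence-number tracking the real implementation uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; not the OpenSearch implementation.
class ReverseReplaySketch {

    /** Stand-in for a translog operation (assumed fields only). */
    static final class Op {
        final long seqNo;
        final long primaryTerm;
        final String payload;

        Op(long seqNo, long primaryTerm, String payload) {
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
            this.payload = payload;
        }
    }

    /**
     * Reads generations newest-first and keeps only the first operation
     * seen for each sequence number. Because newer generations are read
     * first, the higher-term operation wins and the older one is dropped.
     */
    static List<Op> readReverse(List<List<Op>> generations) {
        Set<Long> seenSeqNos = new HashSet<>();
        List<Op> replayed = new ArrayList<>();
        for (int g = generations.size() - 1; g >= 0; g--) {
            for (Op op : generations.get(g)) {
                if (seenSeqNos.add(op.seqNo)) {
                    replayed.add(op);
                }
                // else: an operation with this seqNo was already read
                // from a newer generation, so this one is skipped.
            }
        }
        return replayed;
    }
}
```

With an older generation holding (seqNo 1, term 1) and a newer generation holding (seqNo 1, term 2), readReverse returns only the term-2 operation for seqNo 1.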
Since we now also call indexShard.trimOperationOfPreviousPrimaryTerms(trimAboveSeqNo); in the finalizingRecovery phase, the duplicated operations from the previous term are trimmed, which makes it possible to read the translog forward in most cases.
There is a TODO in the code: // TODO: Read translog forward in 9.0+. The author left it in place because reading forward may still hit edge cases. For example, if a recovering replica fails before it is able to trim the translog and then retries another PEER recovery, it may still read stale operations when replaying the translog locally up to its global checkpoint.
However, this edge case can be avoided if we trim the translog in the early phase, before it starts the translog replay.
Describe the solution you'd like
- Add a setting that forces the translog to be read forward.
- When reading forward, trim the translog before the replay actually starts.
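The proposed ordering (trim first, then read forward) can be sketched as follows. This is an illustration under stated assumptions, not a patch: Op, trim, and readForward are hypothetical names, translog generations are modelled as in-memory lists, and the trim rule (drop operations whose term is below the current term and whose sequence number is above the trim point) reflects my reading of what trimming previous-term operations does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; not the OpenSearch implementation.
class TrimThenForwardSketch {

    /** Stand-in for a translog operation (assumed fields only). */
    static final class Op {
        final long seqNo;
        final long primaryTerm;
        final String payload;

        Op(long seqNo, long primaryTerm, String payload) {
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
            this.payload = payload;
        }
    }

    /** Drops ops with primaryTerm < belowTerm and seqNo > aboveSeqNo. */
    static List<List<Op>> trim(List<List<Op>> generations, long belowTerm, long aboveSeqNo) {
        List<List<Op>> trimmed = new ArrayList<>();
        for (List<Op> generation : generations) {
            List<Op> kept = new ArrayList<>();
            for (Op op : generation) {
                boolean stale = op.primaryTerm < belowTerm && op.seqNo > aboveSeqNo;
                if (!stale) {
                    kept.add(op);
                }
            }
            trimmed.add(kept);
        }
        return trimmed;
    }

    /** Reads generations oldest-first, keeping the first op per seqNo. */
    static List<Op> readForward(List<List<Op>> generations) {
        Set<Long> seenSeqNos = new HashSet<>();
        List<Op> replayed = new ArrayList<>();
        for (List<Op> generation : generations) {
            for (Op op : generation) {
                if (seenSeqNos.add(op.seqNo)) {
                    replayed.add(op);
                }
            }
        }
        return replayed;
    }
}
```

In this model, calling readForward on untrimmed generations replays the stale lower-term operation, because it is encountered first. Calling readForward(trim(generations, currentTerm, trimAboveSeqNo)) removes that operation up front, so the forward read only sees the higher-term one.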
Related component
Indexing:Replication
Describe alternatives you've considered
No response
Additional context
No response