Feature Description
The vtctld command UpdateCellInfo can be quite destructive when the topo server address is updated in real time. In our case it caused a crash/panic on all vtgates in that cell, which led to significant customer downtime. The panic stack:
The main problem is in the code block below. When a new watcher (either a shard watcher or a topo watcher) comes in and finds that the cell's server address has changed, it closes the cached connections, which are still being used by other watchers:
In vt/topo/server.go:
    // Client exists in cache.
    // Let's verify that it is the same cell as we are looking for.
    // The cell name can be re-used with a different ServerAddress and/or Root
    // in which case we should get a new connection and update the cache
    if ci.ServerAddress == cc.cellInfo.ServerAddress && ci.Root == cc.cellInfo.Root {
        return cc.conn, nil
    }
    // Close the cached connection, we don't need it anymore
    if cc.conn != nil {
        cc.conn.Close()
    }
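One possible direction, shown here only as a minimal sketch and not as the Vitess implementation, is to reference-count cached connections so that a connection replaced by an address change is closed only after its last user releases it. Every name below (fakeConn, refConn, connCache, acquire) is hypothetical and stands in for the real types in vt/topo/server.go, which this sketch does not reproduce.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeConn is a hypothetical stand-in for a topo connection.
type fakeConn struct {
	addr   string
	closed bool
}

func (c *fakeConn) Close() { c.closed = true }

// refConn pairs a connection with the number of watchers still using it.
type refConn struct {
	conn    *fakeConn
	refs    int
	retired bool // true once a newer address has replaced this entry
}

// connCache is a hypothetical per-cell connection cache.
type connCache struct {
	mu    sync.Mutex
	cells map[string]*refConn
}

func newConnCache() *connCache {
	return &connCache{cells: map[string]*refConn{}}
}

// acquire returns a connection for the cell at addr plus a release function.
// If the cached address differs, the old entry is retired instead of being
// closed immediately; it is closed when its last holder calls release.
func (cc *connCache) acquire(cell, addr string) (*fakeConn, func()) {
	cc.mu.Lock()
	defer cc.mu.Unlock()

	rc, ok := cc.cells[cell]
	if ok && rc.conn.addr != addr {
		rc.retired = true
		if rc.refs == 0 {
			rc.conn.Close() // nobody is using it, safe to close now
		}
		ok = false
	}
	if !ok {
		rc = &refConn{conn: &fakeConn{addr: addr}}
		cc.cells[cell] = rc
	}
	rc.refs++

	var once sync.Once
	release := func() {
		once.Do(func() {
			cc.mu.Lock()
			defer cc.mu.Unlock()
			rc.refs--
			if rc.retired && rc.refs == 0 {
				rc.conn.Close()
			}
		})
	}
	return rc.conn, release
}

func main() {
	cache := newConnCache()
	oldConn, releaseOld := cache.acquire("cell1", "consul-old:8500")
	newConn, releaseNew := cache.acquire("cell1", "consul-new:8500")

	fmt.Println(oldConn.closed) // false: the first watcher still holds it
	releaseOld()
	fmt.Println(oldConn.closed) // true: closed after the last release
	fmt.Println(newConn.addr)   // consul-new:8500
	releaseNew()
}
```

The trade-off in this sketch is that existing watchers keep talking to the old server until they release their connection, so a real fix would also need a way to tell them to reconnect.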
Supporting online UpdateCellInfo will be very useful in the context of topo server migrations. Although the crash stack is in the Consul client, the problem also exists with other topo servers.
Use Case(s)
Support topo server migrations without customer downtime, whether the migration is a server DNS/URL change or a move between supported topo server types.