Preferred nodes #139
base: master
Conversation
If a cluster is distributed geographically, it can be preferable to query the local nodes. For read-only commands, we have the possibility to choose a subset of the cluster nodes that will be prioritized when choosing a node. A specific local slave can therefore be used.
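For illustration, a corvus-style configuration with such a preference might look roughly like the sketch below; the `preferred_nodes` option name and its syntax are placeholders for illustration, not necessarily what this PR adds.

```
# corvus.conf (sketch; the preferred_nodes option name and syntax are illustrative)
bind 12345
node 10.0.1.10:7000,10.0.1.11:7000,10.0.2.10:7000
thread 4
# hypothetical: read-only commands would be routed to these local nodes first
preferred_nodes 10.0.1.10:7000,10.0.1.11:7000
```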
That's cool. If I'm not mistaken, you built a single redis cluster across multiple data centers. Before we dive into the implementation, I'm really curious about how you can build a geographically distributed redis cluster. As far as I know, due to the unreliable network between data centers, the slaves in such a geographically distributed cluster will keep getting promoted to master when suffering high network latency. It's just not reliable to simply deploy the unmodified official redis cluster across multiple data centers. The only team I know of that uses this approach made significant changes to the original redis source code and reimplemented the failure detection and master promotion themselves. So how can you do that? Does it run reliably in your production environment?
We have as many redis clusters as we have DCs, with the masters for a given cluster in a single DC and the data replicated to slaves in all the other DCs.
Indeed, because of the possible network issues between the DCs, we had to disable slave promotion by setting cluster-node-timeout to a very high value.
That way, we are sure the masters for a cluster stay in the same DC.
Now, of course this means our clusters won’t heal in case a master goes down but this is acceptable to us.
We have applications in each DC that need to read data from the clusters and preferably from the local nodes in order to keep the latency down.
Hence the PR that lets us configure corvus in each DC with the list of preferred nodes.
I understand our use-case is a little special but I believe others could find this useful.
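In case it helps to picture the setup, here is a minimal redis.conf sketch of the knob mentioned above; the concrete value is just an example of "a very high value", not what we actually run.

```
# redis.conf on every cluster node (sketch)
cluster-enabled yes
# cluster-node-timeout is in milliseconds; an hour here effectively
# disables automatic slave promotion across DCs
cluster-node-timeout 3600000
```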
To be honest, I'm not in favor of doing this.
There are also some great tools for you: please check them out.
Here's one more: https://github.com/CodisLabs/redis-port
The problem with having a local cluster that is a copy of the master cluster is that we lose the benefit of the multiple slaves that redis cluster provides.
In our current configuration, each master has a slave in every other DC.
With the changes to corvus, we’ll be reading from the local nodes (master or slave) but if one of them goes down, corvus will automatically switch to another node (in another DC).
The system will be in degraded mode because of the extra network latency but at least it can cope with the loss of one or more nodes.
Having completely separate clusters that are replicated doesn’t provide that flexibility.
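The fallback behaviour described above can be sketched in a few lines of C; the struct and function names below are invented for illustration and are not the actual corvus implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative only: these types and names are made up, not the actual corvus code. */
struct node_addr {
    const char *host;
    int port;
    bool alive;
};

/* Pick a node to read from: prefer an alive node from the preferred (local)
 * set, otherwise fall back to any alive node, even one in another DC
 * (degraded mode with extra latency). Returns an index, or -1 if no node
 * is available at all. */
static int pick_read_node(const struct node_addr *nodes, size_t n,
                          bool (*is_preferred)(const struct node_addr *))
{
    int fallback = -1;
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].alive)
            continue;
        if (is_preferred(&nodes[i]))
            return (int)i;          /* local node available: use it */
        if (fallback < 0)
            fallback = (int)i;      /* remember a remote node just in case */
    }
    return fallback;
}
```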
Yes, it might be easier to just use a single cluster across multiple DCs for the one-to-many replication. Further, the frequently exchanged gossip packets get out of hand in large clusters with hundreds of nodes, which eats into the bandwidth between DCs. Building your own replication tool with a large queue gives better performance and availability, and as far as I know it is what most teams building multi-DC replication have chosen. Your solution is easy to build but lacks multiple backup slaves in the DCs that receive the data. I suggest trying to build your own.
Allow preferred nodes to be configured
If a cluster is distributed geographically, it can be preferable to query the local nodes. For read-only commands, preferred nodes let us choose a subset of the cluster nodes that will be prioritized when choosing a node. A specific local slave can therefore be used.