
Why did TPS drop after putting corvus in front of a redis_cluster? #150

Open
sccotJiang opened this issue Nov 6, 2019 · 15 comments

@sccotJiang

sccotJiang commented Nov 6, 2019

I created 3 master nodes and 3 slave nodes, and configured corvus as follows:
bind 12345
node 127.0.0.1:6379,127.0.0.1:6380,127.0.0.1:6381
thread 4
After starting it I ran redis-benchmark and found that TPS dropped.
Without corvus: GET: 33046.93 requests per second
With corvus: GET: 5262.88 requests per second
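For reference, a comparison like the one above would typically be run with something along these lines; the ports are assumptions based on the configuration quoted here (6379 for one redis node, 12345 for the corvus bind port), not taken from the report:

redis-benchmark -h 127.0.0.1 -p 6379 -t get -n 100000     # direct to a single redis node
redis-benchmark -h 127.0.0.1 -p 12345 -t get -n 100000    # through corvus

Note that, as discussed further down in this thread, redis-benchmark before 6.0 is not cluster-aware, so the direct-to-node numbers may partly reflect fast MOVED replies rather than real work.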

@whk-isat

whk-isat commented Dec 6, 2019

I ran into the same problem: going through the corvus proxy is much worse than accessing the cluster directly. How did you end up solving it?

@sccotJiang
Author

Not solved yet.

@jasonjoo2010
Contributor

jasonjoo2010 commented Dec 30, 2019

In a situation like this you need to investigate concretely; there are generally three main directions:

Ping (latency)

Because a middle layer has been added, you have to look at the actual path: it changes from client -> redis to client -> corvus -> redis. A simple test is to issue a fixed number of calls single-threaded (for example 10000) against each path and compare the averages (see the sketch after this list).

If the latency has increased noticeably, solve that first; note that the concurrency capacity itself does not change much in that case.

Bandwidth

Same approach as for the latency check.

Multiple instances

corvus is designed to be stateless, so you can run multiple instances behind a load balancer, for example three instances sharing the load, and then run the benchmark again in that setup.
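A minimal way to compare the two paths is redis-cli's built-in latency mode; the addresses below are assumptions taken from the original report (a redis node on 6379, corvus bound to 12345), so adjust them to your setup:

redis-cli -h 127.0.0.1 -p 6379 --latency      # client -> redis, direct
redis-cli -h 127.0.0.1 -p 12345 --latency     # client -> corvus -> redis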

@savagecm

savagecm commented Jan 13, 2020

Maybe it's because of Redis pipelining?

// wait for all cmds in cmd_queue to be done

For example, if Corvus receives 3 commands in a pipeline, it will wait for the responses to all three commands before sending anything back to the client.

We did some tests: if we do not use this command queue to wait for all the responses, performance improves a lot.

So could we change the code to remove the wait on the command queue and reply to the client immediately?
Or how can we contribute?
@jasonjoo2010

@jasonjoo2010
Contributor

> Maybe it's because of Redis pipelining?
>
> // wait for all cmds in cmd_queue to be done
>
> For example, if Corvus receives 3 commands in a pipeline, it will wait for the responses to all three commands before sending anything back to the client.
>
> We did some tests: if we do not use this command queue to wait for all the responses, performance improves a lot.
>
> So could we change the code to remove the wait on the command queue and reply to the client immediately?
> Or how can we contribute?
> @jasonjoo2010

I think things are not so simple.

I just did a local test with the 6 nodes the original author mentioned and only ONE corvus proxy (also local) with 4 threads, running ONLY the "get" and "set" tests (I will explain why later), and the results are acceptable:

4 clients benchmark: redis-benchmark -h 127.0.0.1 -p 6666 -c 4 -n 1000000 -t get,set

====== SET ======
  1000000 requests completed in 21.22 seconds
  4 parallel clients
  3 bytes payload
  keep alive: 1

47123.13 requests per second

====== GET ======
  1000000 requests completed in 17.19 seconds
  4 parallel clients
  3 bytes payload
  keep alive: 1

58180.12 requests per second

10 clients benchmark: redis-benchmark -h 127.0.0.1 -p 6666 -c 10 -n 1000000 -t get,set

====== SET ======
  1000000 requests completed in 13.09 seconds
  10 parallel clients
  3 bytes payload
  keep alive: 1

76382.52 requests per second

====== GET ======
  1000000 requests completed in 11.92 seconds
  10 parallel clients
  3 bytes payload
  keep alive: 1

83899.66 requests per second

20 clients benchmark: redis-benchmark -h 127.0.0.1 -p 6666 -c 20 -n 1000000 -t get,set

====== SET ======
  1000000 requests completed in 11.78 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

84904.06 requests per second

====== GET ======
  1000000 requests completed in 10.20 seconds
  20 parallel clients
  3 bytes payload
  keep alive: 1

98048.83 requests per second

For comparison I also tested against the redis nodes directly and put the numbers into a table (requests per second):

GET:

clients  direct redis  via corvus
4        108448.11     58180.12
10       113019.90     83899.66
20       111969.55     98048.83

SET:

clients  direct redis  via corvus
4        102997.23     47123.13
10       108389.34     76382.52
20       112019.72     84904.06

The throughput is suspiciously constant when connecting to redis directly.
Why?
Because redis-benchmark doesn't support cluster until 6.0, and when I tried 6.0-rc1 it core dumped. I didn't dig deeper because the results through the proxy make sense to me (though they were run locally).

I think the person who submitted this issue may not really have put load on redis (most replies were MOVED errors, which are fast).

What really needs attention is the key point I mentioned in my previous post: don't overlook the physical latency added along the transfer route.
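For reference, the cluster-aware benchmark mentioned above (added in Redis 6.0) would be invoked roughly like this; this is a sketch, and the flag should be checked against your redis-benchmark version:

redis-benchmark --cluster -h 127.0.0.1 -p 6380 -c 20 -n 1000000 -t get,set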

@jasonjoo2010
Contributor

And the cluster bootstrap commands are attached for reference:

redis-server --port 6380 --cluster-enabled yes --cluster-config-file nodes-6380.conf
redis-server --port 6381 --cluster-enabled yes --cluster-config-file nodes-6381.conf
redis-server --port 6382 --cluster-enabled yes --cluster-config-file nodes-6382.conf

redis-server --port 6383 --cluster-enabled yes --cluster-config-file nodes-6383.conf
redis-server --port 6384 --cluster-enabled yes --cluster-config-file nodes-6384.conf
redis-server --port 6385 --cluster-enabled yes --cluster-config-file nodes-6385.conf

redis-cli -h 127.0.0.1 -p 6380 cluster meet 127.0.0.1 6381
redis-cli -h 127.0.0.1 -p 6380 cluster meet 127.0.0.1 6382
redis-cli -h 127.0.0.1 -p 6380 cluster meet 127.0.0.1 6383
redis-cli -h 127.0.0.1 -p 6380 cluster meet 127.0.0.1 6384
redis-cli -h 127.0.0.1 -p 6380 cluster meet 127.0.0.1 6385


redis-cli -h 127.0.0.1 -p 6380 cluster addslots $(seq 0 5000)
redis-cli -h 127.0.0.1 -p 6381 cluster addslots $(seq 5001 11000)
redis-cli -h 127.0.0.1 -p 6382 cluster addslots $(seq 11001 16383)

redis-cli -h 127.0.0.1 -p 6383 cluster replicate $(redis-cli -h 127.0.0.1 -p 6383 cluster nodes |grep '0-5000' |awk '{print $1}')
redis-cli -h 127.0.0.1 -p 6384 cluster replicate $(redis-cli -h 127.0.0.1 -p 6384 cluster nodes |grep '5001-11000' |awk '{print $1}')
redis-cli -h 127.0.0.1 -p 6385 cluster replicate $(redis-cli -h 127.0.0.1 -p 6385 cluster nodes |grep '11001-16383' |awk '{print $1}')

# the config file cannot be omitted, so I just made a simple config with "bind 6666" in it
corvus -b 6666 -c default -t 4 -n 127.0.0.1:6380,127.0.0.1:6381 a.conf
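For completeness, an equivalent setup can also be expressed entirely in the config file, using the same option names the issue reporter used at the top of this thread (bind, node, thread); this is a sketch, not a verified corvus config:

bind 6666
node 127.0.0.1:6380,127.0.0.1:6381
thread 4

With all options placed in a.conf, the command line above would then reduce to just corvus a.conf.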

@jasonjoo2010
Contributor

> Maybe it's because of Redis pipelining? […] So could we change the code to remove the wait on the command queue and reply to the client immediately? Or how can we contribute?
> @jasonjoo2010

And one more thing: as you mentioned, in my opinion pipelining is not really a suitable scenario for corvus, just like publish/subscribe. We implemented a branch supporting pub/sub, but now we use it on the client side directly (a cluster-aware Jedis client, for example). corvus is useful for legacy projects, but if your client can speak redis cluster directly, just connect directly.

For HA purposes we have an automatic publishing agent that manages the backends in lvs/dpvs [1], so the smart clients in the applications always stay synchronized with the cluster.

If you are in any kind of cloud environment you can write a similar agent that operates the load balancer through the cloud API to keep the VIP updated.

[1] http://github.com/iqiyi/dpvs

@savagecm

> with 6 nodes which the original author mentioned and only ONE corvus pro

I mean that when you use redis-benchmark it sends messages very fast, which triggers the redis pipelining behavior: the benchmark sends several RESP messages at once, and Corvus waits until all the responses arrive.

@jasonjoo2010
Contributor

> I mean that when you use redis-benchmark it sends messages very fast, which triggers the redis pipelining behavior: the benchmark sends several RESP messages at once, and Corvus waits until all the responses arrive.

Nope:

 -P <numreq>        Pipeline <numreq> requests. Default 1 (no pipeline).

By default redis-benchmark does not enable pipelining.
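To deliberately exercise pipelining through the proxy you have to opt in with -P; for example (a sketch, with an arbitrary pipeline depth of 16 and the corvus port used in the tests above):

redis-benchmark -h 127.0.0.1 -p 6666 -c 20 -n 1000000 -t get,set -P 16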

@jasonjoo2010
Contributor

And what I do agree with you on is that it's better to connect to the redis cluster directly when we want to use this feature.

@savagecm

> Nope:
>
>  -P <numreq>        Pipeline <numreq> requests. Default 1 (no pipeline).
>
> By default redis-benchmark does not enable pipelining.

I just want to mention that the code I pointed to will hold several requests and so increases latency. Corvus gets better performance once that logic is changed.

I am asking the maintainers whether they can change it, or whether I can contribute the change. You can make the same change yourself to test whether you get better performance.

@jasonjoo2010
Contributor

> I just want to mention that the code I pointed to will hold several requests and so increases latency. Corvus gets better performance once that logic is changed.
>
> I am asking the maintainers whether they can change it, or whether I can contribute the change. You can make the same change yourself to test whether you get better performance.

Yeah, you're right here. It would be great if you could shoot a patch for it.

@savagecm

> Yeah, you're right here. It would be great if you could shoot a patch for it.

Our change may introduce bugs, so maybe we should ask whether the maintainers could make the fix.

Or maybe they have their reasons for doing it this way...

Btw, who is the active maintainer of this repo now?

@jasonjoo2010
Contributor

> Our change may introduce bugs, so maybe we should ask whether the maintainers could make the fix.
>
> Or maybe they have their reasons for doing it this way...
>
> Btw, who is the active maintainer of this repo now?

As far as I know they have moved to a semi-client-side layer, something like a sidecar in container environments, which is closer to connecting to the cluster directly. We can cc @tevino

@tevino
Contributor

tevino commented May 27, 2020

Yes, it's here, take a look.
