
SN Server performance #164

Closed · lurenpluto opened this issue Apr 3, 2023 · 8 comments
Labels: BDT (bucky data transfer protocol) · Performance (about performance issues) · SN (SN Server) · task (This is a task)


@lurenpluto (Member)

In our system, the SN server plays a critical role in the BDT protocol, enabling NAT peer connections and partially handling the service discovery mechanism.

For clients using the cyfs-stack (such as gateways and cyfs-runtime), the SN server can be customized through configuration, or the built-in system SN servers can be used. The current built-in SN servers are as follows:

  • For the beta version, there are three SN servers.
  • For the nightly version, there are two SN servers.

The server is dynamically chosen based on the client's DeviceID area.

The implementation code for the SN server can be found at:
https://github.com/buckyos/CYFS/tree/main/src/component/cyfs-bdt/src/sn

When the overall system scale is large, the performance of the SN server may become a critical bottleneck, which calls for a set of mechanisms to evaluate and improve it.

We now need to gather some performance data on the current SN server, conduct stress tests, and use the results to optimize and improve its performance in the future. Is there anyone who can help with conducting these performance tests?

lurenpluto added the task, BDT, and Performance labels on Apr 3, 2023
@lurenpluto (Member, Author)

Some suggestions: based on the existing BDT SN-related components, you can develop a single-process multi-SN client that simulates multiple BDT protocol stacks using the SN server. The primary focus should be on covering the core logic of SN ping and SN call operations.

By developing a single-process multi-SN client and running stress tests, we can gain valuable insights into the performance of the SN server and identify opportunities for improvement.
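To make this concrete, here is a minimal, illustrative sketch of such a single-process harness: it spawns many simulated clients as async-std tasks (the BDT stack itself uses async-std) and reports aggregate throughput. `do_ping_round` is only a placeholder for the real cyfs-bdt SN ping round-trip, not an actual API.

```rust
// Illustrative only: a single-process harness driving many simulated SN clients.
use async_std::task;
use std::time::{Duration, Instant};

async fn do_ping_round(_client_id: usize) {
    // Placeholder for the real SN ping round-trip
    // (e.g. stack.sn_client().ping().wait_online() in cyfs-bdt).
    task::sleep(Duration::from_millis(10)).await;
}

fn main() {
    const CLIENTS: usize = 1_000;
    const ROUNDS: usize = 500;

    task::block_on(async {
        let started = Instant::now();
        let mut handles = Vec::with_capacity(CLIENTS);
        for id in 0..CLIENTS {
            handles.push(task::spawn(async move {
                for _ in 0..ROUNDS {
                    do_ping_round(id).await;
                }
            }));
        }
        for h in handles {
            h.await;
        }
        let elapsed = started.elapsed().as_secs_f64();
        println!(
            "{} pings in {:.1}s ({:.0} req/s)",
            CLIENTS * ROUNDS,
            elapsed,
            (CLIENTS * ROUNDS) as f64 / elapsed
        );
    });
}
```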

@jing-git (Collaborator) commented Apr 3, 2023

Hello, I would like to use the bdt-debugger-daemon to develop a performance analysis tool. The general steps are as follows:

  1. Set up an SN server on the public network (dual-core CPU, 4GB memory, Ubuntu), and install the perf and bmon tools to monitor the resource usage of sn-miner during stress testing. The final stress-test indicators are: CPU usage, bandwidth, memory usage, and the number of requests processed per second.

  2. Add stress testing functionality for SN to bdt-debugger-daemon, which can concurrently perform online operations and call requests to an SN. Call requests are triggered by sending and receiving datagram packets between two devices located behind NAT.

  3. Stress testing for SN online operations
    Prepare two NAT environments: NAT1 and NAT2, each running two devices for stress testing.
    3.1 Perform ping stress testing under NAT1 to obtain performance metrics.
    3.2 Perform stress testing simultaneously under NAT1 and NAT2.
    3.3 Compare the data from the two groups to obtain the performance indicators for SN online operations.

  4. Stress testing for SN call requests
    Under NAT1, set up two machines, PC1 and PC2, each running bdt-debugger-daemon. PC1 concurrently sends 5-byte datagrams to PC2. Because both machines are behind NAT, they must use the SN to establish a tunnel and perform call punching, which yields performance indicators for call requests (a rough sketch of this loop is shown below).
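As a rough illustration of step 4, the per-device call stress loop could look like the sketch below (shown sequentially for brevity). `send_datagram_via_bdt` is a hypothetical placeholder, not the real bdt-debugger-daemon API; in the actual test each datagram to a not-yet-connected NAT peer first goes through an SN call and hole punching.

```rust
use std::time::Instant;

// Hypothetical placeholder: in the real test the payload is sent through the
// BDT stack, which triggers SN call punching when no tunnel exists yet.
fn send_datagram_via_bdt(_payload: &[u8]) -> Result<(), ()> {
    Ok(())
}

fn main() {
    const ROUNDS: usize = 500;
    let payload = [0u8; 5]; // 5-byte datagram, as in the test plan
    let started = Instant::now();
    let (mut ok, mut failed) = (0u64, 0u64);
    for _ in 0..ROUNDS {
        match send_datagram_via_bdt(&payload) {
            Ok(()) => ok += 1,
            Err(()) => failed += 1,
        }
    }
    println!("sent {} datagrams ({} failed) in {:.2?}", ok, failed, started.elapsed());
}
```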

@lurenpluto (Member, Author)

The solution should be feasible!

For testing, you can start with one SN in the nightly environment, simulate, say, 1k or 10k peers, and track the server's load and response-latency curves over time.

lurenpluto added the SN label on Apr 3, 2023
@jing-git (Collaborator) commented Apr 4, 2023

OK, I will first simulate 10k peers to stress test SN online operations.

@jing-git (Collaborator)

The performance test results for SN are as follows:

Server hardware configuration of sn-miner:

CPU: Dual-core 2.5GHz
Memory: 4GB
OS: Ubuntu 18.04
Downstream bandwidth: 100Mb/s
Upstream bandwidth: 50Mb/s

1. SN Ping Request (SN online)

Client testing environment:
Devices from two LANs perform SN online operation simultaneously.
LAN1: Simulate 8000 devices;
LAN2: Simulate 2000 devices;
Concurrently run 10,000 devices, each performing 500 rounds of SN Ping, for a total of 5,000,000 requests; monitor the resource usage of sn-miner during execution and record its profile. After the stress test completes, count the number of requests processed per second.

Performance results of sn-miner:
CPU usage: >91%
Memory usage: 1%
Downstream bandwidth usage: 3MB/s
Upstream bandwidth: 860KB/s
QPS: 2200/s
CPU resource consumption statistics:
66.36% sn-miner-rust sn-miner-rust [.] num_bigint_dig::biguint::monty::montgomery
1.67% sn-miner-rust sn-miner-rust [.] <smallvec::SmallVec as core::iter::traits::collect::Extend<::Item>>::extend
1.53% sn-miner-rust [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.46% sn-miner-rust libc-2.27.so [.] cfree
1.37% sn-miner-rust [kernel.kallsyms] [k] finish_task_switch
0.93% sn-miner-rust [kernel.kallsyms] [k] exit_to_usermode_loop
0.86% sn-miner-rust libc-2.27.so [.] malloc
0.72% async-std/runti [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.71% sn-miner-rust sn-miner-rust [.] num_bigint_dig::algorithms::div::div_rem
...

From the data above, the main bottleneck is the CPU, with most of the time spent in big-integer arithmetic (num_bigint_dig).
The SN client performs a keepalive with the SN miner every 25 seconds, so at about 2200 ping requests per second this dual-core server can support at most roughly 2200 × 25 = 55,000 devices online simultaneously.
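For reference, that estimate is just QPS × keepalive interval; a tiny snippet makes the arithmetic explicit (values taken from the results above and the 25s default interval):

```rust
fn main() {
    let qps = 2200.0;          // measured ping requests handled per second
    let keepalive_secs = 25.0; // default SN keepalive interval in BDT
    // Each online device contributes 1 / keepalive_secs requests per second,
    // so the server can sustain qps * keepalive_secs devices online at once.
    println!("max online devices ≈ {}", qps * keepalive_secs); // 55000
}
```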

2. Call Request (P2P Punching)

250 devices perform call punching connections in parallel.
The call flow is triggered by sending and receiving 5-byte datagrams; each device sends 500 times in a loop, for a total of 125,000 requests.

Performance results of sn-miner:
CPU usage: 47.5%
Memory usage: 4.6%
Downstream bandwidth usage: 3MB/s
Upstream bandwidth: 4MB/s
QPS: 3000/s
CPU resource consumption statistics:
5.72% sn-miner-rust [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
3.35% sn-miner-rust [kernel.kallsyms] [k] finish_task_switch
3.33% async-std/runti [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
2.50% sn-miner-rust libc-2.27.so [.] 0x000000000018ea5f
2.07% sn-miner-rust libc-2.27.so [.] 0x000000000018ee6d
2.02% sn-miner-rust libc-2.27.so [.] cfree
1.79% sn-miner-rust libc-2.27.so [.] malloc
1.73% sn-miner-rust sn-miner-rust [.] num_bigint_dig::biguint::monty::montgomery
1.70% async-std/runti [kernel.kallsyms] [k] finish_task_switch
1.62% sn-miner-rust libc-2.27.so [.] syscall
1.53% Timer thread sn-miner-rust [.] alloc::collections::binary_heap::BinaryHeap::pop
1.42% Timer thread [kernel.kallsyms] [k] finish_task_switch
1.36% sn-miner-rust [kernel.kallsyms] [k] do_syscall_64
1.33% sn-miner-rust [kernel.kallsyms] [k] exit_to_usermode_loop
1.28% async-std/runti libc-2.27.so [.] syscall
1.17% Timer thread libc-2.27.so [.] syscall
1.01% sn-miner-rust sn-miner-rust [.] sha2::sha256_utils::compress256
0.98% async-io [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore

Compared to ping, the CPU consumption is lower, but there is a higher demand for upstream bandwidth.

@lurenpluto (Member, Author)

Very detailed data!

I have a few questions here:

  1. What is the ping interval for each simulated device? According to the current BDT configuration the default interval is 25s, but what interval did you use during the stress test? For stress testing it is not necessary to use the default interval.

  2. If the ping loop has no interval at all, the results are also hard to quantify; it is better to use a well-defined time interval.

So here are a few suggestions:

1. The ping interval should be an important, configurable value

This value is also a core factor in controlling the scale of the stress test.
For example, with a 1s interval instead of the default 25s, one simulated device generates the load of 25 devices.

2. Pay attention to the device-side data during the stress test

This mainly includes:

  • Total latency of ping resp / call resp: as the load increases this latency should grow, so there should be comparison data and curves.
  • Error statistics for ping and call, to see the SN server's processing capacity under extreme conditions; this also reflects the stability of the device-side ping and call logic. (A minimal sketch of this device-side bookkeeping follows.)
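A minimal sketch of that device-side bookkeeping (latency samples plus error counters), assuming a hypothetical `ping_once` stand-in for the real SN request:

```rust
use std::time::Instant;

// Hypothetical stand-in for one SN ping (or call) round-trip;
// returns Err on timeout or failure.
fn ping_once() -> Result<(), ()> {
    Ok(())
}

fn main() {
    let mut latencies_ms: Vec<f64> = Vec::new();
    let mut errors = 0u64;
    for _ in 0..500 {
        let start = Instant::now();
        match ping_once() {
            Ok(()) => latencies_ms.push(start.elapsed().as_secs_f64() * 1000.0),
            Err(()) => errors += 1,
        }
    }
    latencies_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Percentile lookup; assumes at least one successful request was recorded.
    let pct = |q: f64| latencies_ms[((latencies_ms.len() - 1) as f64 * q) as usize];
    println!(
        "ok={} err={} p50={:.2}ms p99={:.2}ms",
        latencies_ms.len(),
        errors,
        pct(0.50),
        pct(0.99)
    );
}
```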

@jing-git (Collaborator)

There is no interval; the stress-testing loop for SN ping is as follows:

loop {
	// send a ping and wait until the stack is online on the SN
	bencher.stack.sn_client().ping().wait_online()
	...
	// reset the SN list so the next iteration pings again, instead of
	// relying on the keepalive of an already-online SN
	bencher.stack.reset_sn_list(sns.clone());
}

After each ping, the SN list in the stack is reset, so ping requests are sent continuously and the measurement is not affected by the SN keepalive or cached connection state.

"Yes, the response time of ping/call from the client side and the error handling ability of SN are also important. These indicators will be added in the subsequent stress testing.

@jing-git (Collaborator)

The stress testing for this phase has been completed. The next step is to track two additional issues (#230, #229) to be resolved.
