
SN Server performance #164

Closed · lurenpluto opened this issue Apr 3, 2023 · 8 comments
Labels: BDT (bucky data transfer protocol) · Performance (about performance issues) · SN (SN Server) · task (This is a task)


@lurenpluto (Member)

In our system, the SN server plays a critical role in the BDT protocol, enabling NAT peer connections and partially handling the service discovery mechanism.

For clients using the cyfs-stack (such as gateways and cyfs-runtime), the SN server can be customized through configuration, or the built-in system SN servers can be used. The current built-in SN servers are as follows:

  • For the beta version, there are three SN servers.
  • For the nightly version, there are two SN servers.

The server is dynamically chosen based on the client's DeviceID area.

The implementation code for the SN server can be found at:
https://github.com/buckyos/CYFS/tree/main/src/component/cyfs-bdt/src/sn

When the overall system scale is large, the performance of the SN server may become a critical bottleneck, which calls for a set of mechanisms to evaluate and improve it.

We now need to gather some performance data on the current SN server, conduct stress tests, and use the results to optimize and improve its performance in the future. Is there anyone who can help with conducting these performance tests?

lurenpluto added the task, BDT, and Performance labels on Apr 3, 2023
@lurenpluto (Member, Author)

Some suggestions: based on the existing BDT SN-related components, you can develop a single-process multi-SN client that simulates multiple BDT protocol stacks using the SN server. The primary focus should be on covering the core logic of SN ping and SN call operations.

By developing a single-process multi-SN client and running stress tests, we can gain valuable insights into the performance of the SN server and identify opportunities for improvement.
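To make this concrete, here is a minimal, illustrative sketch of such a single-process harness: it spawns many simulated clients as async-std tasks (the BDT stack itself uses async-std) and reports aggregate throughput. `do_ping_round` is only a placeholder for the real cyfs-bdt SN ping round-trip, not an actual API.

```rust
// Illustrative only: a single-process harness driving many simulated SN clients.
use async_std::task;
use std::time::{Duration, Instant};

async fn do_ping_round(_client_id: usize) {
    // Placeholder for the real SN ping round-trip
    // (e.g. stack.sn_client().ping().wait_online() in cyfs-bdt).
    task::sleep(Duration::from_millis(10)).await;
}

fn main() {
    const CLIENTS: usize = 1_000;
    const ROUNDS: usize = 500;

    task::block_on(async {
        let started = Instant::now();
        let mut handles = Vec::with_capacity(CLIENTS);
        for id in 0..CLIENTS {
            handles.push(task::spawn(async move {
                for _ in 0..ROUNDS {
                    do_ping_round(id).await;
                }
            }));
        }
        for h in handles {
            h.await;
        }
        let elapsed = started.elapsed().as_secs_f64();
        println!(
            "{} pings in {:.1}s ({:.0} req/s)",
            CLIENTS * ROUNDS,
            elapsed,
            (CLIENTS * ROUNDS) as f64 / elapsed
        );
    });
}
```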

@jing-git (Collaborator) commented Apr 3, 2023

Hello, I would like to use the bdt-debugger-daemon to develop a performance analysis tool. The general steps are as follows:

  1. Set up an SN server on the public network (dual-core CPU, 4GB memory, Ubuntu), and install the perf and bmon tools to monitor the resource usage of sn-miner during stress testing. The final stress-test indicators are: CPU usage, bandwidth, memory usage, and the number of requests processed per second.

  2. Add stress testing functionality for SN to bdt-debugger-daemon, which can concurrently perform online operations and call requests to an SN. Call requests are triggered by sending and receiving datagram packets between two devices located behind NAT.

  3. Stress testing for SN online operations
    Prepare two NAT environments: NAT1 and NAT2, each running two devices for stress testing.
    3.1 Perform ping stress testing under NAT1 to obtain performance metrics.
    3.2 Perform stress testing simultaneously under NAT1 and NAT2.
    3.3 Compare the data from the two groups to obtain the performance indicators for SN online operations.

  4. Stress testing for SN call requests
    Under NAT1, set up two machines, PC1 and PC2, each running bdt-debugger-daemon. PC1 concurrently sends 5-byte datagrams to PC2. Because both machines are behind NAT, they must use the SN to establish a tunnel and perform call punching, which yields performance indicators for call requests (a rough sketch of this loop is shown below).
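As a rough illustration of step 4, the per-device call stress loop could look like the sketch below (shown sequentially for brevity). `send_datagram_via_bdt` is a hypothetical placeholder, not the real bdt-debugger-daemon API; in the actual test each datagram to a not-yet-connected NAT peer first goes through an SN call and hole punching.

```rust
use std::time::Instant;

// Hypothetical placeholder: in the real test the payload is sent through the
// BDT stack, which triggers SN call punching when no tunnel exists yet.
fn send_datagram_via_bdt(_payload: &[u8]) -> Result<(), ()> {
    Ok(())
}

fn main() {
    const ROUNDS: usize = 500;
    let payload = [0u8; 5]; // 5-byte datagram, as in the test plan
    let started = Instant::now();
    let (mut ok, mut failed) = (0u64, 0u64);
    for _ in 0..ROUNDS {
        match send_datagram_via_bdt(&payload) {
            Ok(()) => ok += 1,
            Err(()) => failed += 1,
        }
    }
    println!("sent {} datagrams ({} failed) in {:.2?}", ok, failed, started.elapsed());
}
```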

@lurenpluto (Member, Author)

The solution should be feasible!

For testing, you can start with one SN in the nightly environment, simulate, say, 1k or 10k peers, and track the server's load and response-latency curves over time.

lurenpluto added the SN label on Apr 3, 2023
@jing-git (Collaborator) commented Apr 4, 2023

OK, I will first simulate 10k peers to stress test SN online operations.

@jing-git (Collaborator)

The performance test results for SN are as follows:

Server hardware configuration of sn-miner:

CPU: Dual-core 2.5GHz
Memory: 4GB
OS: Ubuntu 18.04
Downstream bandwidth: 100Mb/s
Upstream bandwidth: 50Mb/s

1. SN Ping Request (SN online)

Client testing environment:
Devices from two LANs perform SN online operation simultaneously.
LAN1: Simulate 8000 devices;
LAN2: Simulate 2000 devices;
Concurrently run 10,000 devices, each performing 500 rounds of SN Ping, for a total of 5,000,000 requests; monitor the resource usage of sn-miner during execution and record its profile. After the stress test completes, count the number of requests processed per second.

Performance results of sn-miner:
CPU usage: >91%
Memory usage: 1%
Downstream bandwidth usage: 3MB/s
Upstream bandwidth: 860KB/s
QPS: 2200/s
CPU resource consumption statistics:
66.36% sn-miner-rust sn-miner-rust [.] num_bigint_dig::biguint::monty::montgomery
1.67% sn-miner-rust sn-miner-rust [.] <smallvec::SmallVec as core::iter::traits::collect::Extend<::Item>>::extend
1.53% sn-miner-rust [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.46% sn-miner-rust libc-2.27.so [.] cfree
1.37% sn-miner-rust [kernel.kallsyms] [k] finish_task_switch
0.93% sn-miner-rust [kernel.kallsyms] [k] exit_to_usermode_loop
0.86% sn-miner-rust libc-2.27.so [.] malloc
0.72% async-std/runti [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.71% sn-miner-rust sn-miner-rust [.] num_bigint_dig::algorithms::div::div_rem
...

From the data above, the main bottleneck is the CPU, with most of the time spent in big-integer arithmetic (num_bigint_dig).
The SN client performs a keepalive with the SN miner every 25 seconds, so at about 2200 ping requests per second this dual-core server can support at most roughly 2200 × 25 = 55,000 devices online simultaneously.
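For reference, that estimate is just QPS × keepalive interval; a tiny snippet makes the arithmetic explicit (values taken from the results above and the 25s default interval):

```rust
fn main() {
    let qps = 2200.0;          // measured ping requests handled per second
    let keepalive_secs = 25.0; // default SN keepalive interval in BDT
    // Each online device contributes 1 / keepalive_secs requests per second,
    // so the server can sustain qps * keepalive_secs devices online at once.
    println!("max online devices ≈ {}", qps * keepalive_secs); // 55000
}
```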

2. Call Request (P2P Punching)

250 devices perform call punching connections in parallel.
The call flow is triggered by sending and receiving 5-byte datagrams; each device sends 500 times in a loop, for a total of 125,000 requests.

Performance results of sn-miner:
CPU usage: 47.5%
Memory usage: 4.6%
Downstream bandwidth usage: 3MB/s
Upstream bandwidth: 4MB/s
QPS: 3000/s
CPU resource consumption statistics:
5.72% sn-miner-rust [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
3.35% sn-miner-rust [kernel.kallsyms] [k] finish_task_switch
3.33% async-std/runti [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
2.50% sn-miner-rust libc-2.27.so [.] 0x000000000018ea5f
2.07% sn-miner-rust libc-2.27.so [.] 0x000000000018ee6d
2.02% sn-miner-rust libc-2.27.so [.] cfree
1.79% sn-miner-rust libc-2.27.so [.] malloc
1.73% sn-miner-rust sn-miner-rust [.] num_bigint_dig::biguint::monty::montgomery
1.70% async-std/runti [kernel.kallsyms] [k] finish_task_switch
1.62% sn-miner-rust libc-2.27.so [.] syscall
1.53% Timer thread sn-miner-rust [.] alloc::collections::binary_heap::BinaryHeap::pop
1.42% Timer thread [kernel.kallsyms] [k] finish_task_switch
1.36% sn-miner-rust [kernel.kallsyms] [k] do_syscall_64
1.33% sn-miner-rust [kernel.kallsyms] [k] exit_to_usermode_loop
1.28% async-std/runti libc-2.27.so [.] syscall
1.17% Timer thread libc-2.27.so [.] syscall
1.01% sn-miner-rust sn-miner-rust [.] sha2::sha256_utils::compress256
0.98% async-io [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore

Compared to ping, the CPU consumption is lower, but there is a higher demand for upstream bandwidth.

@lurenpluto (Member, Author)

Very detailed data!

I have a few questions here:

  1. What is the ping interval for each simulated device? According to the current BDT configuration the default interval is 25s, but what interval did you use during the stress test? For stress testing it is not necessary to use the default interval.

  2. If the ping loop has no interval at all, the results are also hard to quantify; it is better to use a well-defined time interval.

So here are a few suggestions:

1. The ping interval should be an important, configurable value

This value is also a core factor in controlling the scale of the stress test.
For example, with a 1s interval instead of the default 25s, one simulated device generates the load of 25 devices.

2. Pay attention to the device-side data during the stress test

This mainly includes:

  • Total latency of ping resp / call resp: as the load increases this latency should grow, so there should be comparison data and curves.
  • Error statistics for ping and call, to see the SN server's processing capacity under extreme conditions; this also reflects the stability of the device-side ping and call logic. (A minimal sketch of this device-side bookkeeping follows.)
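A minimal sketch of that device-side bookkeeping (latency samples plus error counters), assuming a hypothetical `ping_once` stand-in for the real SN request:

```rust
use std::time::Instant;

// Hypothetical stand-in for one SN ping (or call) round-trip;
// returns Err on timeout or failure.
fn ping_once() -> Result<(), ()> {
    Ok(())
}

fn main() {
    let mut latencies_ms: Vec<f64> = Vec::new();
    let mut errors = 0u64;
    for _ in 0..500 {
        let start = Instant::now();
        match ping_once() {
            Ok(()) => latencies_ms.push(start.elapsed().as_secs_f64() * 1000.0),
            Err(()) => errors += 1,
        }
    }
    latencies_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Percentile lookup; assumes at least one successful request was recorded.
    let pct = |q: f64| latencies_ms[((latencies_ms.len() - 1) as f64 * q) as usize];
    println!(
        "ok={} err={} p50={:.2}ms p99={:.2}ms",
        latencies_ms.len(),
        errors,
        pct(0.50),
        pct(0.99)
    );
}
```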

@jing-git (Collaborator)

There is no interval; the stress-testing loop for SN ping is as follows:

loop {
	// send a ping and wait until the stack is online on the SN
	bencher.stack.sn_client().ping().wait_online()
	...
	// reset the SN list so the next iteration pings again, instead of
	// relying on the keepalive of an already-online SN
	bencher.stack.reset_sn_list(sns.clone());
}

After each ping, the SN list in the stack is reset, so ping requests are sent continuously and the measurement is not affected by the SN keepalive or cached connection state.

"Yes, the response time of ping/call from the client side and the error handling ability of SN are also important. These indicators will be added in the subsequent stress testing.

@jing-git (Collaborator)

The stress testing for this phase has been completed. The next step is to track two additional issues (#230, #229) to be resolved.
