To deal with sn-miner's OOM #230
Regarding the OOM of SN in the stress test, it can be analyzed from the following perspectives: 1. Theoretical memory usage. This should be derived from the internal implementation of SN: look at the cache design of the ping/call-related logic and work out the theoretical memory usage for a specific amount of data, using key indicators such as the following
The theoretical memory usage in the scenario determined above. Then, based on this, the upper limit on the number of devices that the SN service can host for a given server memory size can be inferred and used as a key indicator of the SN server. 2. Whether there is a memory leak. This can be examined from several perspectives
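A minimal sketch of the capacity estimate described in point 1, assuming the per-device cost can be split into one ping cache entry plus a number of cached call packages. Every name and field here (`PerDeviceCost`, `max_devices`, the byte sizes) is hypothetical and not taken from the SN code; real values would have to come from reviewing the actual cache structures.

```rust
/// Hypothetical per-device memory cost on the SN server.
/// All fields are placeholders to be filled in from a code review.
pub struct PerDeviceCost {
    /// Bytes held for one device's ping/online record (assumed).
    pub ping_entry_bytes: u64,
    /// Number of call packages cached per device (assumed).
    pub cached_call_packages: u64,
    /// Average size of one cached call package in bytes (assumed).
    pub call_package_bytes: u64,
}

impl PerDeviceCost {
    /// Total bytes attributed to a single device.
    pub fn total_bytes(&self) -> u64 {
        self.ping_entry_bytes + self.cached_call_packages * self.call_package_bytes
    }
}

/// Theoretical memory usage for a given number of devices.
pub fn theoretical_usage_bytes(devices: u64, cost: &PerDeviceCost) -> u64 {
    devices * cost.total_bytes()
}

/// Upper limit on hosted devices for a given server memory size.
/// `reserved_bytes` is memory set aside for the OS and other processes.
/// Returns `None` if the per-device cost is zero.
pub fn max_devices(server_mem_bytes: u64, reserved_bytes: u64, cost: &PerDeviceCost) -> Option<u64> {
    let per_device = cost.total_bytes();
    if per_device == 0 {
        return None;
    }
    Some(server_mem_bytes.saturating_sub(reserved_bytes) / per_device)
}
```

Comparing `theoretical_usage_bytes` for the stress-test device count against the observed RSS would also give a first hint for point 2: usage far above the theoretical value suggests a leak or an unbounded cache.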
The biggest difference between I think we should review the code.
If the SN server caches a package for each peer's call, then in a stress test environment, where multiple independent peers initiate a large number of calls, the SN server may build up a large backlog of call packages. So we may need to add some statistical logs on the SN server side, such as printing the total number of cached call packages and their total size, to assist in the stress test.
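The statistics suggested above could be sketched roughly as follows. This is only an illustration using standard-library atomics; `CallPackageStats` and its methods are hypothetical names, and where to call them depends on how the SN server actually caches and releases call packages.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counters for call packages currently cached by the SN server.
#[derive(Default)]
pub struct CallPackageStats {
    count: AtomicU64,
    bytes: AtomicU64,
}

impl CallPackageStats {
    /// Record that a call package of `size` bytes was added to the cache.
    pub fn on_cached(&self, size: u64) {
        self.count.fetch_add(1, Ordering::Relaxed);
        self.bytes.fetch_add(size, Ordering::Relaxed);
    }

    /// Record that a cached call package of `size` bytes was dropped.
    /// Must be paired with an earlier `on_cached` of the same size,
    /// otherwise the counters wrap around.
    pub fn on_released(&self, size: u64) {
        self.count.fetch_sub(1, Ordering::Relaxed);
        self.bytes.fetch_sub(size, Ordering::Relaxed);
    }

    /// Current (package count, total bytes).
    pub fn snapshot(&self) -> (u64, u64) {
        (
            self.count.load(Ordering::Relaxed),
            self.bytes.load(Ordering::Relaxed),
        )
    }

    /// One line suitable for periodic logging during the stress test.
    pub fn log_line(&self) -> String {
        let (count, bytes) = self.snapshot();
        format!("sn call cache: packages={} total_bytes={}", count, bytes)
    }
}
```

Printing `log_line()` at a fixed interval during the stress test would show whether the backlog keeps growing or levels off once the calls are handled.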
Regarding SN performance optimization, we can use this issue as an entry point for a systematic overhaul of the SN server, which will also make it easier for other people to understand the logic of our SN server. We can start by analyzing the existing SN code and proceed in the following steps
There are several sub-issues here, and I've opened some discussions to track them:
During the sn-call stress test (#164), simulating 1200 client requests each time, it was found that sn-miner exited intermittently after multiple attempts. By checking the process status using 'top' and reviewing the system logs, I found that this was caused by an OOM (Out Of Memory) error in sn-miner:
kernel: 0 pages HighMem/MovableOnly
kernel: 38158 pages reserved
kernel: 0 pages cma reserved
kernel: 0 pages hwpoisoned
kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
kernel: [ 253] 0 253 126024 3216 1044480 0 0 systemd-journal
kernel: [ 281] 0 281 8599 413 114688 0 -1000 systemd-udevd
kernel: [ 496] 100 496 20010 168 180224 0 0 systemd-network
kernel: [ 497] 101 497 17656 162 184320 0 0 systemd-resolve
kernel: [ 516] 102 516 65761 486 167936 0 0 rsyslogd
kernel: [ 518] 0 518 7085 52 102400 0 0 atd
kernel: [15962] 0 15962 5656 388 94208 0 0 bash
kernel: [16317] 0 16317 1985850 854368 14987264 0 0 sn-miner-rust
kernel: [16359] 0 16359 10381 105 122880 0 0 top
kernel: Out of memory: Kill process 16317 (sn-miner-rust) score 855 or sacrifice child
kernel: Killed process 16317 (sn-miner-rust) total-vm:7943400kB, anon-rss:3417472kB, file-rss:0kB, shmem-rss:0kB
kernel: oom_reaper: reaped process 16317 (sn-miner-rust), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
So I want to identify the cause of the OOM issue and fix it :)
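As a cross-check on the log above: the per-process table reports `total_vm` and `rss` in pages, while the "Killed process" line reports kB. Assuming a 4096-byte page size (typical on x86-64, but not stated in the log), the two agree, and sn-miner-rust was holding roughly 3.26 GiB of anonymous memory when it was killed. The helper below is a hypothetical illustration of that conversion.

```rust
/// Convert a page count from the OOM killer's process table to kB.
/// `page_size_bytes` is an assumption; 4096 is typical on x86-64.
pub fn pages_to_kb(pages: u64, page_size_bytes: u64) -> u64 {
    pages * page_size_bytes / 1024
}

/// Convert kB to GiB for easier reading.
pub fn kb_to_gib(kb: u64) -> f64 {
    kb as f64 / (1024.0 * 1024.0)
}
```

With these, `pages_to_kb(854_368, 4096)` gives 3417472, matching `anon-rss:3417472kB`, and `pages_to_kb(1_985_850, 4096)` gives 7943400, matching `total-vm:7943400kB`.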