
CPU and memory keep growing #131

Open · Arzar opened this issue Apr 26, 2023 · 7 comments

@Arzar

Arzar commented Apr 26, 2023

I'm using the Jitsi autoscaler at the latest commit (acf86ac, 2023/01/13) on Ubuntu 22.04 arm64.
Oracle Cloud: Ampere A1 Flex, 2 CPU / 4 GB mem.

The CPU and memory used by the node process keep climbing slowly.
Starting one month ago from almost 0% CPU and 0% memory, it is now at about 105% CPU and 20% memory:
$ top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
455849 ubuntu 20 0 1664176 792724 34056 R 102.7 19.8 23329:20 node

In our test environment there is just one JVB running and reporting statistics to the test autoscaler, without any scale-up or scale-down. In our production environment we do scale up and down, but we restart the autoscaler every night because of this CPU/memory issue.

I profiled our test autoscaler with chrome://inspect and got this:

Self time            Total time           Function
46492.8 ms  43.13 %  97726.0 ms  90.67 %  (anonymous) status.js:82
46492.8 ms  43.13 %  97726.0 ms  90.67 %    listOnTimeout internal/timers.js:502
46492.8 ms  43.13 %  97726.0 ms  90.67 %      processTimers internal/timers.js:482
43874.9 ms  40.71 %  43874.9 ms  40.71 %  (anonymous) status.js:96
43874.9 ms  40.71 %  43874.9 ms  40.71 %    (anonymous) status.js:94
43874.9 ms  40.71 %  43874.9 ms  40.71 %      get stats status.js:93
43874.9 ms  40.71 %  43874.9 ms  40.71 %        (anonymous) status.js:82
43874.9 ms  40.71 %  43874.9 ms  40.71 %          listOnTimeout internal/timers.js:502
43874.9 ms  40.71 %  43874.9 ms  40.71 %            processTimers internal/timers.js:482
 4877.3 ms   4.53 %   5978.8 ms   5.55 %  (anonymous) status.js:124
 4877.3 ms   4.53 %   5978.8 ms   5.55 %    get stats status.js:93
 4877.3 ms   4.53 %   5978.8 ms   5.55 %      (anonymous) status.js:82
 4877.3 ms   4.53 %   5978.8 ms   5.55 %        listOnTimeout internal/timers.js:502
 4877.3 ms   4.53 %   5978.8 ms   5.55 %          processTimers internal/timers.js:502

This suggests that the node process is drowning in timer management, but I'm not sure how to debug it further. Has the Jitsi team encountered this issue? Do you have any suggestions on how to track down the root cause?
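
For what it's worth, the next probe I plan to drop in is something that logs heap/RSS together with the number of live Timeout resources, to confirm that timers are actually accumulating rather than just running hot. This is only a minimal sketch, not part of the autoscaler, and it assumes a Node version recent enough to expose the experimental process.getActiveResourcesInfo():

// leak-probe.ts — standalone sketch that samples memory and live timer counts once a minute
const INTERVAL_MS = 60_000;

setInterval(() => {
  const mem = process.memoryUsage();
  // getActiveResourcesInfo() is experimental and may be missing from older
  // @types/node declarations, hence the cast and the optional call.
  const resources = (process as any).getActiveResourcesInfo?.() as string[] | undefined;
  const activeTimeouts = resources ? resources.filter((r) => r === 'Timeout').length : 'n/a';

  console.log(JSON.stringify({
    ts: new Date().toISOString(),
    rssMb: Math.round(mem.rss / 1048576),
    heapUsedMb: Math.round(mem.heapUsed / 1048576),
    activeTimeouts,
  }));
}, INTERVAL_MS).unref(); // unref() so the probe itself never keeps the process alive

If activeTimeouts grows in step with RSS, that would point at timers being scheduled faster than they are cleared, rather than at anything GC-related.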

@aaronkvanmeerten (Member)

Hi, thanks for raising this issue. We had not been running the latest codebase in our production systems until recently, and now we have experienced the same issue. We will be working to track it down, and I'll try to report back here when we figure it out and fix it. In the meantime, if you have any more details about what you saw, please let me know!

@aaronkvanmeerten (Member)

We had merged a commit that updated the underlying OCI SDK, and that change seems to have been the culprit. It has now been reverted; the current autoscaler docker image, jitsi/autoscaler:0.0.19, includes the revert and seems to resolve the behavior.
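
We haven't dug into the exact mechanism inside the SDK, but for anyone following along, one generic pattern that produces a profile like the one above is a client object that arms internal timers (retries, keep-alives, circuit breakers) when it is constructed and is then re-created on every polling tick without being torn down. All names below are made up for illustration; this is a hypothetical sketch, not the oci-sdk or autoscaler code:

// Hypothetical sketch only; none of these names exist in the real SDK.
class PollClient {
  private readonly housekeeping: NodeJS.Timeout;

  constructor() {
    // imagine the client arming an internal interval on construction
    this.housekeeping = setInterval(() => { /* refresh creds, drain retry queue */ }, 30_000);
  }

  async fetchStats(): Promise<Record<string, unknown>> {
    return {};
  }

  close(): void {
    clearInterval(this.housekeeping);
  }
}

// Leaky shape: a fresh client per tick that is never closed leaves one
// orphaned interval behind per poll, so the timer count grows without bound.
setInterval(async () => {
  const leaked = new PollClient();
  await leaked.fetchStats(); // leaked.close() is never called
}, 10_000);

// Non-leaky shape: construct once, reuse for every poll.
const shared = new PollClient();
setInterval(() => void shared.fetchStats(), 10_000);

Whether the real regression looks anything like this is exactly what we still need to confirm.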

@aaronkvanmeerten (Member)

aaronkvanmeerten commented Nov 6, 2023

In addition, I have a PR open that was used to build the docker image tagged jitsi/autoscaler:0.0.20, which includes library updates and the required code changes to match: #145

We haven't run that one outside of a dev environment, so I can't speak to how much CPU/memory it consumes over a long run, but I'll report back here if it looks OK and gets merged.

@aaronkvanmeerten (Member)

I have merged the updated package dependencies for latest, and would suggest you try testing either latest master (0.0.20 on Docker Hub) or the commit prior (0.0.19 on Docker Hub), depending on your taste for the novel. I am promoting 0.0.19 to production in our systems now, and hope that you can weigh in eventually to let us know whether one of these candidates solves your issue. Thanks again for your report, and sorry it took so long to address it!

@aaronkvanmeerten (Member)

It seems that 0.0.20 continues to leak. The latest candidate in master is also on Docker Hub as 0.0.22, with the OCI SDK reverted to the older version but all other dependencies updated.
Please note that if you were scraping Prometheus metrics from the autoscaler, the metrics endpoint has moved to a new port as of 0.0.21.

@Arzar (Author)

Arzar commented Nov 27, 2023

Thanks for the follow-up!
Our system is now in production, so it's difficult to make any changes, but the next time we do a major update I will try to move to a newer version of the jitsi-autoscaler.

@aaronkvanmeerten (Member)

I've confirmed that 0.0.22 does not show the leak, and that 0.0.21 does. I have opened an issue with the OCI TypeScript SDK project, oracle/oci-typescript-sdk#247, in case it is of interest to you.
