Memory Leak in ONVIF #72544
Comments
onvif documentation |
Hey there @hunterjm, mind taking a look at this issue as it has been labeled with an integration ( |
Left the test system running longer, just to confirm. See updated memory graph: From looking at |
It looks like weakrefs building up
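For anyone wanting to check a buildup like this outside the profiler integration, here is a minimal sketch that counts live objects by type with the standard `gc` module and diffs two snapshots. It is not what the profiler integration does internally. The names `count_objects_by_type`, `growth_between`, and `Target` are hypothetical, and the loop that creates 1000 weak references only simulates a leak so the diff has something to show.

```python
import gc
import weakref
from collections import Counter


def count_objects_by_type():
    """Return a Counter mapping type name -> number of live gc-tracked objects."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth_between(before, after, top=5):
    """Return up to `top` (type name, increase) pairs, largest increase first."""
    delta = Counter({name: after[name] - before.get(name, 0) for name in after})
    return [(name, n) for name, n in delta.most_common(top) if n > 0]


class Target:
    """Hypothetical stand-in for an object that many weak references point at."""


def _noop(_ref):
    """Shared callback; passing a callback forces a distinct weakref per call."""


before = count_objects_by_type()

# Simulated leak: keep 1000 separate weak references alive in a list.
target = Target()
leaked_refs = [weakref.ref(target, _noop) for _ in range(1000)]

after = count_objects_by_type()
growth = growth_between(before, after)

for name, increase in growth:
    print(f"{name}: +{increase}")
```

Taking the two snapshots some minutes apart on a running process, instead of around a simulated leak, gives the same kind of "which type keeps growing" answer.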
|
|
@shbatm I expect the leak is here as each new retry creates a new |
Maybe related pyca/pyopenssl#1120 |
It would be good to use |
It would be nice if Here is the suggestion from the
|
Just restarted my test VM, I will dump the objects later today. |
After 2 hours: |
My first implementation for moving to We do cache SOAP bindings to be re-used if the same service gets created more than once, but it does not apply to PullPointSubscription services because we get new endpoints from the camera every time a subscription is requested.
From the logs, it looks like PullPointSubscription is failing frequently and the logic in Home Assistant attempts to restart the subscription automatically on error. This indeed does cause a new For ONVIF the easiest solution is probably to just add @shbatm - If you could verify the fix since you have cameras that can reproduce the issue, I'll open a version bump PR to Home Assistant. |
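To illustrate why a cache keyed on the endpoint does not help when the camera hands out a new endpoint per subscription, here is a small sketch. It is not the library's code: `ServiceCache`, `make_service`, the binding names, and the `camera.example` addresses are all hypothetical, and whether the real library stores or skips such entries is not shown in this thread.

```python
class ServiceCache:
    """Hypothetical cache: one service object per (binding name, endpoint address)."""

    def __init__(self):
        self._services = {}

    def get_or_create(self, binding_name, address, factory):
        """Return the cached service for this key, building it on a miss."""
        key = (binding_name, address)
        if key not in self._services:
            self._services[key] = factory(binding_name, address)
        return self._services[key]

    def __len__(self):
        return len(self._services)


def make_service(binding_name, address):
    """Stand-in for building an expensive SOAP service object."""
    return {"binding": binding_name, "address": address}


cache = ServiceCache()

# Fixed endpoint: repeated requests reuse the single cached object.
media_address = "http://camera.example/onvif/media"  # made-up address
first = cache.get_or_create("Media", media_address, make_service)
second = cache.get_or_create("Media", media_address, make_service)
fixed_entries = len(cache)

# New endpoint on every subscription attempt: each lookup is a cache miss.
for attempt in range(3):
    sub_address = f"http://camera.example/onvif/sub/{attempt}"  # made-up address
    cache.get_or_create("PullPointSubscription", sub_address, make_service)
total_entries = len(cache)

print(first is second, fixed_entries, total_entries)
```

With a camera that drops its subscription often, every automatic restart takes the miss path, so whatever is built per attempt is built again each time.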
As a side note - interested in what cameras you have as while this will technically fix the memory leak - it won't fix the underlying issue of your event feed subscription either not working or getting killed very frequently. |
@hunterjm Happy to test; let me know when you make the change on the package repo and I can test it locally first. The cameras are all Hikvision of various models. The bad actor based on the MAC in the logs (and what I would expect) is the Doorbell (DS-HD1). It's Wifi and has a multitude of issues--I'm not surprised it has issues maintaining connection. I have to restart it every week and it's going in the trash as soon as I can pull POE to the Front Door ;) |
I'm also having some time synchronization issues with the cameras that I noticed when testing. It doesn't look like it's causing the issue, but I wanted to mention it since the log warns about possible authentication issues:
The date/time on Back Yard (UTC) is '2022-05-27 11:34:01+00:00', which is different from the system '2022-05-27 10:34:04.033159+00:00', this could lead to authentication issues
The date/time on Front Sidewalk (UTC) is '2022-05-27 11:34:02+00:00', which is different from the system '2022-05-27 10:34:04.052574+00:00', this could lead to authentication issues
The date/time on Back Yard (UTC) is '2022-05-27 11:34:03+00:00', which is different from the system '2022-05-27 10:34:04.041229+00:00', this could lead to authentication issues
The date/time on Front Door Camera (UTC) is '2022-05-27 11:34:03+00:00', which is different from the system '2022-05-27 10:34:04.092536+00:00', this could lead to authentication issues
Still troubleshooting my end with that. The NVR tries to synchronize with an NTP server and then push the time to the cameras, but Hikvision has some firmwares that implemented daylight savings backwards (Start in Nov, Stop in March), which causes part of the issue. I've tried setting the NTP/DST on each camera to see if that gets the error to go away. |
The time sync issue is a known issue in HikVision firmware. They apply DST settings to their UTC representations as well in the ONVIF response... because why not? |
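As a quick check on the log lines quoted earlier in this thread, the sketch below computes the gap between one camera timestamp and the matching system timestamp and tests whether it is close to a whole hour, which is what a DST shift applied to a UTC value would produce. The helper name `looks_like_dst_offset` and the 30-second tolerance are assumptions for illustration only.

```python
from datetime import datetime, timedelta

# Timestamps copied from the first "Back Yard" log line in this thread.
camera_time = datetime.fromisoformat("2022-05-27 11:34:01+00:00")
system_time = datetime.fromisoformat("2022-05-27 10:34:04.033159+00:00")

skew = camera_time - system_time


def looks_like_dst_offset(skew, tolerance=timedelta(seconds=30)):
    """Return True when the skew is within `tolerance` of plus or minus one hour."""
    return abs(abs(skew) - timedelta(hours=1)) <= tolerance


print(f"skew: {skew.total_seconds():.3f} s")
print("about one hour off:", looks_like_dst_offset(skew))
```

A skew that sits a few seconds from exactly one hour points at a DST or timezone setting on the camera and not at ordinary clock drift.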
I just realized there might be some firewall rules interfering with the connections too. That camera is on its own SSID and VLAN with some very restrictive rules. If I find anything I'll let you know. (Sidebar: core-dns really needs to get rid of the Cloudflare hardcoded fallback. I forgot to re-enable the coredns fix addon and my router has been spammed with 4 blocked DNS requests per second; once the noise gets filtered out I'll see if there's anything affecting the camera.) |
@hunterjm, ran TCPDump+Wireshark on the camera this morning and found out:
tl;dr: Allowing the camera to ping the router fixes the frequent connection issue and stops the memory leak (but doesn't fix the underlying root cause). I'll still be able to turn off the firewall rule and test your fix when you have it ready. This also means that unless someone else is having this same kind of intermittent connection issue on a very frequent basis, this memory leak is going to be very slow. Still worth fixing if possible, but it explains why it hasn't been seen (except for maybe back with #42390). |
@shbatm - That makes sense to me. Sorry for the delay, but it's been a long weekend. v1.2.1 of |
@hunterjm Testing today, will let you know. |
I have the EZVIZ DB1 doorbell camera (which is a Hikvision) and this has also been happening to me, although my restarts are only once per day. Has this been fixed and included in a HA release yet? I searched the 2022.6.X release notes and don't see any mention of it. |
No, it doesn't look like the package has been bumped to the new version in Home Assistant yet. |
@hunterjm I just added a PR to make the bump. |
The problem
I believe the ONVIF integration has a memory leak. Not sure if it's in the core code or module dependencies.
I was experiencing frequent (1-3 times/day) forced restarts of Core by Supervisor, which I was able to trace back to OOM calls on the host vm killing the python process. Tried to narrow down the source as best as I could with help from @bdraco and it looks like we have it narrowed down to the ONVIF integration.
See memory sensor snapshot before and after disabling ONVIF on my main instance:
To further narrow it down, I started a new Home Assistant OS VM (Proxmox) with only the original ONVIF config entries, profiler, and system resources integrations enabled on top of the default_config. It looks like it is showing the same memory leak here.
180-sec Py-Spy, profiler.memory and profiler.start outputs from test VM attached under Additional Info below. I can attach from main instance too if needed.
What version of Home Assistant Core has the issue?
2022.5.5
What was the last working version of Home Assistant Core?
~2022.2.x
What type of installation are you running?
Home Assistant OS
Integration causing the issue
ONVIF
Link to integration documentation on our website
https://www.home-assistant.io/integrations/onvif
Diagnostics information
config_entry-onvif-cd09f9939eda6d950f4b80d7b1bac047.json.txt
config_entry-onvif-ef7f4c3b80516522e1dadc5448f7c8a5.json.txt
config_entry-onvif-97da2d4278d518eaa41f437b414cd8d1.json.txt
config_entry-onvif-295e1346abdabb24753cd7425e492c04.json.txt
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
Py-Spy and Profiler - Test Instance.zip
Py-Spy and Profiler - Main Instance.zip
Log File from Test VM prior to crash, profiler.start_log_objects running and logging as follows: home-assistant.log.1.txt
Potential historical issues related: