There are some cases where OT (or Matter) loses track of Matter services #9909
Comments
If a service is not present in the list of services you get from OT … Once a service is successfully added using …
OK, so the issue likely points to the CHIP stack, and this:
If it fails on the first call, the rest are not tried. Otherwise, it can fail anywhere in the list of fabrics.
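The CHIP snippet referenced in this thread was not captured here. As a hedged illustration of the failure mode just described, the C sketch below contrasts a fail-fast loop over fabrics with a best-effort loop. All names (`advertise_fn`, `advertise_all_fail_fast`, `advertise_all_best_effort`, `fake_advertise`, `struct fake_ctx`) are hypothetical and are not CHIP or OpenThread APIs.

```c
#include <stddef.h>

/* Hypothetical callback: advertises the operational service for one fabric.
 * Returns 0 on success, nonzero on failure. Not a CHIP or OT API. */
typedef int (*advertise_fn)(size_t fabric_index, void *ctx);

/* Fail-fast loop: the first error breaks out, so every fabric after the
 * failing one is never tried. Returns how many fabrics were advertised. */
size_t advertise_all_fail_fast(size_t fabric_count, advertise_fn advertise,
                               void *ctx, int *first_error)
{
    size_t advertised = 0;
    *first_error = 0;
    for (size_t i = 0; i < fabric_count; i++) {
        int err = advertise(i, ctx);
        if (err != 0) {
            *first_error = err;
            break; /* remaining fabrics are silently skipped */
        }
        advertised++;
    }
    return advertised;
}

/* Best-effort loop: keep going after a failure and remember the first error,
 * so one bad fabric does not hide the rest. */
size_t advertise_all_best_effort(size_t fabric_count, advertise_fn advertise,
                                 void *ctx, int *first_error)
{
    size_t advertised = 0;
    *first_error = 0;
    for (size_t i = 0; i < fabric_count; i++) {
        int err = advertise(i, ctx);
        if (err == 0) {
            advertised++;
        } else if (*first_error == 0) {
            *first_error = err;
        }
    }
    return advertised;
}

/* Hypothetical test double: fails for one fabric index and records which
 * fabrics were successfully advertised. */
#define FAKE_MAX_FABRICS 8
struct fake_ctx {
    size_t failing_index;
    int advertised[FAKE_MAX_FABRICS];
};

static int fake_advertise(size_t fabric_index, void *ctx)
{
    struct fake_ctx *f = ctx;
    if (fabric_index == f->failing_index) {
        return -1;
    }
    if (fabric_index < FAKE_MAX_FABRICS) {
        f->advertised[fabric_index] = 1;
    }
    return 0;
}
```

With four fabrics and the second one failing, the fail-fast loop advertises only the first fabric, while the best-effort loop advertises three and still reports the first error. Whether best-effort is the right policy for the real Matter code is an open question; it trades a clean early exit for partial advertisement.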
I suspect there is still an issue on the OT side, either as a bug or due to inappropriate use by us and/or Matter. Matter seems to remove all services and re-add the ones it cares about. In the Apple border router (BR) side logs where this issue presents, only the matterc service is actually removed by the Thread client; the remainder time out when their lease expires. This suggests that, perhaps, if multiple services are removed at once, OT only actively removes the last one from the SRP server? We've otherwise avoided this behaviour by combining our own registration with Matter's, which seems to avoid the original failure to advertise a service in the for loop.
I have sysdiagnose logs for anyone interested in the above analysis (the file is too large to upload).
A call to …
Code snippet from CHIP:
Anyway, the issue has been logged with CHIP, so this could probably be closed in the absence of stronger evidence or steps to reproduce.
We've observed cases where Matter mDNS services disappear unexpectedly.
To Reproduce
We've been unable to reproduce this reliably enough to observe it on a device with debug logs. We've observed it indirectly in logs from an HPM sysdiagnose captured with the Home Network Diagnostics profile, which includes SRP calls.
Based on our observations, some or all Matter services disappear about 2 hours after boot. The Matter flow starts by advertising all operational nodes, plus a matterc record for a period of 3 minutes.
Independently, we have our own record type (ltpdu) that must always be advertised. Because of current deficiencies in the Matter SDK (project-chip/connectedhomeip#32507), we run a timer that checks every 2 s whether our service is still present by iterating over the known OT services.
If we detect that our service is missing, we add it back immediately.
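The polling code itself is not reproduced in this issue. A minimal sketch of the 2 s check, assuming the client's services can be walked as a singly linked list (in OpenThread this roughly corresponds to iterating the list returned by `otSrpClientGetServices()` and re-adding with `otSrpClientAddService()`), might look like the following. The `srp_service` struct, the function names, and the `_ltpdu._udp` / `my-device` names are hypothetical stand-ins, not the real OT types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry in the SRP client's service list.
 * Loosely modeled on OpenThread's linked list of otSrpClientService, but it
 * is NOT that type and has none of its state handling. */
struct srp_service {
    const char *service_type;  /* e.g. "_ltpdu._udp" (hypothetical) */
    const char *instance_name; /* e.g. "my-device" (hypothetical) */
    struct srp_service *next;
};

/* Walk the list and report whether a matching entry is present. */
bool srp_service_present(const struct srp_service *head,
                         const char *service_type, const char *instance_name)
{
    for (const struct srp_service *s = head; s != NULL; s = s->next) {
        if (strcmp(s->service_type, service_type) == 0 &&
            strcmp(s->instance_name, instance_name) == 0) {
            return true;
        }
    }
    return false;
}

/* Body of the periodic (e.g. every 2 s) check: if our entry is missing,
 * link it back in at the head. Returns true if it had to be re-added.
 * Assumes `ours` is only ever inserted through this function, so a missing
 * match means `ours` is not currently linked. */
bool ensure_service_registered(struct srp_service **head,
                               struct srp_service *ours)
{
    if (srp_service_present(*head, ours->service_type, ours->instance_name)) {
        return false;
    }
    ours->next = *head;
    *head = ours;
    return true;
}
```

A check like this only covers our own service; Matter services dropped in the same window would not be restored by it.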
Our working theory is that race conditions can result in some or all of the Matter services not being re-added when matterc is removed. In our observations, these services are not actively removed by the device; they simply expire later at the SRP server (i.e., they are not refreshed).
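The difference between a service that is actively removed and one that simply stops being refreshed can be captured in a toy model of what the SRP server sees. This C sketch assumes a lease of 7200 s (2 hours), which would line up with services vanishing about 2 hours after boot; the actual lease in use on the device is not stated in this issue. The enum and function names are hypothetical, and the model ignores when the client schedules its refreshes.

```c
#include <stdint.h>

/* Hypothetical server-side classification of a single SRP registration. */
enum srp_server_view {
    SRP_REGISTERED,        /* lease still valid */
    SRP_REMOVED_BY_CLIENT, /* client sent an explicit removal */
    SRP_EXPIRED            /* never refreshed, lease ran out */
};

/* All times are seconds since boot.
 * last_update: last time the client registered or refreshed the service.
 * lease:       lease granted for that registration (assumed 7200 s here).
 * removed_at:  time of an explicit removal at or after last_update,
 *              or a negative value if the client never removed it. */
enum srp_server_view srp_classify(int64_t now, int64_t last_update,
                                  int64_t lease, int64_t removed_at)
{
    if (removed_at >= last_update && removed_at <= now) {
        return SRP_REMOVED_BY_CLIENT;
    }
    if (now >= last_update + lease) {
        return SRP_EXPIRED;
    }
    return SRP_REGISTERED;
}
```

For example, matterc removed at t = 180 s is reported as removed by the client, while an operational service registered at boot and never refreshed flips to `SRP_EXPIRED` exactly at t = 7200 s.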
It's unclear to us whether this indicates a deficiency in the OT code, in how we are using it, or in the Matter SDK.
To test our hypothesis, we are already refactoring our approach so that our own service is added to OT in the same function call that Matter uses to add its services. However, it would be good to hear from the OT community whether this points to other hidden OT issues with SRP behaviour.
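The refactor described above (adding our own service in the same call where Matter adds its services) can be sketched as a single registration pass. This is a hypothetical in-memory model, not the Matter or OT implementation: `struct registry`, `registry_add`, and `readvertise_all` are illustrative names, and `MAX_SERVICES` is an arbitrary capacity.

```c
#include <stddef.h>

#define MAX_SERVICES 16 /* arbitrary capacity for the sketch */

/* Hypothetical in-memory registry standing in for the SRP client's list. */
struct registry {
    const char *names[MAX_SERVICES];
    size_t count;
};

static int registry_add(struct registry *r, const char *name)
{
    if (r->count >= MAX_SERVICES) {
        return -1; /* no room: models an add call failing */
    }
    r->names[r->count++] = name;
    return 0;
}

/* One registration pass: clear everything (modeling Matter's remove-all),
 * then add our own service and Matter's services together. Our service is
 * added first so a failure later in the pass cannot starve it; every add is
 * attempted, and the number of failures is returned. */
int readvertise_all(struct registry *r, const char *const *matter_services,
                    size_t matter_count, const char *own_service)
{
    int failures = 0;
    r->count = 0;
    if (registry_add(r, own_service) != 0) {
        failures++;
    }
    for (size_t i = 0; i < matter_count; i++) {
        if (registry_add(r, matter_services[i]) != 0) {
            failures++;
        }
    }
    return failures;
}
```

Adding our service first is a design choice for the sketch: if the list runs out of room or an add fails partway through, the always-on service is the one least likely to be affected. Whether the real SRP client has comparable limits is not established here.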