Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throws KNXTimeoutException due to missing confirmation although the confirmation was received #120

Open
dallmair opened this issue May 1, 2023 · 2 comments

Comments

@dallmair
Copy link

dallmair commented May 1, 2023

We're not entirely sure yet and still digging through the code (both calimero and ours), but it seems there is a synchronization issue in calimero-core 2.5.1. It happens only occasionally and definitely depends on timing.

Please see attached log file for a DEBUG log in HTML format that includes all messages that led up to the problem in an instance last week. We use ManagementClientImpl over a Tunneling connection via a certified IP interface to a DIY device. After a few successful message exchanges, the log contains:

  1. 00:17:20.449 calimero sends T_ACK message to the device ✔️
  2. 00:17:20.453 calimero sends T_DATA_CONNECTED message to the device ✔️
  3. 00:17:20.470 + 00:17:20.471 calimero receives confirmation of the T_ACK message ✔️
  4. 00:17:20.514 calimero receives confirmation of the T_DATA_CONNECTED message ✔️
  5. 00:17:20.526 calimero receives T_ACK in response of message sent in step 2 ✔️
  6. 00:17:20.549 calimero receives T_DATA_CONNECTED (some response message from device) ✔️
  7. 00:17:23.549 device repeats the message from step 6 as it did not get an answer within 3 seconds ⚠️ Should not have happened, but just a consequence of the underlying issue, see next
  8. 00:17:24.467 calimero throws a KNXTimeoutException saying "no confirmation reply received for" the message sent in step 2 ❌ This is strange as it logged reception of the confirmation right there in step 4.

To sum it up, calimero writes a log entry about having successfully received the confirmation, but then throws an exception because it thinks it did not receive the confirmation.

Or are we reading the log file incorrectly, missing something else, or just don't get it...? It would be great if you could take a look and help us pinpoint the root cause. Many thanks in advance!

updater-2023-04-21_00-17-15.501-short.zip

@dallmair
Copy link
Author

dallmair commented May 2, 2023

The only thing that looks a bit suspicious to me is the implementation of ClientConnection::doExtraBlockingModes: waitForStateChange already returns whether the state change happened or not, but doExtraBlockingModes ignores that result and compares the internalState field again. That seems odd to me, but I have no idea whether this is actually in the relevant code path for the issue reported here.

@bmalinowsky
Copy link
Collaborator

Thank you for the log file. I'll try to find out what's wrong.

  • There are several duplicate confirmations in the log, e.g., 00:17:18.807 and 00:17:18.840.
  • The logic in KNXnet/IP tunneling for determining a successful .con is currently after the notification to the link layer. Hence, a .con is always reported (logged). This has its benefits (there is a TODO to move that, because I know that can be confusing), but is not helpful in your case: a L2 log entry does not mean that tunneling moved on from CEMI_CON_PENDING. I think that's the culprit.
    In essence, the relevant part whether a .con is accepted or not starts here:
    if (ldata != null && internalState == CEMI_CON_PENDING) {
  • For instance, at
    final int sendCount = ldata.getHopCount() - 1;

    there is a workaround for eibd, but there is currently no other log output for other conditions (i.e., if the array comparison of recv and sent fails completely).

I don't know how easy it is for you to run with changed source code, but setting the log output to TRACE, and adding some log output, e.g., array mismatch, would help. I think the .con is currently not accepted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants