-
-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] BSB protocol device discovery does not work -- [FEATURE REQUEST] MQTT topic structure without category in path (optional?) #680
Comments
P.S.: I was about to post my configuration of a HASS "climate" device that supports swithching on/off (where it handles to restore the previous state of system like "Reduziert", "Komfort", "Auto") when switching it back on. Background: Unfortunately HASS's climate have only on/of setting and a separate toggle for the "mode", while on BSB-LAN heaters you only have one toggle, where "Frostschutz" means off. So the on/off switch should switch to "Frostschutz" and when turning back on it should revert to the previous mode (Komfort/Reduced/Auto). -- Unfortunately the config is not easy to post on forums with the new topic structure because the category IDs are different for every system. |
That's the thing if you introduce new features - some people want it this way, some people want it that way. The argument to filter MQTT-subscribed parameters based on categories rather than having to enter parameters one-by-one was the reason I introduced that feature. To me, this is a more frequent use case than re-using configurations across different systems. If the systems are really the same, the category IDs will also be the same. If they are not, I wouldn't want to encourage people re-using configurations in the first place as you may or may not know what you are doing. And the "may nots" are the ones who cause support-time ;). If it is any consolation, the category ID is ignored when setting parameters, so you can write to BSB-LAN/1/0/700 in order to write to parameter 700 on device ID 1. |
I don't agree with the "filtering". If you want to filter you filter by parameter id. They are all pre-structured. You know that 7xx is the Heizkreis 1, so its easy to filter them by path in MQTT explorers. I know almost all of my parameter IDs (I knew them already before using BSB-LAN). I know from the code that while setting the values it does not matter, but the bigger problem is setting up entities where you wait for changes on "/status", they need to have the correct category ID. |
Well, heating circuit 1 is more than just parameter numbers 7xx, they go up to 900. And if you know the parameter IDs, you will probably also know the order of the categories and thus also the category IDs (at least in systems with just one heating system). If you are interested in the parameters of HC1, it's of course much faster to subscribe just to BSB-LAN/0/XX/ instead of subscribing to each parameter manually. User-readable strings would also not solve the problem of portability, because they would be localized. In any case, it's a one-time task, and since reading/monitoring - i.e. subscribing - parameters are the majority compared to those that you want to actively change regularly (maybe a handful?), the advantages of the present model are clear to me. But I'll leave this issue open for a while. Let's see if there are more users who feel that this is a step back. |
Ok, since it only affects two lines in code, I have now added the NO_MQTT_HIERARCHY compile option, which you can set using |
OK thanks for this. Actually this setting only helps for the I will try to update my code on the weekend and extract the category ID. Maybe you can subscribe to topics like "/BSB-LAN/0/+/712" instead (not sure if that works with HASS), when writing you could use "0" as suggested before.. |
See https://www.hivemq.com/blog/mqtt-essentials-part-5-mqtt-topics-best-practices/ for how to use wildcards. |
Looks like you can use topics for subscribing in HASS: https://community.home-assistant.io/t/mqtt-topic-wildcard/389382/3 This makes it easy to ignore the category when subscribing for updates. For setting/informing you can use "0" and for polling the topic ID is enough. |
I missed one instance, so now it should have a consistent structure without the category ID. Can you test again? |
Yes and no. For subscribing wildcards are a good idea, but for setting values you still need a special case. But as "0" as category works for setting values I am fine with that limitation. |
I will give both variants a try later, I am a bit busy at moment. |
So then I could/should revert the changes? If it means that you can create relocatable templates with the way you just described, then I'd rather remove it. If you'd still need the "flat" hierarchy, I'll keep it, with the risk that people will try it out and clog their MQTT topic hierarchy ;)... |
Ok, thanks! |
Hi, I tried to test it but after upgrading (before, all was working correct), it is unable to detect the BSB. I cleared the EPROM manually, still no success. It seems to get the correct device/type and variant, but it reports 0 and 0 for both. Therefore I can't get it online at all.
This repeats forever. Can you help. Unfortunately theres no way anymore to set family and variant anymore. It shows correct in first line: 97/100 but then reports 0/0
Uwe |
I found the issue. The following line breaks everything in the logic. After disabling it, all works: Line 4057 in 2796578
This line of code changes the bus type for no reason to LBP which then breaks. On Web interface it also shows LBP on the config page. So basically I got the master version running, now trying to configure MQTT! Should I open another issue about the logic problem setting to bus type LBP on discovery of bus devices? |
There are more problems in the code. It boots up successfully, but every request for a parameter (via auto-logging or MQTT) leads to:
I think I will revert to latest published version.... |
OK, released version 4.1 boots and works correct. As it did not have the newest MQTT topic structure (with status,....), I then tried several commits and found out that e66083d works for me. If you want to fix the issues with discovery and startup (and possibly those where the query fails), you may contact me to get more debug information. It looks like the new code somehow mixes up device numbers and serial numbers in addition to suddenly setting the BUS type to PPS after scanning for devices. The newest version with the bug (after it started with commenting out the code as described above) reported device ID 128 in MQTT messages, while the working e66083d reports ID 0. Some issues I also found while upgrading: If I did not delete all retained messages with I will now try to use commit e66083d with wildcard topics in HASS. |
Hi,
Please note for the on/off button it uses the last known state (another entity) which is not "Frostschutz". Basically I think you can revert the two changesets. With wildcards it is not needed. The topics are still a bit async but much better than explicitly giving the category ids. For getting a value it uses wildcard, for setting the fixed category=0:
The cool thing is: For getting/setting temperature, no template is needed, as the payload can be directly sent. I will only improve my config to have a global variable for the device ID reused. |
My running version is now: Version: 4.2.28-20241129140358 |
Thanks for pointing that out, that was a stupid mistake, but disabling the line leads to the other problems you encountered then. The correct line has to be: |
As for the bulk of MQTT messages, this is probably related to a bug in mosquitto that I mentioned somewhere else. Can you try and install mosquitto version 2.0.15 which does not seem to have that issue? As for the crash due to MQTT messages with wrong syntax, I will have to check, my mosquitto does not retain messages, so if you could find out more about that and open a separate issue, that would be great. |
Actually, my mosquitto does retain messages, just not across restarts of mosquitto (I guess I have to enable |
I did not restrat mosquitto at all as it runs as system service provided by Ubuntu 24.04. I will check how to downgrade it to maybe fix the "too many retained messages" problem, if that's causing the reconnect issue on many retained messages. Interestingly the problem also occurred after I deleted all retained messages, so basically the Mosquitto had an empty With restart I mean restart of BSB-LAN. The bug is actually the following:
Working on it. Will report later. |
That could be that it has to do with the old topic structure (which was only active for a week or so). Of course it would be good to catch the error, but if it no longer exists in the current version, then I guess we could live with that. The problem with messages getting dropped is unrelated to whether the broker already has stored many messages or not, the problem is described here: |
I think the best is to tell users to execute the following command on update to get rid of all retained messages:
(with their own configured topic root and user/pass if required). I think the issue with the crash happened when the 4.1 version I had installed before created topics without It actually reports the issue, but it looks like the code then goes into a loop or calls itsself. I think you may be able to reproduce it by publishing some retained messages manually with the topic structure of 3.x and a few with 4.1 style (without Uwe |
Can do, but for now, I'd rather focus on whether BSB-LAN works again without any problems with BSB (after the bugfix)... |
Hi, In the telnet console it still gets telegrams about Brennerstatus, but still shows:
Should I connect with serial console (luckily I added a USB cable coming out of the boiler on bottom....)? |
I think the issue has to do with the change from device id to serial number. In the message about detected decice it shows 128 as device ID. Not sure if this is correct. It looks like later it does not find the device in the loop, which goes over all discovered devices. |
Yes, a serial log would be helpful here. This probably happens in |
This could be the reason why it crashed when some invalid MQTT topics of old style were reported by MQTT broker. Thanks! I can't replicate it easily, but this could be th reason for the strange behaviour after it got old retained messages. |
Yes, pretty sure that's the same thing. Everything that was posted to |
Regarding the mosquitto bug: I applied the workaround in eclipse-mosquitto/mosquitto#2887 (comment) (downgrading on Ubunto 24.04 is not possible, I have now the 2.0.20 installed through the mosquitto PPA) and see what happens. |
Hi, the changes to Mosquitto config did not bring any changes. Basically the behaviour is very similar to that report recently on the FHEM forum: https://forum.fhem.de/index.php?msg=1326158 Like for this user, it often happens that on bootup the device connects multiple times to the MQTT server making it online/offline according to the I will prepare my Chromebook and connect it to the USB port to get a long-term log with all debug options enabled. My first guess is: Due to high activity on the heater and also the BSB-BUS somehow some messages seem to decode wrongly and the device enters a state where it "hangs". This causes the connections to drop shortly. It may be caused by the heater. But it does not seem to be a "too much heat" problem (ESP32 device does not get too warm or gets electric interference, because I added some shielding around my installation next to the LMU74 long time ago to prevent excessive heat from causing Display Unit flickering - a known problem on Brötje). In the meantime, I will disable the "availability" topic in my config, so short millisecond downtimes don't flood the log. Many thanks for the help here! If I find something why the device disconnects on high activity I will open new issue. |
Hm, I don't really know under what conditions an MQTT connection is considered closed because in the code, we only actively disconnect from the broker if the board is reset or the configuration is changed. If the device does not reboot, e.g. through a crash, then I don't know when the broker considers the connection to be disconnected. If the broker considers a connection disconnected if it is not regularly "served" (probably through Anyway, yes, let me know if you find something, but in a different issue, as you suggested... |
Hi yes, working on it. What you describe with "maintenance tasks" looks like the real issue. As I also said, the MQTT connection breaks first, but then also the telnet connection disconnects (a bit later). And if you look at webserver it also sometimes does not repond. In addition what I observed: When we had the bug (as described here where the bus address was detected in wrong way), the MQTT connection also closed all the time (it reported "connection lost" after about 15 secs, which is the keepalive time in the MQTT client used). The problem was that the device tried to send messages on the BSB bus using the wrong source device id (128) and those requests timeouted as the client got no reply packet. As the timeouts sum up when multiple messages are to be sent on BSB bus, it prevents progress on the MQTT (keepalives) or telnet connection. To me it looks that especially on startup and when the heater is active, some BSB messages produce timeouts and therefore stall the main loop. The problem of those ESP32/Arduino devices is that they don't support multiple threads. It is like Windows 3.1, you need to keep all tasks alive by switching between them on your own. Basically all actions waiting for network or bus would need to be done with some select()-like approach (send command and then do something else until the bus reports "got something new"). Which would be a major refactoring of BSB-LAN and not easily possible as the timing on the bus needs to be completely handled by the main CPU as it has no signal processor that can listen on BSB and create events. I will keep you informed. Thanks anyways for the great project. |
Maybe it would help to call the MQTT's loop() function when you are waiting for the BSB bus. |
Hm, each bus message has a 3s timeout, and there are three retries, so it shouldn't take 15 seconds from one MQTT loop to another. The ESP32 has two cores, so generally, it would be possible to shift that to a different core, but that would only work on the ESP32, not on the Arduino Due. I'll see if I can call the function more often somewhere... |
I have now added some more MQTT "housekeeping". If you want to run your tests, please try the most recent version from GitHub. |
Sure. Compiling.... |
Feedback: With your recent change the BSB Adaptor only receives the first value of an MQTT poll request:
So this new code seems to have side effects. Better revert it. After restoring previous commit all fine again:
|
If you send me logs, then please serial monitor logs. Anything else is of little help.
And first remove the |
I just had a look at the
Maybe this brings an improvement. If not, then I don't know if there's anything else I can do... |
Hi, you reverted it already. If I remove the mqtt_connect() call it works better. BUT: I also looked at the code: Calling loop() inside the BSB-LAN code is very risky, because the loop() will invoke the callback when it receives a message. As not all variables used there are local and also the status of the BSB implementation is not known, it may suddenly call another query to BSB while a new message arrives. One example where it breaks: I have seen, two polls behind each other will give partial results. So basically, calling of loop should be as often as possible, but due to the way the code is written (many global variables), this causes havoc. So we should not do this! I will give the socket timeout and keep alives a few updates to try out. P.S.: The serial log does not give much information except "connection lost". |
I now added: MQTTPubSubClient->setKeepAlive(15);
MQTTPubSubClient->setSocketTimeout(180); Setting keep alive to 30 is a bad idea, because it would make it more brittle. The keep alive tells the client to send a packet to the broker to tell it "yes I am there". If the broker does not get keep alive packets often enough, it will disconnect. Actually the socket timeout identical to keep alive looks strange to me. I raised it to 180 seconds. This would mean that if the server does not send a keep alive to the client in time, the client will kill the connection. The default keep alive value in Mosquitto is 60 secs, minimum configurable is 5s: MAN page:
Basically the socket timeout needs |
I give it a few hours while adding heater pain and report back. If all works well I can provide a PR. |
It also looks like one of the many keepalive bugs in the (now unmaintained) PubSubClient:
As you have the code of PubSubClient 2.8 included, I will patch it there. PR following |
Here is the PR: #682 To me this looks fine after making it Xmas-warm in my office (turn heater to 30 °C). This also seems to fix the duplicate connects on startup of BSB-LAN. Let's move over the discussion to the PR. Now this is really closed! |
|
What do you mean by that? Where else should it be called if not as part of the BSB-LAN code? And under what conditions would it be called
What status is not known? When/how may it suddenly call another query to BSB? Please don't make assumptions without actually substantiating them. Like above, where you claim that the destination ID might be only above 0x80 if it's a broadcast message, these assumptions don't help. In the case above, it was immediately clear to me that there is no base for this claim, but with other things, I may have to examine it, and without any substantiation, this becomes difficult.
Great. But without any logs, what should I do now? The logging function does much more than two polls right after each other, and there is no issue with that. So how do I replicate this situation? |
That's fine. I already figured out that the problem seems to not come from your code. For an issue that happens once per hour it is very hard to record a lot of serial logs, so I did my programmer's preference and reading the code. I am working a lot in software security checking and revieweing PRs of large projects like Elasticsearch / Apache Lucene/Solr, so I am one of those few people that can follow the program workflow without actually ever executing it. This is not meant bad, but I am not a fan of logfile debugging, because it does not help much. If you prefer that, I can provide log files, yes, but that's not my highest priority. |
Great that you work that way. If you fix all that by yourself, then that's perfectly fine. But if want action on my part, however, it would be polite to follow my requirements, to say the least. You don't know how many (real or not) self-proclaimed experts I've had over the years who thought they understood the code when in fact they didn't. I don't blame anyone for not "getting" the code as it has grown organically in many directions over the years, but that means if someone wants action on my part, they have to play by my rules. |
See below!
Sorry for that, I was just guessing here and made a comment. Sometimes another eye on code helps. The wire protocoll is not documented, so based on the "magic value" 128 and the fact that the broadcast address is 0x7F, my assumption was that the maximum device ID is 126; 127 is the broadcast. So the 8th bit could be some special marker. The issue with the "minus 0x80" I still disagree, but I let you do whatever you want. If you want to prevent issues when the BSB returns a device ID
I tried to explain that. If you call the pubsub
The problem is simply said: If you call the This is why I said "revert, revert" (this is a common phrase used in the ASF when referring to me....)
I don't think you need to do anything at moment. The issue seems to be in the PubSubClient and a logic error with regards to when to send or wait for PING requests. So the code disconnects on its own once it gets into a bad state. |
As you reverted the call to pubsub's Based on the comments in the pubsubclient issue tracker (see issues above), many people already call the |
Sorry, but you knew the code in question was reverted and then you made the unspecific statement that a) loop() shouldn't be called inside the BSB-LAN code (which would be all the code?) and b) you say that you have seen (generally?) "two polls behind each other will give partial results". Both of these statements are at least misleading. I take bug reports very seriously and spend time to understand and follow-up on them, so I ask everyone to be as clear and specific about it as possible. As for the race condition, I didn't add it because of the loop() issue but because there may be a race condition regarding the destination device ID. I don't know whether the callbacks are execuded in parallel or in serial. If they are executed in serial, then they the added code wouldn't be necessary. If they are added in parallel, the destinaton device ID couly be changed (temporarily) from one call and while this is executed and the second call comes in, the second call would take the temporarily changed destination device ID as the general destination device ID and then later revert back to that one upon completion. That's why we need to make sure that there is serial execution. |
It seems that the callback is only called as part of the |
Sorry for that, you were faster with reverting than I was with writing. I have seen the problem with the code and wrote why the call to Later I wanted to describe exactly why the first approach with the call to the loop wasn't working. Because I took it seriously and tested your code!
Thats fine! Appreciate that you try to find the problem. When you look at the pubsubclient code, the callback is directly called in the |
In the current code all is fine, it only breaks when you call |
That statement is still wrong, it's still called within the BSB code, just no longer recursively as part of the query() function. But let's leave it at that. |
Sorry for being imprecise: I meant inside the "BSB protocol handling code" or specifically in the query function. P.S.: I know what the issue is of being the only maintainer of a project. I do https://github.com/policeman-tools/forbidden-apis which is used in many many open source projects for code quality checks. Sometimes you can't handle stupid issues/pull requests. I went over to just ignore them and keep my release schedule. This project is also more or less a "finished" tool with no new features, it just needs to be updated once every half year.... |
Hi,
I did not yet install the new version of BSB-LAN but reviewed the changes since the Git version of end October that's still installed locally.
The new Topic structure with the endpoints
/status
,/set
,/inf
,.. looks fine and makes it much easier to set values from Home Assistant (HASS), as you no longer need templates to form the message to set a value, it should work automagically./poll
is also great because I can setup a cronjob in Homeassistant to poll specific values with different intervals (normaly things like heater status and temperature are fetched 40 seconds for me, but stuff like actual heating curve due to adaption do not need to be fetched that often). So I can change this to send/poll
MQTT messages with different intervals. Thats wonderful!In addition communicating with the BSB-LAN only through the MQTT serializes the messages "in order". Sending messages in parallel through HTTP sometimes leads to crushes (TCP reset) as the BSB-LAN is single-threaded. When everything from HASS goes through MQTT all arrives at BSB-LAN in order in a single "thread".
But one thing annoys me: The new topic structure contains the category id, which is somehow redundant. The device ID is easy to handle with search/replace in the config of HASS. But category id is totally useless! It may be usefull for autodiscovery, but if you don't use auto-discovery and configure your entities by hand (I have about 20 including a climate control entity and water heater entity), it gets annoying to lookup all category ids which are different for different devices. This also makes it hard to post reuseable configurations, because the categories are different in each system, while the parameter numbers are reuseable and the same for most heaters (for common parameters like temperature, FA status,....).
Would it be possible to either revert addition of category id to MQTT messages or provide a compile time config which removes them from the string formatting when creating MQTT topic paths?
Once there is an easy option to remove them, I will upgrade and report back how the new endpoints work (
/poll, /set, /status, /inf
).There's an additional bug which was observed during testing the master branch with discovery of BSB devices. See below for details!
The text was updated successfully, but these errors were encountered: