
I've been using an MQTT broker for my IoT devices for about a year. They are mostly ESP8266 boards (some NodeMCU, some Wemos mini, some Sonoff) running Arduino code. The MQTT broker runs flawlessly on a Raspberry Pi Zero W.

All devices use my own library, which handles WiFi connectivity and MQTT, with functions designed for my IoT devices.

Some devices (2 or 3 out of 20) keep going offline through MQTT's keepAlive mechanism (the PubSubClient library, to be precise) after the defined 30 seconds. As soon as the availability state goes offline, it switches straight back to online.

I've been trying to figure out the reason for a while now, which is why I decided to share:

  1. As I said, the code is mostly generic, and my WiFi + MQTT code is the same for all devices.
  2. If a bad WiFi signal were causing MQTT to lose its connection to the server, the watchdog timer I use for such cases (20 seconds to reset) should have triggered a reset.
  3. I tried changing the MQTT broker for that specific device to make sure the broker has nothing to do with it, and the problem still persists.

My questions are:

  1. With #define MQTT_KEEPALIVE 30 as defined in PubSub.h, is the connection checked only once every 30 seconds, or is there more than one check in that interval?
  2. Is there another way to find out why an IoT device loses its connection with the broker, hits the keepAlive timeout, and comes right back after the last will has been sent?
hardillb
guyd

3 Answers


The keepalive value is sent to the broker as part of the connection request. The broker uses it to start a countdown timer for that client.

Every time the client sends a packet to the broker, the timer is reset. This includes publishes, acknowledgements of higher-QoS messages sent to the client, and subscribe/unsubscribe requests.

If the client has sent nothing else for a full keepalive period (30 s here), it must send a PINGREQ packet to the broker, which replies with a PINGRESP. Like any other packet from the client, the PINGREQ resets the broker's timer.

If the broker receives no packet at all from the client, it waits an extra 0.5 × the keepalive time as a grace period (45 seconds after the last packet in this case) before publishing the client's LWT and marking it as disconnected. The timeout itself is handled entirely by the broker; the client's only part is to send a PINGREQ when it has been otherwise idle.
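Put as code, the broker-side rule amounts to a single deadline that every incoming packet pushes forward, so there is no separate check "once every 30 seconds". The following is a minimal C++ sketch under that reading, using the 1.5 × keepalive limit from the MQTT 3.1.1 specification. brokerDeadlineMs and brokerDropsClient are hypothetical helpers for illustration, not part of PubSubClient or any broker's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the time (ms) at which the broker gives up on a client,
// given when it last received any packet from it. Per MQTT 3.1.1 the limit is
// 1.5 x the keepalive negotiated at CONNECT.
uint32_t brokerDeadlineMs(uint32_t lastPacketMs, uint16_t keepAliveSec) {
    assert(keepAliveSec > 0); // a keepalive of 0 disables the mechanism entirely
    // 1.5 x keepalive in integer milliseconds: 1000 * 3 / 2 = 1500 ms per second
    return lastPacketMs + static_cast<uint32_t>(keepAliveSec) * 1500u;
}

// Hypothetical helper: true once the broker would drop the client and publish its LWT.
// Any packet from the client (PUBLISH, PUBACK, SUBSCRIBE, PINGREQ, ...) moves
// lastPacketMs forward and therefore pushes the deadline back.
bool brokerDropsClient(uint32_t lastPacketMs, uint32_t nowMs, uint16_t keepAliveSec) {
    return nowMs > brokerDeadlineMs(lastPacketMs, keepAliveSec);
}
```

With a 30 s keepalive and a last packet at t = 0, the deadline is t = 45000 ms; a publish at t = 10000 ms would move it to t = 55000 ms.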

So if the client crashes 5 seconds before the PINGREQ is sent, then the watchdog will reset the device just after the LWT is published by the broker and it will reconnect (and show as online) again nearly straight away.
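The timing of that crash scenario can be checked with a little arithmetic. This is a sketch under stated assumptions: the last packet reached the broker at t = 0, the keepalive is 30 s, and the 20 s watchdog from the question starts counting at the moment of the crash. crashTimeline and CrashTimeline are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timeline for the crash scenario; all times in seconds,
// measured from the client's last packet (t = 0).
struct CrashTimeline {
    uint32_t watchdogResetAt; // when the watchdog reboots the device
    uint32_t lwtPublishedAt;  // when the broker gives up and publishes the LWT
};

CrashTimeline crashTimeline(uint32_t keepAliveSec, uint32_t crashAtSec, uint32_t watchdogSec) {
    // Assumption: the crash happens before the next keepalive exchange is due.
    assert(crashAtSec < keepAliveSec);
    // The broker's deadline depends only on the last packet: 1.5 x keepalive.
    // Multiply before dividing so odd keepalive values are not truncated early.
    const uint32_t lwt = keepAliveSec * 3 / 2;
    // Assumption: the watchdog starts counting when the sketch stops feeding it.
    const uint32_t reset = crashAtSec + watchdogSec;
    return {reset, lwt};
}
```

With a crash at t = 25 s (5 seconds before the 30 s mark), crashTimeline(30, 25, 20) puts both the watchdog reset and the LWT at about t = 45 s, so the device reboots and reconnects right as the broker marks it offline.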

hardillb

I had a similar issue, and while I expected the broker to behave as hardillb describes in their answer, I would still get timeouts. To work around it, I ended up sending an MQTT ping from the client at an interval that was slightly less than the timeout period. This solved the problem, but it seems inefficient to me.

I wasn't able to sniff the network traffic at the time, so I'm not sure which side of the PINGREQ/PINGRESP exchange was failing.

John S

The solution is to call client.loop(); inside void loop().

I had the same issue: Mosquitto disconnected the client even though the WiFi was still working. The Mosquitto log showed the disconnect was caused by the keepalive timeout being exceeded, and after adding client.loop() it kept working.

The loop in the Arduino/ESP32 sketch was so busy that the MQTT library never got a chance to handle the keepalive.
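That starvation effect can be modelled without hardware. The sketch below is a simplified simulation, not PubSubClient's source. It assumes that client.loop() sends a PINGREQ only once a full keepalive has passed since the last outbound packet, that the rest of the sketch blocks for a fixed time between loop() calls, and that the broker drops the client after more than 1.5 × keepalive of silence. SimResult and simulate are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Result of one simulated run (hypothetical model, times in milliseconds).
struct SimResult {
    uint32_t maxGapMs;      // longest silence between two outbound packets
    bool brokerDisconnects; // true if that silence exceeds 1.5 x keepalive
};

// keepAliveMs:    keepalive negotiated at CONNECT
// blockingWorkMs: time the sketch spends in other code between loop() calls
// durationMs:     how long to simulate
SimResult simulate(uint32_t keepAliveMs, uint32_t blockingWorkMs, uint32_t durationMs) {
    assert(blockingWorkMs > 0); // otherwise simulated time never advances
    uint32_t lastOutbound = 0;  // CONNECT is sent at t = 0
    uint32_t maxGap = 0;
    // One client.loop() call per pass through the sketch's loop().
    for (uint32_t now = 0; now <= durationMs; now += blockingWorkMs) {
        // Keepalive check only runs here, so the ping can be late by up to blockingWorkMs.
        if (now - lastOutbound >= keepAliveMs) {
            maxGap = std::max(maxGap, now - lastOutbound);
            lastOutbound = now; // PINGREQ sent
        }
    }
    // Broker limit: no packet for more than 1.5 x keepalive (compared as 2*gap > 3*keepalive).
    const bool disconnects = 2 * maxGap > 3 * keepAliveMs;
    return {maxGap, disconnects};
}
```

With a 30 s keepalive, simulate(30000, 100, 300000) pings every 30000 ms and stays connected, while simulate(30000, 25000, 300000) only gets to ping every 50000 ms, which is past the 45000 ms limit. In this model anything that keeps loop() away from client.loop() for more than half the keepalive can trigger the disconnect.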

Hope this helps some people.