Error Handling Fallbacks In IoT Software

Question

When designing software for remote and IoT devices, one has to consider how the system handles failures, whether in software or in hardware.

If the system detects a SW bug, it may notify the cloud and revert to the boot loader. If it detects a HW peripheral issue, it may stop using that peripheral and notify the cloud. If it runs into a fault where it must question its own sanity, say when the NVM is unreliable, it may require a complete shutdown.
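To make the three cases above concrete, here is a minimal sketch of a fault-to-action policy table. It is only an illustration: the names `Fault`, `Action`, `POLICY` and `plan_recovery` are hypothetical, and the choice to shut down on an unrecognized fault is my own assumption, not an established rule.

```python
from enum import Enum, auto


class Fault(Enum):
    SOFTWARE_BUG = auto()
    PERIPHERAL_FAILURE = auto()
    NVM_UNRELIABLE = auto()


class Action(Enum):
    NOTIFY_CLOUD = auto()
    REVERT_TO_BOOTLOADER = auto()
    DISABLE_PERIPHERAL = auto()
    SHUTDOWN = auto()


# Policy table: each fault class maps to an ordered list of recovery actions.
# The mapping mirrors the three examples in the question.
POLICY = {
    Fault.SOFTWARE_BUG: [Action.NOTIFY_CLOUD, Action.REVERT_TO_BOOTLOADER],
    Fault.PERIPHERAL_FAILURE: [Action.DISABLE_PERIPHERAL, Action.NOTIFY_CLOUD],
    Fault.NVM_UNRELIABLE: [Action.SHUTDOWN],
}


def plan_recovery(fault):
    """Return the ordered actions for a fault.

    Anything not in the table falls back to SHUTDOWN (an assumed
    fail-safe default for this sketch).
    """
    return POLICY.get(fault, [Action.SHUTDOWN])
```

Keeping the policy in one table, separate from detection code, is what I mean by building the rest of the SW on top of it: every module reports a fault class, and only one place decides what happens next.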

This is a very big and important issue, on which the rest of the SW should be built.

I believe this issue is common enough that guidelines, tutorials and literature have been written about it, so we don't have to reinvent a solution in each of our individual projects.

I would like to know if there is recommended literature, tutorials or guidelines for designing remote device software for robustness, especially regarding image updates.

Edit: The focus here is not on error detection, but on how to design a sandbox, in which errors and faults can be treated safely in an IoT device environment.

score 3 · Accepted Answer · answered Apr 12 '19 at 10:30

I had a question about FOTA (firmware over-the-air updates) which got no reply, so I researched it and posted my own answer, so that you don't have to reinvent the wheel.

You can either use RAUC, which looks to be so good that it might be overkill, or you could work your way through the most recommended FOTA projects on GitHub.

Even if you don't choose one of them, there is enough FOSS there that you can read the documentation & code to get a feel for how others do it, and establish your own guidelines.
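To give a feel for what such update code has to guard against, here is a minimal sketch of a generic A/B slot scheme with a boot-attempt counter. It is not the API of any of the projects mentioned above; `BootState`, `install_update`, `select_slot`, `confirm_boot` and the limit of 3 attempts are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BOOT_ATTEMPTS = 3  # assumed limit, tune per product


@dataclass
class BootState:
    """State the boot loader would keep in NVM (modelled in RAM here)."""
    active: str = "A"              # slot known to work
    pending: Optional[str] = None  # freshly written slot, not yet confirmed
    attempts: int = 0              # boots tried on the pending slot


def install_update(state):
    """Mark the inactive slot as pending after writing a new image to it."""
    state.pending = "B" if state.active == "A" else "A"
    state.attempts = 0


def select_slot(state):
    """Called on every boot to decide which slot to start."""
    if state.pending is None:
        return state.active
    if state.attempts >= MAX_BOOT_ATTEMPTS:
        # The new image never confirmed itself: roll back to the old one.
        state.pending = None
        state.attempts = 0
        return state.active
    state.attempts += 1
    return state.pending


def confirm_boot(state):
    """Called by the application once it considers itself healthy."""
    if state.pending is not None:
        state.active = state.pending
        state.pending = None
        state.attempts = 0
```

The point of the sketch is that the old image is never overwritten until the new one has proven itself, so a bad update costs a few reboots rather than a bricked device.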

Please, if you find anything better, post it here to help others. In fact, whatever you choose, please post it here. Thanks.
