Tuesday, July 21, 2020

How I saved my Raspberry Pi IoT edge host from going into Denial-of-Service (DoS) mode

Setup of IoT system

I was using a Raspberry Pi #B (a $35 ARM board, RPi from here on) as the edge router/server for my home automation. I have a custom Go server that connects to my IoT sensors and writes their updates to a Postgres DB (previously I used the TICK stack, but I had InfluxDB corruption issues, so I switched to Postgres). All of these apps run in Docker containers on the RPi. I still use Grafana as the front-end GUI (after adding Postgres as a data source in Grafana).

Disk configuration to separate the OS from applications

To safeguard the RPi from going into Denial-of-Service mode due to root-disk starvation, I use an extra USB stick for my custom apps, the Postgres DB, Grafana and the other Docker containers. This way, if any application (Grafana, Postgres, InfluxDB or my custom apps) fills up its disk, I can still log in to the RPi, because the RPi's SD card (which holds the OS) still has free space.

Telegram-Bot integration

Recently I added Telegram-Bot integration to my RPi, so that I can communicate with the RPi from outside my home WiFi network. The integration went well: I can issue commands from the Telegram app and handle them on the RPi (and eventually reach my sensors).


Telegram-Bot transient error causing DoS situation

Everything went fine until I switched off my home WiFi network. The moment the WiFi went down, the third-party Telegram-Bot framework started logging errors continuously. Because my custom app's stdout/stderr are routed to the USB disk (for debugging and post-mortem analysis), and that disk is shared with the Docker containers and my other apps, the disk started filling up. Over 12 hours of night time, 8GB of data was written. Had this continued, within another 12 hours all of my Docker containers (Postgres, Grafana and my custom app) would have ended up in a DoS situation.

Solution

To mitigate this DoS situation, I could route stdout/stderr to /dev/null, but then I would lose information about other critical errors from third-party code, as well as any unhandled Go stack traces and panic output.

To handle all of the above, I wrote a small Go application named safeout. It consumes the stdout/stderr of any number of processes and redirects each stream into a file on disk, enforcing a maximum file size and keeping one backup file.

I configured my custom application to use safeout. Now, when myApp (or any other app in a Docker container) redirects its stdout/stderr to disk, it can no longer fill up the disk, because safeout ensures the maximum size limit is honored while keeping one backup copy.

The safeout code is available at Safeout.