system pulse

Build your own Linux / Ubuntu System and Network Health Monitor Application – Part I

Posted on Updated on

At home I have various devices doing various things.. and it’s important to me that I know they are working. There are many tools out there designed to keep an eye on server health, but they don’t do everything that I want in the way that I want, and I’m a big believer that coding = creating, so there’s nothing wrong with reinventing the wheel just for the sake of it.

For me it started recently when I repurposed an old android phone as an IP webcam pointing out the front window. After that it was a natural progression to install the excelent linux app Motion to pick up movement from that feed, and store images and mpegs to disk – for security. Well no point in storing the images on the nice machine that is likely to be stolen in event of a break in.. so let’s store them on an old EEE-Box that’s running headless in a hidden corner of the basement 😉

Well there you have it. Definite needs for a frequent health check:

– is the camera working?
– is the EEE nfs mount mounted?

I wrote a ruby script and cron to solve that problem, but then it just grew. Now I have the system running on multiple machines:

– checking each other
– making sure all my websites are responding to ping
– validating that certain URLs are giving a 200 response
– making sure that disks aren’t filling up
– checking that certain processes / background applications are running

On top of that:

– my laptop knows if I’m at home or away and tests accordingly
– the system creates Desktop icons for each problem (and removes them when problems resolve)
– it can generate Ubuntu desktop notifications
– it can notify me of issues with a text message
– each machine manages hostname|issue style files on Dropbox so whereever I am my laptop can tell me what’s  going on at home.
– it shuts up at night
– for every problem it finds, it checks to see if there are instructions to try to correct it

All this is achieved with:

– 200 lines of core ruby methods that perform all the tests
– 50 lines of code relating to specific issue resolution
– 100 lines of control code –  one liners which are basically “If you are this machine, then do this test”

In Part II we’ll take the very basics of that code, and create a simple single ruby script to keep an eye on a machines own health.

Advertisements