Thursday 1 December 2011

Importance of monitoring what you have outsourced

Your outsourced system is never as important to your supplier as it is to you. Most contracts specify check intervals measured in minutes. By the time the agreed number of failed checks has triggered an alarm (a threshold that exists to avoid false positives) and an operator has been alerted, 15 minutes can easily have passed, and it can be 30 minutes or more before anybody actually takes the problem in hand. And since you squeezed the price of the service down to the absolute minimum, the agreed penalty seldom bears any relation to what the outage costs your organisation.
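To put rough numbers on that delay: if the outage starts just after a successful check, the alarm can only fire on the Nth consecutive failed check, N full intervals later. The sketch below assumes 5-minute checks and three failures before alarming; those figures and the function name are illustrative, not taken from any real contract or tool.

```python
def worst_case_alert_minutes(check_interval_min: float,
                             failures_required: int,
                             notify_delay_min: float = 0.0) -> float:
    """Worst-case minutes from an outage starting to an operator being alerted.

    Assumes the outage begins just after a successful check, so the
    Nth consecutive failed check lands N full check intervals later,
    and then adds whatever time it takes to notify a human.
    """
    detection = check_interval_min * failures_required
    return detection + notify_delay_min


# 5-minute checks, 3 failures required before the alarm fires.
print(worst_case_alert_minutes(5, 3))
```

Adding, say, a 15-minute notification and pickup delay, `worst_case_alert_minutes(5, 3, notify_delay_min=15)` gives 30, in line with the half hour mentioned above.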
Another reason to do your own monitoring is that it gives you the unfiltered truth. Do you trust your supplier to always tell you what’s going on? Are their answers at times vague or slow in coming?

The easiest way to watch traffic is through network monitoring. A simple network graph from a tool like Utilwatch gives you second-by-second information and can run on the oldest, cheapest PC you have. If it runs in the background but within your field of vision, you will immediately notice when something is amiss, and with experience you will interpret the data better. With a few simple scripts you can also build easy traffic lights.
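A script-based traffic light can be as small as one HTTP check mapped to a colour. Here is a minimal sketch using only the Python standard library; the thresholds, the function names and the idea of polling a health URL are assumptions for illustration, not features of Utilwatch or any other tool.

```python
import time
import urllib.error
import urllib.request

# Hypothetical thresholds in milliseconds; tune them to what "normal" looks like for you.
AMBER_MS = 500
RED_MS = 2000


def classify(ok: bool, latency_ms: float) -> str:
    """Map one check result to a traffic-light colour."""
    if not ok or latency_ms >= RED_MS:
        return "RED"
    if latency_ms >= AMBER_MS:
        return "AMBER"
    return "GREEN"


def check(url: str, timeout_s: float = 5.0) -> tuple[bool, float]:
    """Fetch url once and return (succeeded, elapsed milliseconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            # urlopen raises HTTPError for most 4xx/5xx codes; this is a safety net.
            ok = 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, HTTP errors and timeouts.
        ok = False
    elapsed_ms = (time.monotonic() - start) * 1000
    return ok, elapsed_ms


def traffic_light(url: str) -> str:
    """Run one check against url and return GREEN, AMBER or RED."""
    ok, latency_ms = check(url)
    return classify(ok, latency_ms)
```

You might run something like `print(traffic_light("http://intranet.example/health"))` every few seconds from a scheduler and show the result next to your graph; the URL there is a placeholder.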
Cheap second-by-second tools, however, seldom store the data. They are WYSIWYG: they show the current state on screen and nothing more. You seldom need to store that much data anyway, because interpreting it depends on other factors at the time. Did you start or stop something? Were your web caches reloading? Was the blip due to scheduled maintenance? A simple screenshot captures the moment for later inclusion in a manual log, together with your comments.

There are also many tools that let you set up triggers and alarms to your own liking. I would pick at least one that isn’t from the supplier of whatever you are trying to monitor: if the supplier knew how to monitor it and trigger the alarm, they would (or should) have fixed the problem in the first place.
Some, like ipMonitor, are also cross-platform and store the history of previous alarms if configured correctly. If your need is less urgent, or your problems occur outside normal hours, tools like Cacti will give you a view of what happened last night, last week or last month.
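Whatever tool you choose, the heart of most trigger rules is the same false-positive filter described at the top of this post: only raise an alarm after several failed checks in a row. A minimal sketch of that rule, assuming a hypothetical `ConsecutiveFailureTrigger` class that is fed one check result at a time (it is not the configuration model of ipMonitor, Cacti or any other product):

```python
class ConsecutiveFailureTrigger:
    """Raise an alarm only after `threshold` failed checks in a row."""

    def __init__(self, threshold: int = 3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.failures = 0          # current run of consecutive failures
        self.alarm_active = False  # True while an alarm is outstanding

    def record(self, ok: bool) -> bool:
        """Record one check result; return True only when a new alarm should fire."""
        if ok:
            # Any success resets the count and re-arms the trigger.
            self.failures = 0
            self.alarm_active = False
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.alarm_active:
            self.alarm_active = True
            return True
        return False
```

It fires once per outage and re-arms on the next successful check, so a long outage does not flood you with repeated alarms. With one check per second, a threshold of 3 still alerts you within seconds rather than the 15 minutes of a typical contract.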

If you don’t feel like spending the time or effort on monitoring yourself but still see the value of an outside eye on your hosting, network or resource provider, there are many third-party suppliers who will happily let you try their monitoring services before you buy. But then you are back to a response time of 15 to 30 minutes instead of seconds.

