Every place I’ve ever worked had cronjobs running all over the place. Some are simple tasks like clearing out a temp directory. Others end up being a critical piece of infrastructure that a developer wrote without telling anyone about it. I like to call this type of scheduled job the glue, as it’s usually holding your company together.
True story: I once found a cronjob named brett.sh running on a cluster of 200 servers that restarted an app every 30 seconds!
In most cases nobody knows where the “glue” cronjob runs, how often it runs and, most importantly, when it fails. There are a few tools out there that put all of your scheduled jobs in one spot and take action on failure. Those include Opswise (http://www.opswise.com/), which I’ve used in the past with a lot of success, and Amazon’s Simple Workflow Service (http://aws.amazon.com/swf/), which I haven’t used yet.
There is also an open source project sponsored by Yelp called Tron, which does most of this already except for notifying you when a job fails. BTW, there is already a feature request for this (https://github.com/Yelp/Tron/issues/25).
Anyway, as a quick workaround I just add a check for the exit code in my crontab, which alerts me if the job doesn’t exit zero.
1 0 * * * touch /home/dodell/foobar || { mail -s 'touch_file failed' email@example.com < /etc/hostname; exit 1; }
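If you have more than a couple of these, repeating the mail command on every crontab line gets old fast. Here is a rough sketch of the same exit code check pulled out into a small shell function. The names run_and_alert, send_alert and ALERT_TO are ones I made up for illustration, and it assumes a working mail command on the box, so treat it as a starting point and not a finished tool.

```shell
#!/bin/sh
# Hypothetical helper: the function names and ALERT_TO are illustrative
# assumptions, not part of cron or any existing tool.

# Address to notify; can be overridden from the environment.
ALERT_TO="${ALERT_TO:-email@example.com}"

# send_alert JOB_NAME EXIT_CODE
# Mails the hostname and exit code. Kept separate so it can be swapped
# for another notifier. Assumes a working `mail` command.
send_alert() {
    printf 'host: %s\nexit code: %s\n' "$(hostname)" "$2" \
        | mail -s "$1 failed" "$ALERT_TO"
}

# run_and_alert JOB_NAME COMMAND [ARGS...]
# Runs the command, alerts if it exits non-zero, and returns the
# command's own exit code.
run_and_alert() {
    name=$1
    shift
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        send_alert "$name" "$status"
    fi
    return "$status"
}
```

Assuming you saved that somewhere like /usr/local/lib/cron_alert.sh (a made-up path), the crontab entry would look something like this:

1 0 * * * . /usr/local/lib/cron_alert.sh; run_and_alert touch_file touch /home/dodell/foobar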