We can do a few advanced tricks with Check_MK to diagnose and detect typical OS issues. For example, the first check below looks for RHEL systems that hang at netfs startup, which happens when a _netdev filesystem has dropped into fsck.

These are only examples: the syntax may be wrong, and the regexes will need correcting.

checks += [
   # Columns as used in this sketch:
   # (host tags, service description, check type, (process regex, user, four count levels))
   (["rhel"],     "netFS script hang",   "ps", (".*netfs",            ANY_USER, 0, 0, 1, 1)),
   (["rhel"],     "startup script hang", "ps", ("/bin/sh.*rc.*start", ANY_USER, 0, 0, 1, 1)),
   (["unix"],     "FSCK running",        "ps", (".*fsck",             ANY_USER, 0, 0, 1, 1)),
   (["leetnuks"], "FS Mount hang",       "ps", (".*bin/mount?.*",     ANY_USER, 0, 0, 1, 1)),
   (["leetnuks"], "Reboot hang",         "ps", (".*reboot",           ANY_USER, 0, 0, 1, 1)),
   (["leetnuks"], "Shutdown hang",       "ps", (".*shutdown??-",      ANY_USER, 0, 0, 1, 1)),
   # The user must be a quoted string, and a regex cannot start with "*".
   (["leetnuks"], "Zombies",             "ps", (".*defunct",          "root",   0, 0, 1, 1)),
]
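
Since the regexes are expected to need work, it helps to try them against a few process command lines before deploying. The following is a minimal sketch, not part of Check_MK: it assumes the patterns are matched from the start of the command line (as Python's re.match does), and the sample command lines are made up for illustration.

```python
import re

# (service description, regex) pairs; the regexes are the ones from the list above,
# except that the zombie pattern is spelled ".*defunct" so that it compiles.
PATTERNS = [
    ("netFS script hang",   r".*netfs"),
    ("startup script hang", r"/bin/sh.*rc.*start"),
    ("FSCK running",        r".*fsck"),
    ("FS Mount hang",       r".*bin/mount?.*"),
    ("Reboot hang",         r".*reboot"),
    ("Shutdown hang",       r".*shutdown??-"),
    ("Zombies",             r".*defunct"),
]


def matching_checks(command_line):
    """Return the descriptions whose regex matches from the start of command_line."""
    return [name for name, regex in PATTERNS if re.match(regex, command_line)]


# Hypothetical process command lines, invented for this example.
SAMPLES = [
    "/bin/sh /etc/rc3.d/S25netfs start",
    "fsck.ext3 -a /dev/sda1",
    "/bin/mount -a -t nfs",
    "/sbin/shutdown -r now",
    "[sh] <defunct>",
]

if __name__ == "__main__":
    for line in SAMPLES:
        print("%-36s -> %s" % (line, matching_checks(line)))
```

With these samples, "/sbin/shutdown -r now" is matched by none of the patterns, because the shutdown regex leaves no room for the space before the dash. That is the kind of error this sort of dry run is meant to surface.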

Also tie in a number of retries to allow for the normal delay of filesystem activation;
2 minutes should be safe.
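
How many retries cover 2 minutes depends on how often a failing check is retried. A small sketch of the arithmetic follows. It assumes the first attempt happens at time zero and each retry follows one retry interval later. The function name is made up for this example. In Nagios terms the result would go into max_check_attempts, but how you set that depends on your installation.

```python
import math


def attempts_needed(grace_seconds, retry_interval_seconds):
    """Number of check attempts so that the last one falls at or after the grace period.

    Assumes attempt 1 runs at t=0 and each retry runs one interval after the previous.
    """
    if retry_interval_seconds <= 0:
        raise ValueError("retry interval must be positive")
    retries = int(math.ceil(grace_seconds / float(retry_interval_seconds)))
    return retries + 1  # the retries plus the initial attempt


if __name__ == "__main__":
    # 2 minutes of grace with a 60-second retry interval
    print(attempts_needed(120, 60))
```

With a 60-second retry interval this gives 3 attempts (at 0, 60 and 120 seconds), so the alert only fires if the process is still there after the 2 minutes.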

Add a local check that verifies that /var/lock/subsys/local has been created, and another that checks your runlevel.
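
A rough sketch of what such a local check could look like is given here. It prints one line per service in the local check format (status code, service name, performance data or "-", free text). The service names, the expected runlevel "3" and the use of the runlevel command are assumptions for illustration; adjust them to your systems.

```python
import os
import subprocess


def check_rc_local(path="/var/lock/subsys/local"):
    """OK if the lock file exists, CRIT if it has not been created."""
    if os.path.exists(path):
        return "0 rc_local_done - %s exists" % path
    return "2 rc_local_done - %s is missing, startup may be hanging" % path


def check_runlevel(runlevel_output, expected="3"):
    """Evaluate the output of `runlevel`, e.g. "N 3" (previous and current level)."""
    fields = runlevel_output.split()
    if not fields:
        return "3 runlevel - could not determine the current runlevel"
    current = fields[-1]
    if current == expected:
        return "0 runlevel - current runlevel %s" % current
    return "2 runlevel - current runlevel %s, expected %s" % (current, expected)


if __name__ == "__main__":
    print(check_rc_local())
    try:
        output = subprocess.check_output(["runlevel"]).decode()
    except (OSError, subprocess.CalledProcessError):
        output = ""  # reported as UNKNOWN below
    print(check_runlevel(output))
```

The script would be made executable and dropped into the agent's local check directory on each monitored host.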

missing