Defining the Right Config
The first step you can take to prevent a flood of pages is to define all your routers, switches, and other network equipment in your Nagios config. Once those are defined, you simply need to set the parents directive on each host object that sits behind them.
For example:
# Primary Switch in VRRP Group
define host {
    use         switch
    address     10.0.0.2
    host_name   switch-1
    hostgroups  switches
}

# Secondary Switch in VRRP Group
define host {
    use         switch
    address     10.0.0.3
    host_name   switch-2
    hostgroups  switches
}

# Web server reachable through either switch
define host {
    use         server
    address     10.0.0.100
    host_name   apache-server-1
    hostgroups  servers, www
    parents     switch-1, switch-2
}
This configures the host apache-server-1 so that if both switch-1 and switch-2 fail, alerts from apache-server-1 are silenced. The alerts remain off until either switch-1 or switch-2 becomes available again.
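One detail is worth sketching here. When every parent of a host is down, Nagios classifies that host as UNREACHABLE rather than DOWN, and whether an UNREACHABLE host still notifies depends on its notification_options. The variant below is only an illustration; it assumes the server template does not already set this directive, and it leaves out the u flag so that only DOWN and recovery states notify.

```cfg
define host {
    use                   server
    address               10.0.0.100
    host_name             apache-server-1
    hostgroups            servers, www
    parents               switch-1, switch-2
    notification_options  d,r   ; d = DOWN, r = recovery; u (UNREACHABLE) deliberately omitted
}
```

If the template already sets notification_options, the same change could be made there instead so that it applies to every server.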
A Few Things to Keep in Mind
Nagios is pretty smart and can handle multiple parents, so alerts will only be silenced if all of a host's parents become unavailable.
The availability of parent hosts is determined by the host health check, most commonly ping. If you need some other test of availability, make sure to define it with check_command in the host object.
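As a hedged example of a non-ping host check, the fragment below tests a TCP port instead. The command name check_tcp_port and the choice of port 22 are hypothetical. The sketch assumes the check_tcp plugin from the standard Nagios Plugins package is installed and that $USER1$ points at the plugin directory, as it does in the stock resource.cfg.

```cfg
# Hypothetical command wrapping the check_tcp plugin
define command {
    command_name  check_tcp_port
    command_line  $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$
}

# Variant of switch-1 whose availability is judged by a TCP connect rather than ping
define host {
    use            switch
    address        10.0.0.2
    host_name      switch-1
    hostgroups     switches
    check_command  check_tcp_port!22   ; assumes the switch accepts connections on port 22
}
```

This can be useful for devices that drop ICMP, since a parent that never answers ping would otherwise always look down.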
Parent all the objects you can, or at least all that make sense to parent. For example, a router or transport failure at a remote data center should only send a single alert. This means you should define your routers, switches, and possibly your providers' gateways. Do whatever you think makes sense, and take it as far as you can. Remember that your goal is to make the number of alerts manageable, so the better you define the topology, the less likely you are to get a useless page, or several hundred useless pages.
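A rough sketch of the remote data center case might look like the chain below, where each device is parented to the one upstream of it. The host names, addresses, and the router template are hypothetical and only show the shape of the topology. The switch and server templates are assumed to be the same ones used earlier.

```cfg
# Hypothetical edge router at the remote site. Parents are omitted here;
# in practice it would point at whatever sits between it and the Nagios server.
define host {
    use         router          ; assumed template, not defined in this article
    address     10.1.0.1
    host_name   dc2-router-1
}

# Switch behind the router
define host {
    use         switch
    address     10.1.0.2
    host_name   dc2-switch-1
    hostgroups  switches
    parents     dc2-router-1
}

# Server behind the switch
define host {
    use         server
    address     10.1.0.100
    host_name   dc2-web-1
    hostgroups  servers, www
    parents     dc2-switch-1
}
```

With this layout, if dc2-router-1 fails, Nagios should report the router itself as DOWN and treat dc2-switch-1 and dc2-web-1 as UNREACHABLE, which is what makes it possible to alert once on the router instead of once per device.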