As a system administrator, one of the first things you need is a monitoring system. It has to notify you almost in real time when something breaks, with information about what is happening. Fortunately for everyone, we have Nagios. Nagios supports many things, but in this post I will only talk about services and service dependencies.

Thanks to Nagios I am able to manage more than 20 VoIP servers without major issues. If something happens, my smartphone keeps sending me alerts until I manually stop it.

If you are new to Nagios, I recommend reading the Nagios Core Documentation. I will assume you are quite familiar with it and that you already have a working deployment.

Service & Host Definitions

The first thing you need is your host definition, followed by your service definitions. I use the following:

define host{
use linux-server
host_name fusionpbx.server.hostname
alias fusionpbx
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description PING
check_command check_ping!300.0,20%!500.0,60%
notifications_enabled 0
check_interval 10
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description SSH
check_command check_ssh
notifications_enabled 0
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description HTTP
check_command check_http
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description SMTP
check_command check_nrpe2!check_smtp
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description SIP
check_command check_sip!sip:[address removed by the site's email obfuscation]
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description MOS - last hour average
check_command check_mysql_query!'select if(count(*) = 0,4,avg(rtp_audio_in_mos)) as avg_mos from v_xml_cdr where rtp_audio_in_mos is not null and answer_stamp >= date_sub(NOW(), interval 1 hour)'!fusionpbx!database_server!database_user!database_password!4:6!3.4999:6
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description MariaDB
check_command check_mysql4!database_server!database_user!database_password
notifications_enabled 1
}

#define service{
# use local-service
# host_name fusionpbx.server.hostname
# service_description Bacula File
# check_command check_bacula2!ip!fd!nagios-mon!shared_secret
# notifications_enabled 1
# check_interval 360
# notification_interval 360
# }

define service{
use local-service
host_name fusionpbx.server.hostname
service_description RBL
check_command check_rbl
notifications_enabled 1
check_interval 120
notification_interval 120
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Current Load
check_command check_nrpe!check_load!15,10,5 30,25,20
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Current Users
check_command check_nrpe!check_users!5 10
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Root partition
check_command check_nrpe!check_disk!20% 10% /
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Swap Usage
check_command check_nrpe!check_swap!20% 10%
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Total Processes
check_command check_nrpe!check_procs!50 60
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Freeswitch daemon
check_command check_nrpe!check_procs_by_name!1:1 1:10 freeswitch
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Crond daemon
check_command check_nrpe!check_procs_by_name!1:1 1:10 crond
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Memcached
check_command check_memcached
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description TCP/5666
check_command check_tcp!5666
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Freeswitch registered extensions
check_command check_nrpe3!check_fs_registered
notifications_enabled 1
}

define service{
use local-service
host_name fusionpbx.server.hostname
service_description Freeswitch event socket
check_command check_nrpe3!check_fs_event_socket
notifications_enabled 1
}
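
Several of the checks above pass threshold ranges such as 4:6 or 1:1 to the plugins. As a rough illustration, here is a minimal Python sketch of how the standard Nagios plugin range syntax is evaluated. The function names are my own, and I am assuming that in the MOS check 4:6 is the warning range and 3.4999:6 the critical range, since the check_mysql_query command definition is not shown here.

```python
import math


def parse_range(spec):
    """Parse a Nagios plugin threshold range into (low, high, alert_inside).

    Supported forms: "10" (0..10), "10:" (10..inf), "~:10" (-inf..10),
    "10:20" (10..20), and a leading "@" to alert inside the range instead.
    """
    alert_inside = spec.startswith("@")
    if alert_inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
        low = -math.inf if low_text == "~" else float(low_text or 0)
        high = math.inf if high_text == "" else float(high_text)
    else:
        low, high = 0.0, float(spec)
    return low, high, alert_inside


def breaches(value, spec):
    """Return True when the value should raise an alert for this range."""
    low, high, alert_inside = parse_range(spec)
    inside = low <= value <= high
    return inside if alert_inside else not inside


def state(value, warning, critical):
    """Critical is evaluated first, then warning."""
    if breaches(value, critical):
        return "CRITICAL"
    if breaches(value, warning):
        return "WARNING"
    return "OK"


# MOS check (assumed order: warning 4:6, critical 3.4999:6)
print(state(4.2, "4:6", "3.4999:6"))  # OK
print(state(3.8, "4:6", "3.4999:6"))  # WARNING
print(state(3.2, "4:6", "3.4999:6"))  # CRITICAL

# Process count for the Freeswitch daemon check (-w 1:1 -c 1:10)
print(state(1, "1:1", "1:10"))  # OK
print(state(0, "1:1", "1:10"))  # CRITICAL
```

Note that the MOS query returns 4 when there were no calls in the last hour, which sits inside both ranges, so an idle server does not raise an alert.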

Service Dependencies

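Service dependencies are important. Without them, a single failure makes Nagios start screaming like crazy, with one alert for every affected service. Well-configured dependencies help you figure out what is actually failing. For example, if the FreeSWITCH daemon is down, you only get that alert instead of also getting alerts for port 5060/udp, the event socket, extension registration and others. Note that the definitions below use the host name of one of my servers (bruno.okay.com.mx); host_name and dependent_host_name must match the host_name used in your own service definitions.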
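
To make the effect concrete, here is a small Python sketch of the notification logic, using a few of the master/dependent pairs from this section. It is a simplification under my own assumptions: it only looks at direct masters, ignores the difference between soft and hard states, and the function name is hypothetical.

```python
# (master, dependent) pairs copied from some of the servicedependency
# definitions in this section.
DEPENDENCIES = [
    ("TCP/5666", "Freeswitch daemon"),
    ("Freeswitch daemon", "SIP"),
    ("Freeswitch daemon", "Freeswitch event socket"),
    ("Freeswitch event socket", "Freeswitch registered extensions"),
    ("MariaDB", "MOS - last hour average"),
]

# Mirrors "notification_failure_criteria w,u,c".
SUPPRESSING_STATES = {"WARNING", "UNKNOWN", "CRITICAL"}


def notifying_services(states, dependencies=DEPENDENCIES):
    """Return the non-OK services that would still send notifications."""
    notifying = []
    for service, current in states.items():
        if current == "OK":
            continue
        masters = [m for m, d in dependencies if d == service]
        # A dependent stays quiet while any direct master is in a failed state.
        if any(states.get(m, "OK") in SUPPRESSING_STATES for m in masters):
            continue
        notifying.append(service)
    return notifying


# The FreeSWITCH daemon is down, so everything that relies on it fails too.
states = {
    "TCP/5666": "OK",
    "Freeswitch daemon": "CRITICAL",
    "SIP": "CRITICAL",
    "Freeswitch event socket": "CRITICAL",
    "Freeswitch registered extensions": "CRITICAL",
    "MariaDB": "OK",
    "MOS - last hour average": "OK",
}
print(notifying_services(states))  # ['Freeswitch daemon']
```

Four services are critical, but only the daemon itself notifies. Because execution_failure_criteria is set to n, the dependent checks keep running, so their status is still visible in the web interface. The servicedependency definitions for this deployment follow.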

define servicedependency{
host_name bruno.okay.com.mx
service_description MariaDB
dependent_host_name bruno.okay.com.mx
dependent_service_description MOS - last hour average
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description Freeswitch daemon
dependent_host_name bruno.okay.com.mx
dependent_service_description SIP
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description TCP/5666
dependent_host_name bruno.okay.com.mx
dependent_service_description Current Load
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description TCP/5666
dependent_host_name bruno.okay.com.mx
dependent_service_description SMTP
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description TCP/5666
dependent_host_name bruno.okay.com.mx
dependent_service_description Current Users
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description TCP/5666
dependent_host_name bruno.okay.com.mx
dependent_service_description Root partition
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description TCP/5666
dependent_host_name bruno.okay.com.mx
dependent_service_description Swap Usage
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description TCP/5666
dependent_host_name bruno.okay.com.mx
dependent_service_description Total Processes
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description TCP/5666
dependent_host_name bruno.okay.com.mx
dependent_service_description Freeswitch daemon
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description TCP/5666
dependent_host_name bruno.okay.com.mx
dependent_service_description Crond daemon
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description Freeswitch daemon
dependent_host_name bruno.okay.com.mx
dependent_service_description Freeswitch event socket
execution_failure_criteria n
notification_failure_criteria w,u,c
}

define servicedependency{
host_name bruno.okay.com.mx
service_description Freeswitch event socket
dependent_host_name bruno.okay.com.mx
dependent_service_description Freeswitch registered extensions
execution_failure_criteria n
notification_failure_criteria w,u,c
}
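
Every servicedependency has to point at a host and service pair that really exists, and with this many blocks it is easy to leave a stale host name behind. The following is a minimal Python sketch of a consistency check, not a full Nagios parser: it assumes one directive per line, does not resolve templates, host groups or comma-separated lists, and all function names are my own.

```python
import re

# Matches uncommented "define <type>{ ... }" blocks.
BLOCK_RE = re.compile(r"^\s*define\s+(\w+)\s*\{(.*?)^\s*\}", re.S | re.M)


def parse_objects(text):
    """Yield (object_type, directives) for each uncommented define block."""
    for kind, body in BLOCK_RE.findall(text):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        yield kind, directives


def undefined_dependency_targets(text):
    """Return (host, service) pairs used in dependencies but never defined."""
    services = set()
    dependencies = []
    for kind, d in parse_objects(text):
        if kind == "service":
            services.add((d.get("host_name"), d.get("service_description")))
        elif kind == "servicedependency":
            dependencies.append(d)
    missing = []
    for d in dependencies:
        master = (d.get("host_name"), d.get("service_description"))
        dependent = (
            d.get("dependent_host_name", d.get("host_name")),
            d.get("dependent_service_description"),
        )
        for pair in (master, dependent):
            if pair not in services and pair not in missing:
                missing.append(pair)
    return missing


SAMPLE = """
define service{
use local-service
host_name fusionpbx.server.hostname
service_description MariaDB
}
define service{
use local-service
host_name fusionpbx.server.hostname
service_description MOS - last hour average
}
define servicedependency{
host_name bruno.okay.com.mx
service_description MariaDB
dependent_host_name bruno.okay.com.mx
dependent_service_description MOS - last hour average
}
"""

for host, description in undefined_dependency_targets(SAMPLE):
    print(f"no service {description!r} defined on host {host}")
```

The sample deliberately mixes the two host names that appear in this post, so both pairs are reported. The authoritative check is still Nagios itself: run nagios -v against your main configuration file before reloading.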

NRPE

You will need the following NRPE command definitions. I usually put them in /etc/nrpe.d/okay.cfg. If you are not familiar with NRPE, I suggest you read the GitHub NRPE Wiki.

command[check_users]=/usr/lib64/nagios/plugins/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/usr/lib64/nagios/plugins/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_procs]=/usr/lib64/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$
command[check_procs_state]=/usr/lib64/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
command[check_procs_by_name]=/usr/lib64/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -C $ARG3$
command[check_swap]=/usr/lib64/nagios/plugins/check_swap -w $ARG1$ -c $ARG2$
command[restart]=sudo /sbin/service $ARG1$ restart
command[check_smtp]=/usr/lib64/nagios/plugins/check_smtp -H 127.0.0.1
command[check_fs_registered]=/usr/lib64/nagios/plugins/check_fs_registered -e '*' -c 0 -w 1
command[check_fs_event_socket]=/usr/lib64/nagios/plugins/check_tcp -H 127.0.0.1 -p 8021
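
To show how a service check_command maps onto these NRPE commands, here is a small Python sketch of the argument substitution. It assumes the check_nrpe command on the Nagios server forwards everything after the command name with -a, so that the space-separated values arrive as $ARG1$, $ARG2$ and so on; that server-side command definition is not shown in this post, and the expand function is hypothetical.

```python
import shlex

# Three of the NRPE command definitions from the listing above.
NRPE_COMMANDS = {
    "check_load": "/usr/lib64/nagios/plugins/check_load -w $ARG1$ -c $ARG2$",
    "check_disk": "/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$",
    "check_procs_by_name": "/usr/lib64/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -C $ARG3$",
}


def expand(check_command):
    """Expand 'check_nrpe!<name>!<args>' into the remote command line."""
    _, name, raw_args = check_command.split("!", 2)
    command_line = NRPE_COMMANDS[name]
    # Each whitespace-separated value fills the next $ARGn$ macro.
    for position, value in enumerate(shlex.split(raw_args), start=1):
        command_line = command_line.replace(f"$ARG{position}$", value)
    return command_line


print(expand("check_nrpe!check_load!15,10,5 30,25,20"))
print(expand("check_nrpe!check_disk!20% 10% /"))
print(expand("check_nrpe!check_procs_by_name!1:1 1:10 freeswitch"))
```

The NRPE daemon only accepts arguments like these when it was built with command-argument support and dont_blame_nrpe=1 is set in its configuration.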

You are ready to go. Happy VoIP monitoring!
