1. Create nagios user and group
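Step 1 can be sketched roughly as below. The home directory and login shell are my assumptions, not something the downloaded package requires, and the helper only prints the commands unless APPLY=1 is set.

```shell
# Dry-run wrapper: prints each command unless APPLY=1 is set (then runs it as-is).
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Assumed layout: home under /usr/local/nagios, no interactive login shell.
create_nagios_account() {
    run groupadd nagios
    run useradd -g nagios -d /usr/local/nagios -s /bin/false nagios
}

# Usage (as root): APPLY=1; create_nagios_account
```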
2. Download the Nagios plugins and the nrpe daemon from http://www.guntram.de/nagios/ for your Solaris version
3. Unzip and untar the files under /usr/local/nagios
4. Create an XML manifest to import nrpe as an SMF service, or take one from here:
http://unixjournal.org/tag/nrpe/
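If you would rather write the manifest yourself, a minimal sketch could look like this. It is not the manifest from the link above: the network milestone dependency, the 60 second timeouts and the function name write_nrpe_manifest are my assumptions. The service name and method script path are chosen to match the steps that follow.

```shell
# Writes a minimal SMF manifest for nrpe to the path given as $1.
# Assumptions: depends only on the network milestone, 60s method timeouts.
write_nrpe_manifest() {
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='nrpe'>
  <service name='network/nrpe' type='service' version='1'>
    <!-- Imported disabled; it is enabled by hand later -->
    <create_default_instance enabled='false'/>
    <single_instance/>
    <dependency name='network' grouping='require_all' restart_on='none' type='service'>
      <service_fmri value='svc:/milestone/network:default'/>
    </dependency>
    <exec_method type='method' name='start' exec='/lib/svc/method/nrpe start' timeout_seconds='60'/>
    <exec_method type='method' name='stop' exec='/lib/svc/method/nrpe stop' timeout_seconds='60'/>
    <!-- SMF "refresh" is mapped to the script's restart branch (HUP) -->
    <exec_method type='method' name='refresh' exec='/lib/svc/method/nrpe restart' timeout_seconds='60'/>
    <stability value='Unstable'/>
    <template>
      <common_name>
        <loctext xml:lang='C'>Nagios NRPE daemon</loctext>
      </common_name>
    </template>
  </service>
</service_bundle>
EOF
}

# Usage (as root): write_nrpe_manifest /var/svc/manifest/network/nrpe.xml
```

Running svccfg validate on the file before importing it should catch XML mistakes early.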
6. Import the XML file into SMF
# svccfg import /var/svc/manifest/network/nrpe.xml
7. Keep the service disabled for the time being
# svcs -a|grep nrp
disabled 17:09:11 svc:/network/nrpe:default
7. Create an nrpe file under /lib/svc/method
# more /lib/svc/method/nrpe
#!/sbin/sh
#
LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
PIDFILE=/var/run/nrpe.pid
NRPE_BIN=/usr/local/nagios/bin/nrpe
CONFIG_FILE=/usr/local/nagios/etc/nrpe.cfg
case $1 in
# SMF arguments (start and restart [really "refresh"])
'start')
$NRPE_BIN -d -c $CONFIG_FILE
;;
'restart')
if [ -f "$PIDFILE" ]; then
/usr/bin/kill -HUP `/usr/bin/cat $PIDFILE`
fi
;;
'stop')
if [ -f "$PIDFILE" ]; then
/usr/bin/kill `/usr/bin/cat $PIDFILE`
fi
;;
*)
echo "Usage: $0 { start | stop | restart }"
exit 1
;;
esac
exit $?
#
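The stop and restart branches above trust whatever PID is in the pid file. A slightly more defensive variant, only a sketch, checks that the process still exists before signalling it. It assumes a POSIX shell (ksh93 or bash) rather than the legacy Bourne shell, and signal_from_pidfile is a name I made up.

```shell
# signal_from_pidfile PIDFILE [SIGNAL]
# Sends SIGNAL (default TERM) to the PID stored in PIDFILE.
# Returns 0 if there is no pid file or the signal was sent, 1 if the file is stale.
signal_from_pidfile() (
    pidfile=$1
    sig=${2:-TERM}
    if [ ! -f "$pidfile" ]; then
        exit 0                      # nothing running, nothing to do
    fi
    pid=$(cat "$pidfile")
    # kill -0 delivers no signal; it only tests that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        kill -s "$sig" "$pid"
    else
        echo "stale pid file: $pidfile" >&2
        exit 1
    fi
)

# Usage inside the method script:
#   'stop')    signal_from_pidfile "$PIDFILE" TERM ;;
#   'restart') signal_from_pidfile "$PIDFILE" HUP ;;
```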
8. Start the nrpe service
# svcadm enable nrpe
# svcs -a|grep nrp
online 17:17:31 svc:/network/nrpe:default
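If you script the install, something like the following can wait for the service to come online instead of grepping by eye. It is a sketch: wait_for_state is a hypothetical helper, and it takes the state-printing command as arguments so the same loop works with svcs -H -o state nrpe.

```shell
# wait_for_state WANTED TRIES COMMAND [ARGS...]
# Runs COMMAND once per second, up to TRIES times, until it prints WANTED.
wait_for_state() (
    want=$1
    tries=$2
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        state=$("$@" 2>/dev/null)
        if [ "$state" = "$want" ]; then
            exit 0
        fi
        i=$((i + 1))
        sleep 1
    done
    exit 1
)

# Usage on the Solaris client:
#   wait_for_state online 30 svcs -H -o state nrpe || svcs -x nrpe
```

If the service lands in maintenance instead, svcs -x nrpe points at the log file to read.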
9. Now edit /usr/local/nagios/etc/nrpe.cfg to define the checks that will run on this server.
command[check_local_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_local_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_local_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_local_procs]=/usr/local/nagios/libexec/check_procs -w 350 -c 400
command[check_zfs_rpool]=/usr/local/nagios/libexec/check_zfs1 rpool 1
command[check_zfs_datapool]=/usr/local/nagios/libexec/check_zfs1 datapool 1
command[check_fmd]=/usr/local/nagios/libexec/check_fmd
command[check_meta]=/usr/local/nagios/libexec/check_meta
Remember to download the check_fmd, check_meta and check_zfs plugins from
http://exchange.nagios.org/directory/Plugins/Uncategorized/Operating-Systems/Solaris
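A missing or non-executable plugin is easy to overlook after copying files around. The sketch below lists every command[...] entry in an nrpe.cfg and reports whether its binary is executable; check_nrpe_commands is a name I made up, and it only looks at the first word after the equals sign.

```shell
# check_nrpe_commands CFG
# For each line of the form command[name]=/path/to/plugin args...
# print "ok" or "MISSING" followed by the command name and plugin path.
check_nrpe_commands() (
    cfg=$1
    sed -n 's/^command\[\([^]]*\)\]=\([^ ]*\).*/\1 \2/p' "$cfg" |
    while read -r name path; do
        if [ -x "$path" ]; then
            echo "ok      $name $path"
        else
            echo "MISSING $name $path"
        fi
    done
)

# Usage: check_nrpe_commands /usr/local/nagios/etc/nrpe.cfg
```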
10. Now make the configuration change on the Nagios server. Add the following to /usr/local/nagios/etc/objects/solaris.cfg:
# Define a host for the hostname.domain.org
define host{
use solaris-server
host_name hostname.domain.org
alias hostname
address 100.100.100.15
contact_groups Admins
}
###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################
# Define a service to "ping" the remote machine
define service{
use Remote-service ; Name of service template to use
host_name hostname.domain.org
service_description PING
check_command check_ping!100,20%!500,60%
}
# Define a service to check the disk space of the root partition
# on the Remote machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use Remote-service ; Name of service template to use
host_name hostname.domain.org
service_description Root Partition
check_command check_nrpe_1arg!check_local_disk
}
# Define a service to check the number of currently logged in
# users on the Remote machine. Warning if > 5 users, critical
# if > 10 users (thresholds set in nrpe.cfg on the client).
define service{
use Remote-service ; Name of service template to use
host_name hostname.domain.org
service_description Current Users
check_command check_nrpe_1arg!check_local_users
}
# Define a service to check the number of currently running procs
# on the Remote machine. Warning if > 350 processes, critical if
# > 400 processes (thresholds set in nrpe.cfg on the client).
define service{
use Remote-service ; Name of service template to use
host_name hostname.domain.org
service_description Total Processes
check_command check_nrpe_1arg!check_local_procs
}
# Define a service to check the load on the Remote machine.
define service{
use Remote-service ; Name of service template to use
host_name hostname.domain.org
service_description Current Load
check_command check_nrpe_1arg!check_local_load
}
define service{
use Remote-service ; Name of service template to use
host_name hostname.domain.org
service_description Solaris Fault Manager
check_command check_nrpe_1arg!check_fmd
}
define service{
use Remote-service ; Name of service template to use
host_name hostname.domain.org
service_description Meta Device Status
check_command check_nrpe_1arg!check_meta
}
# Define a service to check SSH on the Remote machine.
# Notifications are left enabled here; add notifications_enabled 0 if not every host runs SSH.
define service{
use Remote-service ; Name of service template to use
host_name hostname.domain.org
service_description SSH
check_command check_ssh
}
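The NRPE service blocks above differ only in description and command name, so they can be generated. A rough sketch, where nrpe_service is a hypothetical helper and the host name is the placeholder used throughout this post:

```shell
# nrpe_service HOST DESCRIPTION NRPE_COMMAND
# Prints one service definition that uses the check_nrpe_1arg command.
nrpe_service() {
    printf 'define service{\n'
    printf '    use                 Remote-service\n'
    printf '    host_name           %s\n' "$1"
    printf '    service_description %s\n' "$2"
    printf '    check_command       check_nrpe_1arg!%s\n' "$3"
    printf '}\n'
}

# Example: two of the Solaris-specific checks, one "description|command" per line.
while IFS='|' read -r desc cmd; do
    nrpe_service hostname.domain.org "$desc" "$cmd"
done <<'EOF'
Solaris Fault Manager|check_fmd
Meta Device Status|check_meta
EOF
```

Redirect the output into solaris.cfg, or paste it in by hand after reviewing it.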
11. Note that check_nrpe_1arg is used instead of the default check_nrpe, since I am not passing any arguments. A command named check_nrpe_1arg must be defined in commands.cfg, as below.
# 'check_nrpe_1arg' command definition
define command{
command_name check_nrpe_1arg
command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
12. Restart the nagios service on the Nagios server; within a few minutes the Solaris client will be monitored.
Problems encountered.
1. could not complete ssl handshake
The downloaded nrpe is compiled with SSL support. Make sure that the check_nrpe_1arg command on the server does not include -n, since that option disables SSL.
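The same handshake error is also commonly reported when the Nagios server's address is not listed in allowed_hosts in the client's nrpe.cfg, so that is worth ruling out too. A small sketch of such a check follows; server_allowed is a made-up helper, and it only does exact string matching, so network ranges or host names in allowed_hosts are not understood.

```shell
# server_allowed CFG IP
# Returns 0 if IP appears verbatim in the last allowed_hosts= line of CFG.
server_allowed() (
    cfg=$1
    ip=$2
    hosts=$(sed -n 's/^allowed_hosts=//p' "$cfg" | tail -n 1 | tr -d ' ')
    IFS=,                           # split the list on commas (subshell only)
    for h in $hosts; do
        if [ "$h" = "$ip" ]; then
            exit 0
        fi
    done
    exit 1
)

# Usage on the client, with your Nagios server's address as the second argument:
#   server_allowed /usr/local/nagios/etc/nrpe.cfg 192.0.2.10 || echo "not allowed"
```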
2. no output returned from plugin
Try running the command from the server's command line, for example:
# pwd
/usr/local/nagios/libexec
# ./check_nrpe -H hostname.domain.org -c check_meta
OK - No disk failures detected
If you get the above output, the plugin itself works and the service definition is using check_nrpe instead of check_nrpe_1arg. The default check_nrpe command expects arguments that we do not need.
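When testing by hand it also helps to look at the exit code, since Nagios derives the service state from it rather than from the text. A tiny sketch that maps the standard plugin return codes to their names (nagios_status is my own helper name):

```shell
# nagios_status CODE
# Prints the service state Nagios associates with a plugin exit code.
nagios_status() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID ($1)" ;;   # anything else is outside the plugin convention
    esac
}

# Usage on the Nagios server:
#   ./check_nrpe -H hostname.domain.org -c check_meta; nagios_status $?
```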