Nagios Installation and Configuration

Original article: http://www.ritto.cn/2008/12/129.html

Note: where possible, install the exact software versions used in these notes.

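1. Install Nagios

useradd nagios
mkdir /usr/local/nagios
chown nagios:nagios /usr/local/nagios

tar zxvf nagios-2.9.tar.gz
cd nagios-2.9
# Configure the build. Replace CGIURL and HTMURL with your own URL paths
# (the Apache configuration in step 3 uses /nagios/cgi-bin and /nagios).
./configure --prefix=/usr/local/nagios --with-cgiurl=CGIURL --with-htmurl=HTMURL --with-nagios-user=nagios --with-nagios-group=nagios
If configure finishes without errors, continue with the steps below.
make all                     # compile Nagios
make install                 # install the main program, the CGIs and the HTML files
make install-init            # install the init script in /etc/rc.d/init.d
make install-commandmode     # set permissions on the directory that holds the external command file
make install-config          # copy the sample configuration files into the Nagios install directory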

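Verify that the program was installed correctly:
check that the five directories etc, bin, sbin, share and var exist under /usr/local/nagios.
bin    Nagios executables; the nagios file is the main program
etc    Nagios configuration files
sbin   Nagios CGI files, i.e. the files needed to run external commands
share  Nagios web page files
var    Nagios log files, the pid (lock) file and similar runtime files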
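
The directory check above can be scripted. The following is only a sketch: check_nagios_layout is a hypothetical helper name, not part of Nagios, and it assumes the five directories listed above are all you want to verify.

```shell
# Hypothetical helper: report which of the expected Nagios directories
# are missing under the given install prefix.
# Returns 0 when all five exist, 1 otherwise.
check_nagios_layout() {
    prefix=$1
    missing=0
    for d in bin etc sbin share var; do
        if [ ! -d "$prefix/$d" ]; then
            echo "missing: $prefix/$d"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# check_nagios_layout /usr/local/nagios && echo "layout looks complete"
```

An empty output with exit status 0 means every directory was found.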

2. Install the plugins
tar zxvf nagios-plugins-1.4.9.tar.gz
cd nagios-plugins-1.4.9
./configure --prefix=/usr/local/nagios/
make
make install
ls /usr/local/nagios/libexec/     # lists the installed plugin files

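3. Modify the Apache configuration
Apache is used here only to present the monitoring results in a friendlier web interface; another web server would work as well.
grep ^User /usr/local/apache2/conf/httpd.conf
User daemon
usermod -a -G nagios daemon
Replace daemon with whatever user your Apache runs as. On my machine it is apache:
usermod -a -G nagios apache      # adds the apache user to the nagios group, alongside the nagios user
vi /usr/local/apache2/conf/httpd.conf   # append the following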
#Setting for nagios
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
<Directory "/usr/local/nagios/sbin">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>

Alias /nagios /usr/local/nagios/share
<Directory "/usr/local/nagios/share">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>

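/usr/local/apache2/bin/apachectl -t      # check the configuration syntax
httpd -t                                 # the same check for a distribution-packaged Apache
/usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd ritto   # create the password file and the user ritto
cat /usr/local/nagios/etc/htpasswd       # view the contents of the password file
/usr/local/apache2/bin/apachectl start   # start Apache
service httpd restart                    # or restart a distribution-packaged Apache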

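4. Configure Nagios

Nagios uses its own syntax for configuration files, built around a few basic object types:

contact      who is notified when something goes wrong, usually the system administrator
timeperiod   when monitoring applies: around the clock (24x7), Monday to Friday, or any custom period
host         a server to be monitored, which can be the monitoring machine itself
command      the command Nagios runs to perform a given check; these are user-defined as well
service      what is checked on a host, e.g. whether the host is alive, whether port 80 is open, disk usage, or a custom service

In addition:
several hosts can be grouped into a host group,
several contacts can be grouped into a contact group,
and several services can be grouped into a service group.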
Back up the sample configuration files, then rename them:
cd /usr/local/nagios/etc/
tar zcvf bak.tar.gz *
mv cgi.cfg-sample cgi.cfg
mv commands.cfg-sample commands.cfg
mv localhost.cfg-sample localhost.cfg
mv nagios.cfg-sample nagios.cfg
mv resource.cfg-sample resource.cfg

Edit the Nagios configuration files:
------------------------------------------------------------------------
vi nagios.cfg                                     # edit the main Nagios configuration file

# Comment out this line (around line 37):
#cfg_file=/usr/local/nagios//etc/localhost.cfg
# Uncomment the lines below. In order, they point to the definitions of
# contact groups, contacts, host groups, hosts, services and time periods.
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg

# Was 0. Setting it to 1 lets the web interface issue external commands, such as restarting Nagios.
check_external_commands=1
# Was 15s. Interval at which Nagios checks for external commands.
command_check_interval=10s
------------------------------------------------------------------------
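Configure the CGI scripts.
vi cgi.cfg                         # edit the CGI control file

# Make sure this is 1:
use_authentication=1
# Set to the user created with htpasswd:
default_user_name=ritto
# Further down, set the following. Separate multiple users with commas.
# A name only takes effect for a user who can log in through the htpasswd file,
# so list ritto here as well if that is the only user you created.
authorized_for_system_information=nagiosadmin,test
authorized_for_configuration_information=nagiosadmin,test
authorized_for_system_commands=test
authorized_for_all_services=nagiosadmin,test
authorized_for_all_hosts=nagiosadmin,test
authorized_for_all_service_commands=nagiosadmin,test
authorized_for_all_host_commands=nagiosadmin,test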
-------------------------------------------------------------------------
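All command definitions live in commands.cfg.
vi commands.cfg                    # add the commands that send alert SMS messages (older releases keep notification commands in misccommands.cfg)

# host-notify-by-sms: send an SMS alert for a host
define command {
command_name      host-notify-by-sms
command_line      /usr/local/bin/sms_send "Host $HOSTSTATE$ alert for $HOSTNAME$! on '$DATETIME$' " $CONTACTPAGER$
}

# service-notify-by-sms: send an SMS alert for a service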
define command {
command_name     service-notify-by-sms
command_line     /usr/local/bin/sms_send "'$HOSTADDRESS$' $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTPAGER$
}
-----------------------------------------------------------------------------
vi timeperiods.cfg           # define the monitoring time period, named 24x7: every day, around the clock

define timeperiod{
timeperiod_name         24x7     ; name of the time period; must not contain spaces
alias                   24 Hours A Day, 7 Days A Week
sunday                  00:00-24:00
monday                  00:00-24:00
tuesday                 00:00-24:00
wednesday               00:00-24:00
thursday                00:00-24:00
friday                  00:00-24:00
saturday                00:00-24:00
}
-----------------------------------------------------------------------------
vi contacts.cfg     # define the contacts

define contact {
contact_name         sa            ; must not contain spaces
alias                system administrator
service_notification_period    24x7
host_notification_period       24x7
service_notification_options   w,u,c,r
host_notification_options      d,u,r
service_notification_commands  service-notify-by-sms,notify-by-email    ; commands defined in commands.cfg
host_notification_commands     host-notify-by-email,host-notify-by-sms  ; commands defined in commands.cfg
email                          monitor@wswtek.com
pager                          13297949944
}

define contact {
contact_name         ritto
alias                system administrator
service_notification_period    24x7
host_notification_period       24x7
service_notification_options   w,u,c,r
host_notification_options      d,u,r
service_notification_commands  service-notify-by-sms,notify-by-email
host_notification_commands     host-notify-by-email,host-notify-by-sms
email                          ritto.zhao@wswtek.com
pager                          13297949944
}

# If you do not need SMS alerts, use these two lines instead:
service_notification_commands   notify-by-email
host_notification_commands      host-notify-by-email

The file above defines two contacts. To add more, append further blocks in the same format.
The service notification options (service_notification_options) and host notification options (host_notification_options) take these letters:
services: w = warning, u = unknown, c = critical, r = recovery (back to normal)
hosts:    d = down, u = unreachable, r = recovery
Note that u means different things for hosts and for services.
-----------------------------------------------------------------------------
vi contactgroups.cfg       # group several contacts into a contact group

define contactgroup{
contactgroup_name       sagroup
alias                   System Administrators
members                 ritto,sa
}
-----------------------------------------------------------------------------
vi hosts.cfg              # define the monitored hosts

#define monitor host

############################################
# Wangjing IDC servers                     #
############################################
define host {
host_name                  nagios-server
alias                      nagios server
address                    192.168.4.226
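contact_groups             sagroup              ; separate multiple contact groups with commas; defined in contactgroups.cfg
check_command              check-host-alive     ; defined in commands.cfg; checks whether the host is alive
max_check_attempts         5                    ; number of times a failed check is retried
notification_interval      10                   ; re-notify every 10 time units (10 minutes with the default interval_length of 60 seconds)
notification_period        24x7                 ; when notifications may be sent; the 24x7 period defined in timeperiods.cfg
notification_options       d,u,r                ; which states trigger a notification; same letters as explained under contacts.cfg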
}

define host {
host_name                  mail12.supertalent.com
alias                      nagios test client
address                    192.168.4.41
contact_groups             sagroup
check_command              check-host-alive
max_check_attempts         5
notification_interval      10
notification_period        24x7
notification_options       d,u,r
}

------------------------------------------------------------------------------
vi hostgroups.cfg          # group several hosts into a host group

define hostgroup{
hostgroup_name          sa-servers    ; host group name
alias                   sa Servers    ; alias
members                 nagios-server ; member hosts, comma-separated; each must be defined in hosts.cfg above
}
------------------------------------------------------------------------------
vi services.cfg           # define the monitored services

#service definition
###############################################
# Wangjing IDC servers service for host-live #
###############################################
define service{
host_name               nagios-server        ; host to monitor; must be defined in hosts.cfg
service_description     check-host-alive     ; free-form name for this check (e.g. check ftp); this one checks whether the host is alive
check_command           check-host-alive     ; command to run; must be defined in commands.cfg
check_period            24x7        ; when checks run; defined in timeperiods.cfg
max_check_attempts      5
normal_check_interval   3
retry_check_interval    2
contact_groups          sagroup     ; contact group; defined in contactgroups.cfg
notification_interval   10
notification_period     24x7        ; when notifications may be sent; defined in timeperiods.cfg
notification_options    w,u,c,r     ; notify contacts on w, u, c and r results; see the explanation under contacts.cfg
}
define service {
host_name               mail12.supertalent.com
service_description     check_tcp 80
check_period            24x7
max_check_attempts      4
normal_check_interval   3
retry_check_interval    2
contact_groups          sagroup
notification_interval   10
notification_period     24x7
notification_options    w,u,c,r
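check_command           check_tcp!80      ; check that the service on TCP port 80 is responding
}

When writing these entries, note that check_tcp and the port to monitor must be separated by "!". If there are many services, consider generating the definitions with a script.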
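
One way such a generator script might look is sketched below. gen_tcp_service is a hypothetical function name; the field values are copied from the check_tcp service definition above, and only the host name and port vary.

```shell
# Hypothetical generator: print one check_tcp service definition
# for a given host name and TCP port. All other values mirror the
# hand-written definition in services.cfg.
gen_tcp_service() {
    host=$1
    port=$2
    cat <<EOF
define service {
host_name               $host
service_description     check_tcp $port
check_period            24x7
max_check_attempts      4
normal_check_interval   3
retry_check_interval    2
contact_groups          sagroup
notification_interval   10
notification_period     24x7
notification_options    w,u,c,r
check_command           check_tcp!$port
}
EOF
}

# Example: print definitions for three ports on one host.
# for p in 25 80 110; do gen_tcp_service mail12.supertalent.com "$p"; done
```

The output can be reviewed and then appended to services.cfg; run the nagios -v check afterwards.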

-----------------------------------------------------------------------------

Check all the configuration files. Continue only if this step reports 0 warnings and 0 errors.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Start Nagios in the background as a daemon:
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Start Nagios automatically at boot:
echo "/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg" >> /etc/rc.local

-----------------------------------------------------------------------------

Monitoring more with commands and plugins
/usr/local/nagios/libexec        # default plugin install path

./check_disk -w 10% -c 5% /      # check usage of the root partition: WARNING when less than 10% is free,
                                 # CRITICAL when less than 5% is free
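
The warning/critical pattern is the same for most plugins: the plugin prints a status line and reports the state through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below illustrates that convention only; classify_free is a hypothetical function, not how check_disk itself is implemented, and it takes whole-number percentages.

```shell
# Sketch of a threshold check in the style of a Nagios plugin.
# Usage: classify_free PCT_FREE WARN CRIT
# Prints a status word and returns the conventional plugin exit code:
# 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN (bad input).
classify_free() {
    for v in "$1" "$2" "$3"; do
        case "$v" in
            ''|*[!0-9]*) echo "UNKNOWN"; return 3 ;;
        esac
    done
    free=$1 warn=$2 crit=$3
    if [ "$free" -lt "$crit" ]; then
        echo "CRITICAL"; return 2
    elif [ "$free" -lt "$warn" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}

# Example: 8% free with thresholds of 10% and 5%.
# classify_free 8 10 5
```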

Start by monitoring a single host:
define the host in hosts.cfg, then define what to monitor in services.cfg.

------------------------------------------------------------------------------

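Using NRPE to monitor "local information" on Linux
On hosts running Linux, monitor the following: CPU load, disk capacity, number of logged-in users, total number of processes, number of zombie processes, and swap usage.

On the monitored host: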
useradd nagios
passwd nagios

tar zxvf nagios-plugins-1.4.9.tar.gz
cd nagios-plugins-1.4.9
./configure --prefix=/usr/local/nagios
make
make install
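chown nagios:nagios /usr/local/nagios
chown -R nagios:nagios /usr/local/nagios/libexec/
For what NRPE does, the blog post below explains it in detail:
http://yahoon.blog.51cto.com/13184/41893
tar zxvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure                      # NRPE port: 5666
make all
make install-plugin              # install the check_nrpe plugin
The monitoring server needs the check_nrpe plugin; the monitored host does not. It is installed here only so that NRPE can be tested locally.
make install-daemon              # install the daemon
make install-daemon-config       # install the configuration file

ls /usr/local/nagios/
bin/     etc/     libexec/ share/     # the nagios directory now contains four subdirectories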

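Run the NRPE daemon as a service under xinetd. If xinetd is not installed, install it with yum:
yum -y install xinetd
service xinetd start
chkconfig --level 3 xinetd on

make install-xinetd          # install the xinetd service file
Edit that file and append the monitoring server's address to only_from, separated by a space:
vi /etc/xinetd.d/nrpe
only_from = 127.0.0.1 192.168.4.226

Add the nrpe service by adding this line:
vi /etc/services
nrpe            5666/tcp                        # nrpe

Restart xinetd:
service xinetd restart

netstat -at | grep nrpe     # check that NRPE is listening
netstat -an | grep 5666     # check that port 5666 is being listened on
If neither command prints anything, recheck the entries added above.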

vi /etc/sysconfig/iptables   # add a rule that opens port 5666
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT

On the monitored host, test that NRPE works:
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.8.1           # a working setup prints the NRPE version

Note: the command given after -c must be defined in nrpe.cfg. The NRPE daemon runs only the commands defined in nrpe.cfg.

View the commands NRPE can run:
cd /usr/local/nagios/etc
vi nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

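The name inside [ ] is the command name, i.e. what may follow the -c option of check_nrpe; the part after = is the plugin program that actually gets run.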
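
To see at a glance which names check_nrpe -c will accept, the command names can be pulled out of nrpe.cfg. This is a small sketch; list_nrpe_commands is a hypothetical helper, and it only considers uncommented lines of the form command[name]=... as shown above.

```shell
# Hypothetical helper: print the command names defined in an nrpe.cfg file,
# one per line. Commented-out lines are skipped because they do not start
# with "command[".
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Example:
# list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg
```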

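For example, to monitor the disks on this machine, change the existing check_hda1 line and add check_hda3 so that both point at the real devices. Without a % sign, the -w and -c thresholds are amounts of free space in MB.
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/sda1
command[check_hda3]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/sda3

The commands configured in nrpe.cfg are then used like this: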
[root@ytt etc]# check_nrpe -H127.0.0.1 -c check_hda1
DISK OK - free space: /boot 77 MB (82% inode=99%);| /boot=16MB;78;88;0;98
[root@ytt etc]# check_nrpe -H127.0.0.1 -c check_hda3
DISK OK - free space: / 134960 MB (97% inode=99%);| /=3322MB;145789;145799;0;145809

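/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_users        # number of logged-in users
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_load         # CPU load
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_zombie_procs # zombie processes
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_total_procs  # total number of processes

About check_load -w 15,10,5 -c 30,25,20:
On Unix, the load averages are the average number of processes waiting to run over the last 1, 5 and 15 minutes.
The state is WARNING when the 1-minute average exceeds 15, the 5-minute average exceeds 10, or the 15-minute average exceeds 5.
The state is CRITICAL when the 1-minute average exceeds 30, the 5-minute average exceeds 25, or the 15-minute average exceeds 20.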
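
The threshold rule for the three load averages can be sketched as follows. classify_load is a hypothetical function written for illustration, not the check_load source; it assumes that any single average above its threshold is enough to raise the state, and it uses awk for the decimal comparisons.

```shell
# Sketch: classify three load averages against WARNING and CRITICAL triples,
# each given as "1min,5min,15min".
# Returns 0 (OK), 1 (WARNING) or 2 (CRITICAL) as the exit status.
classify_load() {
    awk -v la="$1" -v w="$2" -v c="$3" 'BEGIN {
        split(la, L, ","); split(w, W, ","); split(c, C, ",")
        state = 0
        for (i = 1; i <= 3; i++) {
            # "+ 0" forces numeric comparison
            if (L[i] + 0 > C[i] + 0) { state = 2; break }
            if (L[i] + 0 > W[i] + 0 && state < 1) state = 1
        }
        exit state
    }'
}

# Example: a 1-minute load of 16 with the thresholds from nrpe.cfg.
# classify_load "16,2,1" "15,10,5" "30,25,20"; echo "exit status: $?"
```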

-------------------------------------------------------------------------------------

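On the monitoring server running Nagios
Nagios is already running there, so what remains is to:
install the check_nrpe plugin;
define a check_nrpe command in commands.cfg, because only commands defined in commands.cfg can be used in services.cfg;
create the service definitions for the monitored host.

tar zxvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure
make all
make install-plugin    # this target is enough here: only the check_nrpe plugin is needed

Before this step, remember to add the monitoring server's IP address to /etc/xinetd.d/nrpe on the monitored host.
/usr/local/nagios/libexec/check_nrpe -H 192.168.4.30     # test communication between check_nrpe on the monitoring server and the NRPE daemon on the monitored host
NRPE v2.8.1        # the correct NRPE version is returned, so everything is working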

Add a check_nrpe definition to commands.cfg:
vi /usr/local/nagios/etc/commands.cfg
#################################################################
# 2008.12.4 by ritto
#################################################################
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

command_name check_nrpe sets the command name to check_nrpe; services.cfg refers to it by this name.
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ defines the plugin program that is actually run.
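
To make the macro substitution concrete, the sketch below shows how a check_command value such as check_nrpe!check_load would map onto that command_line. expand_check_nrpe is a hypothetical helper for illustration only; the default plugin directory standing in for $USER1$ is an assumption based on the install prefix used in these notes.

```shell
# Hypothetical helper: show the command line that results from a
# check_command value of the form check_nrpe!ARG1 and a host address.
# The third argument stands in for $USER1$ (assumed plugin directory).
expand_check_nrpe() {
    check_command=$1
    host_address=$2
    plugin_dir=${3:-/usr/local/nagios/libexec}
    arg1=${check_command#*!}       # text after the first "!" becomes $ARG1$
    echo "$plugin_dir/check_nrpe -H $host_address -c $arg1"
}

# Example:
# expand_check_nrpe 'check_nrpe!check_load' 192.168.4.30
```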

Next, define CPU load monitoring for a host in services.cfg.
Note that the host must run Linux and have NRPE installed.
vi services.cfg
define service {
host_name               mail1
service_description     check-load
check_period            24x7
max_check_attempts      4
normal_check_interval   3
retry_check_interval    2
contact_groups          sagroup
notification_interval   10
notification_period     24x7
notification_options    w,u,c,r
check_command           check_nrpe!check_load
}

----------------------------------------------------------------------

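On the monitored host, add a definition for the check_swap command:
vi /usr/local/nagios/etc/nrpe.cfg
Add the following line:
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

If nrpe runs as a standalone daemon on the monitored host, it must be restarted by hand.
If it runs under xinetd, no restart is needed.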

----------------------------------------------------------------------

On the monitoring server, add this service:
vi /usr/local/nagios/etc/services.cfg
define service {
host_name               mail1
service_description     check-swap
check_period            24x7
max_check_attempts      4
normal_check_interval   3
retry_check_interval    2
contact_groups          sagroup
notification_interval   10
notification_period     24x7
notification_options    w,u,c,r
check_command           check_nrpe!check_swap
}

------------------------------------------------------------------------------

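Check the configuration once more before starting the service.
With all the changes in place, restart Nagios: kill the Nagios process and start it again. After a short while the results appear in the web interface.
killall nagios
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Or use the init script with one of start, stop, restart or status:
/etc/init.d/nagios restart
If this reports an error, the paths in the script may be wrong:
vi /etc/init.d/nagios
Change prefix=/usr/local/nagios to your actual install directory.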

References:
http://sery.blog.51cto.com/10037/20520
http://yahoon.blog.51cto.com/13184/41778
