Monitoring Linux Servers with Nagios


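Nagios is a free, open-source network monitoring tool. It can monitor the state of Windows, Linux and Unix hosts, network devices such as switches and routers, printers, and more. When a system or service becomes abnormal, it sends an email or SMS alert so that operations staff are notified immediately, and it sends another notification once the state has recovered.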

What Nagios mainly monitors:

  1. Memory and CPU usage
  2. Disk usage
  3. Ping: whether a server is down
  4. Network services (SMTP, POP3, HTTP, NNTP, PING, etc.)
  5. Network devices

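How Nagios works
Nagios itself contains no host or service checks; all monitoring is done by plugins. After installation, the plugins shipped with Nagios are in libexec/ under the Nagios home directory: for example, check_disk checks disk space and check_load checks CPU load. Run ./check_xxx -h on any plugin to see its usage and what it does.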
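Every plugin follows the same contract: it prints one line of status text, optionally followed by `|` and performance data, and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN); Nagios only looks at that exit code and output line. A minimal sketch of that contract as a shell function; `check_value` is a hypothetical name, not a plugin that ships with Nagios:

```shell
# Hypothetical plugin sketch (not a bundled Nagios plugin): compare an
# integer against warning/critical thresholds, print one status line with
# performance data after "|", and return the standard plugin exit code
# (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).
check_value() {
    # Non-integer or missing arguments are reported as UNKNOWN
    for n in "$1" "$2" "$3"; do
        case $n in
            ''|*[!0-9]*) echo "UNKNOWN: usage: check_value <value> <warn> <crit>"; return 3 ;;
        esac
    done
    perf="value=$1;$2;$3"            # perfdata format: label=value;warn;crit
    if [ "$1" -gt "$3" ]; then
        echo "CRITICAL: value=$1 | $perf"; return 2
    elif [ "$1" -gt "$2" ]; then
        echo "WARNING: value=$1 | $perf"; return 1
    fi
    echo "OK: value=$1 | $perf"
    return 0
}
```

For example, `check_value 85 80 90` prints `WARNING: value=85 | value=85;80;90` and returns 1. Like check_mem later in this article, it alerts only when the value is strictly greater than the threshold.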

Next, install Nagios and set it up to monitor another server.

Environment: CentOS 7
Monitoring server (server): 192.168.1.33
Monitored server (client): 192.168.1.37

I. Installing Nagios on the server

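  1. Install the build environment Nagios needs
yum install httpd php
yum install gcc glibc glibc-common unzip
yum install gd gd-devel
yum install net-snmp net-snmp-utils   ##provides snmpd, which is started in step 7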
  2. Create the nagios user and group
/usr/sbin/useradd -m nagios
passwd nagios
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd apache
  3. Download Nagios and its plugins. I used Nagios 4.4.3, nagios-plugins 2.3.3 and nagios-snmp-plugins 1.1.1, from:
    nagios: assets.nagios.com/downloads/n…
    nagios-plugins: nagios-plugins.org/download/na…
    nagios-snmp-plugins: nagios.manubulon.com/nagios-snmp…

  4. After downloading the archives into /usr/local, extract Nagios first

     tar -zxvf nagios-4.4.3.tar.gz
     cd nagios-4.4.3
    

    4.1 Configure and compile

     ./configure --with-command-group=nagcmd
     make all
    

    4.2 Install Nagios, its init script and the sample configuration files

     make install
     make install-init
     make install-config
     make install-commandmode
    

    4.3 Build and install nagios-plugins

     cd /usr/local
     tar -zxvf nagios-plugins-2.3.3.tar.gz   ##extract nagios-plugins
     cd nagios-plugins-2.3.3
     ./configure --prefix=/usr/local/nagios
     make
     make install
    

    4.4 Build and install nagios-snmp-plugins

     cd /usr/local
     tar -zxvf nagios-snmp-plugins.1.1.1.tgz
     yum install perl-CPAN   ##CPAN is the Perl software archive, a large collection of useful Perl modules and related files. nagios-snmp-plugins is a set of Perl plugins that monitor hosts over SNMP, so perl-CPAN is needed to install their Perl dependencies
     perl -MCPAN -e shell
     cpan[1]> install Net::SNMP
    

    Once everything is installed, don't start anything yet; more configuration follows.

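  5. The configuration files are in /usr/local/nagios/etc. Nagios can start with these basic settings as-is; the only thing to change now is the contact information. Edit /usr/local/nagios/etc/objects/contacts.cfg and replace the contact email address with your own.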

     vim  /usr/local/nagios/etc/objects/contacts.cfg
    
  6. Configure httpd
    vim /etc/httpd/conf/httpd.conf
    1) Change

     User apache
     Group apache
     to
     User nagios
     Group nagios
    

    2) Add index.php to DirectoryIndex (entries are separated by spaces, not commas)

     <IfModule dir_module>
         DirectoryIndex index.html index.php
     </IfModule>
    

    3) Configure nagios.conf

     cd /usr/local/nagios-4.4.3       #the extracted Nagios source directory
     make install-webconf
     vim /etc/httpd/conf.d/nagios.conf
    

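    Replace every occurrence of htpasswd.users in that file with htpasswd.
    4) Create the Nagios authentication file (the default cgi.cfg only authorizes the user nagiosadmin, so either use that name or add admin to the authorized_for_* entries in /usr/local/nagios/etc/cgi.cfg)

     htpasswd -c /usr/local/nagios/etc/htpasswd admin
     cat /usr/local/nagios/etc/htpasswd    ##view the contents of the authentication file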
    

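    5) Disable SELinux

     a. Temporarily (no reboot needed):
     setenforce 0    ##put SELinux into permissive mode (policy no longer enforced)
     setenforce 1    ##put SELinux back into enforcing mode
     b. Permanently, by editing the config file (requires a reboot):
     vi /etc/selinux/config
     change SELINUX=enforcing to SELINUX=disabled
     then reboot the machine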
    
  7. Start the services and open Nagios

systemctl start httpd.service
systemctl start snmpd.service
systemctl start nagios.service

The web interface is now available at http://192.168.1.33/nagios

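Note: if the web interface cannot be reached after starting, look for the cause in the Apache error log, /var/log/httpd/error_log. In my case the fix was to set DocumentRoot in the Apache configuration file /etc/httpd/conf/httpd.conf to the same path as physical_html_path in /usr/local/nagios/etc/cgi.cfg, then restart httpd and Nagios.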
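The two paths from that note can be compared with a short script. This is a sketch: `compare_html_paths` is a hypothetical helper, and it assumes the usual `DocumentRoot "..."` syntax in httpd.conf and a `physical_html_path=` line in cgi.cfg.

```shell
# Hypothetical helper: print DocumentRoot (httpd.conf) and physical_html_path
# (cgi.cfg) and report whether they match. Returns 0 on match, 1 otherwise.
compare_html_paths() {
    # DocumentRoot "/var/www/html" -> /var/www/html (quotes optional);
    # lines starting with "#" never match, so commented entries are skipped
    docroot=$(sed -n 's/^[[:space:]]*DocumentRoot[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$1" | head -n 1)
    # physical_html_path=/usr/local/nagios/share -> /usr/local/nagios/share
    htmlpath=$(sed -n 's/^physical_html_path=//p' "$2" | head -n 1)
    if [ "$docroot" = "$htmlpath" ]; then
        echo "match: $docroot"
        return 0
    fi
    echo "mismatch: DocumentRoot=$docroot physical_html_path=$htmlpath"
    return 1
}
```

Run it as `compare_html_paths /etc/httpd/conf/httpd.conf /usr/local/nagios/etc/cgi.cfg`. Only the first active DocumentRoot line is considered.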

II. Installing Nagios components on the client

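About NRPE
Nagios can monitor remote hosts in several ways, including SNMP, NRPE, SSH and NSCA; this article uses NRPE to monitor a remote Linux host. NRPE (Nagios Remote Plugin Executor) is a daemon that runs check commands on the remote server: the Nagios server triggers check commands installed on the remote host, and NRPE returns the results to the server. Its overhead is much lower than that of SSH-based checks, and it does not need a system account on the remote host, so it is also more secure than the SSH approach.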
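To make the division of work concrete: check_nrpe only sends a command name, and the NRPE daemon looks up the matching `command[name]=...` line in its nrpe.cfg and runs that command locally. The sketch below imitates only that lookup step; `nrpe_lookup` is hypothetical and is not how NRPE is implemented internally.

```shell
# Hypothetical sketch of NRPE's name-to-command lookup: find the first
# uncommented command[<name>]=... line in the given nrpe.cfg and print the
# command line to the right of the first "=".
nrpe_lookup() {
    cfg=$1; name=$2
    line=$(grep -m 1 "^command\[$name\]=" "$cfg") || {
        echo "unknown command: $name" >&2
        return 3                     # UNKNOWN, as a plugin would report it
    }
    echo "${line#*=}"
}
```

For example, `nrpe_lookup /usr/local/nagios/etc/nrpe.cfg check_users` shows which plugin and thresholds the client would run when the server asks for check_users.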
1. Create the nagios user

    useradd -s /sbin/nologin nagios

2. Install nagios-plugins, which NRPE depends on

yum -y install gcc gcc-c++ make openssl openssl-devel
wget https://nagios-plugins.org/download/nagios-plugins-2.1.4.tar.gz
tar zxf nagios-plugins-2.1.4.tar.gz 
cd nagios-plugins-2.1.4
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

3. Install NRPE

wget https://jaist.dl.sourceforge.net/project/nagios/nrpe-3.x/nrpe-3.2.1.tar.gz
tar -zxvf nrpe-3.2.1.tar.gz
cd nrpe-3.2.1
./configure --with-nrpe-user=nagios \
--with-nrpe-group=nagios \
--with-nagios-user=nagios \
--with-nagios-group=nagios \
--enable-command-args \
--enable-ssl
make all
make install-plugin
make install-daemon
make install-config

(Note: make install-config is for NRPE 3.x.x, as used here; with 2.x.x use make install-daemon-config instead.)
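4. Configure NRPE: edit /usr/local/nagios/etc/nrpe.cfg and set the following. server_address is the client's own IP (192.168.1.37) and allowed_hosts is the Nagios server's IP (192.168.1.33):

log_facility=daemon
pid_file=/var/run/nrpe.pid_file
server_address=192.168.1.37
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=192.168.1.33
command_timeout=60
connection_timeout=300
debug=0

The checks used later (check_users, check_load, check_hda1, check_zombie_procs, check_total_procs) are already defined as command[...] lines in the default nrpe.cfg. check_hda1 checks /dev/hda1, so change it to a partition that exists on the client (for example /dev/sda1).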
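Because NRPE rejects connections from hosts not listed in allowed_hosts, a mistyped server IP is a common reason for check_nrpe to fail later. A hypothetical pre-flight check, `nrpe_allows`, assuming allowed_hosts holds plain comma-separated IP addresses: it compares exact strings only, so hostnames or network ranges are not resolved, and if the option appears more than once it only looks at the last occurrence.

```shell
# Hypothetical check: return 0 if the given IP appears verbatim in the
# comma-separated allowed_hosts list of an nrpe.cfg file, 1 otherwise.
nrpe_allows() {
    # Last allowed_hosts= line only; spaces around commas are dropped
    hosts=$(sed -n 's/^allowed_hosts=//p' "$1" | tail -n 1 | tr -d ' ')
    old_ifs=$IFS; IFS=,
    for h in $hosts; do
        if [ "$h" = "$2" ]; then IFS=$old_ifs; return 0; fi
    done
    IFS=$old_ifs
    return 1
}
```

Usage: `nrpe_allows /usr/local/nagios/etc/nrpe.cfg 192.168.1.33 || echo "server IP missing from allowed_hosts"`.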

5. Start NRPE

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
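To confirm the daemon actually started and is listening on server_port, its socket can be looked up in `ss -lnt` output. A sketch, assuming ss's default column order in which column 4 is `Local Address:Port`; `port_listening` is a hypothetical helper that reads that output on stdin.

```shell
# Hypothetical filter: exit 0 if any listening socket in `ss -lnt` output
# (read on stdin, header line skipped) has a local address ending in ":<port>".
port_listening() {
    awk -v port="$1" 'NR > 1 && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}
```

Usage on the client: `ss -lnt | port_listening 5666 && echo "nrpe is listening"`.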

III. Installing NRPE on the server
1. Install NRPE

wget https://jaist.dl.sourceforge.net/project/nagios/nrpe-3.x/nrpe-3.2.1.tar.gz
tar -zxvf nrpe-3.2.1.tar.gz 
cd nrpe-3.2.1
./configure --with-nrpe-user=nagios \
--with-nrpe-group=nagios \
--with-nagios-user=nagios \
--with-nagios-group=nagios \
--enable-command-args \
--enable-ssl
make all
make install-plugin

This generates the check_nrpe plugin under libexec in the Nagios installation directory:

cd /usr/local/nagios/libexec/
ll -d check_nrpe 

2. Check the connection to the client (the .37 server); a version number in the output means it is working

./check_nrpe -H 192.168.1.37
NRPE v3.2.1
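When running several checks by hand, it helps to see the state Nagios would derive from each exit code. A sketch of a hypothetical wrapper, `run_check`, that works with any plugin, including check_nrpe:

```shell
# Map a plugin exit code to the Nagios state name.
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Hypothetical wrapper: run the given command, capture its output (stdout and
# stderr), print "<STATE> (exit N): <output>" and pass the exit code through.
run_check() {
    rc=0
    output=$("$@" 2>&1) || rc=$?
    echo "$(state_name "$rc") (exit $rc): $output"
    return "$rc"
}
```

For example, from /usr/local/nagios/libexec on the server: `for c in check_users check_load check_hda1; do run_check ./check_nrpe -H 192.168.1.37 -c "$c"; done`.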

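(Note: Nagios monitoring is plugin-based. The checks are configured on the monitored server, and the monitoring server refers to them through its own configuration.)
3. Define the command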

 cd /usr/local/nagios/etc/objects/
 vim commands.cfg

Check whether it already exists; if not, append it at the end of the file:

define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c "$ARG1$"
    }
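The `!` in a service's check_command separates the command name from its arguments: in `check_nrpe!check_users` the first field selects the check_nrpe definition above and `check_users` becomes `$ARG1$`. The sketch below imitates that substitution with awk for the three macros used here; `expand_check` is hypothetical, and it assumes `$USER1$` is /usr/local/nagios/libexec, the default in resource.cfg.

```shell
# Hypothetical illustration of macro expansion:
#   $1 = command_line template, $2 = check_command, $3 = host address
expand_check() {
    printf '%s\n' "$1" | awk -v check="$2" -v addr="$3" -v user1="/usr/local/nagios/libexec" '
    {
        n = split(check, f, "!")          # f[1] = command name, f[2..n] = arguments
        line = $0
        gsub(/\$USER1\$/, user1, line)
        gsub(/\$HOSTADDRESS\$/, addr, line)
        for (i = 2; i <= n; i++)          # f[2] -> $ARG1$, f[3] -> $ARG2$, ...
            gsub("\\$ARG" (i - 1) "\\$", f[i], line)
        print line
    }'
}
```

`expand_check '$USER1$/check_nrpe -H $HOSTADDRESS$ -c "$ARG1$"' 'check_nrpe!check_users' 192.168.1.37` prints `/usr/local/nagios/libexec/check_nrpe -H 192.168.1.37 -c "check_users"`, the command Nagios runs for the CHECK USER service defined next.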

4. Define the host and services (delete any leftover define blocks copied from windows.cfg, otherwise Nagios will fail to start)

[root@ecs-6221 objects]# cp windows.cfg linhost.cfg
[root@ecs-6221 objects]# vim linhost.cfg 
[root@ecs-6221 objects]# grep -v '^#' linhost.cfg | sed '/^$/d'
define host{
	use		linux-server	; Inherit default values from a template
	host_name	linhost		; The name we're giving to this host
	alias		192.168.1.37	; A longer name associated with the host
	address		192.168.1.37	; IP address of the host
	}
define service{
	use			generic-service
	host_name		linhost
	service_description	CHECK USER
	check_command		check_nrpe!check_users
	}
define service{
	use			generic-service
	host_name		linhost
	service_description	load
	check_command		check_nrpe!check_load
	}
define service{
	use			generic-service
	host_name		linhost
	service_description	SDA1
	check_command		check_nrpe!check_hda1
	}
define service{
	use			generic-service
	host_name		linhost
	service_description	Zombie
	check_command		check_nrpe!check_zombie_procs
	}
define service{
	use			generic-service
	host_name		linhost
	service_description	Total procs
	check_command		check_nrpe!check_total_procs
	}
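Each service block differs only in its description and NRPE command name, so further checks can be generated instead of copied by hand. A sketch with a hypothetical `gen_services` function that prints blocks in the same layout as linhost.cfg above:

```shell
# Hypothetical generator: gen_services <host_name> "<description>:<nrpe command>"...
# prints one "define service" block per argument.
gen_services() {
    host=$1; shift
    for pair in "$@"; do
        desc=${pair%%:*}       # text before the first ":" = service_description
        cmd=${pair#*:}         # text after it = NRPE command name
        printf 'define service{\n\tuse\t\t\tgeneric-service\n\thost_name\t\t%s\n\tservice_description\t%s\n\tcheck_command\t\tcheck_nrpe!%s\n\t}\n' "$host" "$desc" "$cmd"
    done
}
```

`gen_services linhost "CHECK USER:check_users" "load:check_load" >> linhost.cfg` appends two service blocks; verify the configuration before restarting Nagios.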

5. Register the new host by adding linhost.cfg to nagios.cfg

vim /usr/local/nagios/etc/nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/linhost.cfg
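If this step is scripted, a small guard keeps repeated runs from adding duplicate lines. A sketch; `add_cfg_file` is a hypothetical helper:

```shell
# Hypothetical helper: add_cfg_file <nagios.cfg> <object file>
# appends "cfg_file=<object file>" only if that exact line is not present.
add_cfg_file() {
    line="cfg_file=$2"
    if grep -qxF "$line" "$1"; then
        echo "already present: $line"
    else
        printf '%s\n' "$line" >> "$1"
        echo "added: $line"
    fi
}
```

After editing, `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg` verifies the configuration, which catches problems such as leftover define blocks before the restart.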
6. Restart the Nagios service

service nagios restart

IV. Adding a memory-monitoring plugin
1. On the client, create the check_mem file

touch  /usr/local/nagios/libexec/check_mem
chmod  a+x /usr/local/nagios/libexec/check_mem

Set its content to the following script:

  
    #! /usr/bin/perl -w
    #
    # $Id: check_mem.pl 8 2008-08-23 08:59:52Z rhomann $
    #
    # check_mem v1.7 plugin for nagios
    #
    # uses the output of `free` to find the percentage of memory used
    #
    # Copyright Notice: GPL
    #
    # History:
    # v1.8 Rouven Homann - 
    # + added findbin patch from Duane Toler
    # + added backward compatibility patch from Timour Ezeev
    #
    # v1.7 Ingo Lantschner - ingo AT boxbe DOT com
    # + adapted for systems with no swap (avoiding divison through 0)
    #
    # v1.6 Cedric Temple - cedric DOT temple AT cedrictemple DOT info
    # + add swap monitoring
    #       + if warning and critical threshold are 0, exit with OK
    #       + add a directive to exclude/include buffers
    #
    # v1.5 Rouven Homann - 
    # + perfomance tweak with free -mt (just one sub process started instead of 7)
    # + more code cleanup
    #
    # v1.4 Garrett Honeycutt - 
    # + Fixed PerfData output to adhere to standards and show crit/warn values
    #
    # v1.3 Rouven Homann - 
    #   + Memory installed, used and free displayed in verbose mode
    # + Bit Code Cleanup
    #
    # v1.2 Rouven Homann - 
    # + Bug fixed where verbose output was required (nrpe2)
    #       + Bug fixed where perfomance data was not displayed at verbose output
    # + FindBin Module used for the nagios plugin path of the utils.pm
    #
    # v1.1 Rouven Homann - 
    #     + Status Support (-c, -w)
    # + Syntax Help Informations (-h)
    #       + Version Informations Output (-V)
    # + Verbose Output (-v)
    #       + Better Error Code Output (as described in plugin guideline)
    #
    # v1.0 Garrett Honeycutt - 
    #   + Initial Release
    #
    use strict;
    use FindBin;
    FindBin::again();
    use lib $FindBin::Bin;
    use utils qw($TIMEOUT %ERRORS &print_revision &support);
    use vars qw($PROGNAME $PROGVER);
    use Getopt::Long;
    use vars qw($opt_V $opt_h $verbose $opt_w $opt_c);
    $PROGNAME = "check_mem";
    $PROGVER = "1.8";
    my $DONT_INCLUDE_BUFFERS = 0;
    sub print_help ();
    sub print_usage ();
    Getopt::Long::Configure('bundling');
    GetOptions ("V"   => \$opt_V, "version"    => \$opt_V,
    "h"   => \$opt_h, "help"       => \$opt_h,
            "v" => \$verbose, "verbose" => \$verbose,
    "w=s" => \$opt_w, "warning=s" => \$opt_w,
    "c=s" => \$opt_c, "critical=s" => \$opt_c);
    if ($opt_V)
    {print_revision($PROGNAME,'$Revision: '.$PROGVER.' $');
    exit $ERRORS{'UNKNOWN'};
    }
    if ($opt_h) {
    print_help();
    exit $ERRORS{'UNKNOWN'};
    }
    print_usage() unless (($opt_c) && ($opt_w));
    my ($mem_critical, $swap_critical);
    my ($mem_warning, $swap_warning);
    ($mem_critical, $swap_critical) = ($1,$2) if ($opt_c =~ /([0-9]+)[%]?(?:,([0-9]+)[%]?)?/);
    ($mem_warning, $swap_warning)   = ($1,$2) if ($opt_w =~ /([0-9]+)[%]?(?:,([0-9]+)[%]?)?/);
    # Check if swap params were supplied
    $swap_critical ||= 100;
    $swap_warning ||= 100;
    # print threshold in output message
    my $mem_threshold_output = " (";
    my $swap_threshold_output = " (";
    if ( $mem_warning > 0 && $mem_critical > 0) {
    $mem_threshold_output .= "W> $mem_warning, C> $mem_critical";
    }
    elsif ( $mem_warning > 0 ) {
    $mem_threshold_output .= "W> $mem_warning";
    }
    elsif ( $mem_critical > 0 ) {
    $mem_threshold_output .= "C> $mem_critical";
    }
    if ( $swap_warning > 0 && $swap_critical > 0) {
    $swap_threshold_output .= "W> $swap_warning, C> $swap_critical";
    }
    elsif ( $swap_warning > 0 ) {
    $swap_threshold_output .= "W> $swap_warning";
    }
    elsif ( $swap_critical > 0 ) {
    $swap_threshold_output .= "C> $swap_critical";
    }
    $mem_threshold_output .= ")";
    $swap_threshold_output .= ")";
    my $verbose = $verbose;
    my ($mem_percent, $mem_total, $mem_used, $swap_percent, $swap_total, $swap_used) = &sys_stats();
    my $free_mem = $mem_total - $mem_used;
    my $free_swap = $swap_total - $swap_used;
    # set output message
    my $output = "Memory Usage".$mem_threshold_output.": ". $mem_percent.'% <br>';
    $output .= "Swap Usage".$swap_threshold_output.": ". $swap_percent.'%';
    # set verbose output message
    my $verbose_output = "Memory Usage:".$mem_threshold_output.": ". $mem_percent.'% '."- Total: $mem_total MB, used: $mem_used MB, free: $free_mem MB<br>";
    $verbose_output .= "Swap Usage:".$swap_threshold_output.": ". $swap_percent.'% '."- Total: $swap_total MB, used: $swap_used MB, free: $free_swap MB<br>";
    # set perfdata message
    my $perfdata_output = "MemUsed=$mem_percent\%;$mem_warning;$mem_critical";
    $perfdata_output .= " SwapUsed=$swap_percent\%;$swap_warning;$swap_critical";
    # if threshold are 0, exit with OK
    if ( $mem_warning == 0 ) { $mem_warning = 101 };
    if ( $swap_warning == 0 ) { $swap_warning = 101 };
    if ( $mem_critical == 0 ) { $mem_critical = 101 };
    if ( $swap_critical == 0 ) { $swap_critical = 101 };
    if ($mem_percent>$mem_critical || $swap_percent>$swap_critical) {
        if ($verbose) { print "CRITICAL: ".$verbose_output."|".$perfdata_output."\n";}
        else { print "CRITICAL: ".$output."|".$perfdata_output."\n";}
        exit $ERRORS{'CRITICAL'};
    }
    elsif ($mem_percent>$mem_warning || $swap_percent>$swap_warning) {
        if ($verbose) { print "WARNING: ".$verbose_output."|".$perfdata_output."\n";}
        else { print "WARNING: ".$output."|".$perfdata_output."\n";}
        exit $ERRORS{'WARNING'};
    }
    else {
        if ($verbose) { print "OK: ".$verbose_output."|".$perfdata_output."\n";}
        else { print "OK: ".$output."|".$perfdata_output."\n";}
        exit $ERRORS{'OK'};
    }
    sub sys_stats {
        # Match `free -m` rows by label instead of fixed word positions: the
        # old fixed indices ($memory[18], $memory[19]) assume the older procps
        # layout with a "-/+ buffers/cache" row; on procps-ng (CentOS 7) they
        # point at the "Total:" row instead of "Swap:".
        # LC_ALL=C keeps the row labels in English.
        my ($mem_total, $mem_used, $mem_used_nobuf, $swap_total, $swap_used) = (0, 0, undef, 0, 0);
        foreach my $line (split /\n/, `LC_ALL=C free -m`) {
            my @f = split(" ", $line);
            next unless @f;
            if    ($f[0] eq 'Mem:')  { ($mem_total, $mem_used) = @f[1,2]; }
            elsif ($f[0] eq '-/+')   { $mem_used_nobuf = $f[2]; }   # older procps only
            elsif ($f[0] eq 'Swap:') { ($swap_total, $swap_used) = @f[1,2]; }
        }
        # On procps-ng, "Mem:" used already excludes buffers/cache
        $mem_used = $mem_used_nobuf if ($DONT_INCLUDE_BUFFERS && defined $mem_used_nobuf);
        my $mem_percent = ($mem_used / $mem_total) * 100;
        my $swap_percent;
        if ($swap_total == 0) { $swap_percent = 0; } else { $swap_percent = ($swap_used / $swap_total) * 100; }
        return (sprintf("%.0f",$mem_percent),$mem_total,$mem_used, sprintf("%.0f",$swap_percent),$swap_total,$swap_used);
    }
    sub print_usage () {
        print "Usage: $PROGNAME -w <warn> -c <crit> [-v] [-h]\n";
        exit $ERRORS{'UNKNOWN'} unless ($opt_h);
    }
    sub print_help () {
        print_revision($PROGNAME,'$Revision: '.$PROGVER.' $');
        print "Copyright (c) 2005 Garrett Honeycutt/Rouven Homann/Cedric Temple\n";
        print "\n";
        print_usage();
        print "\n";
        print "-w <MemoryUsage>,<SwapUsage> = Memory and Swap usage to activate a warning message (eg: -w 90,25 ) .\n";
        print "-c <MemoryUsage>,<SwapUsage> = Memory and Swap usage to activate a critical message (eg: -c 95,50 ).\n";
        print "-v = Verbose Output.\n";
        print "-h = This screen.\n\n";
        support();
    }
2. Once it is created, test it with check_mem -w 80 -c 90:
    /usr/local/nagios/libexec/check_mem -w 80 -c 90
    OK: Memory Usage (W> 80, C> 90): 37% <br>Swap Usage (W> 100, C> 100): 0%|MemUsed=37%;80;90 SwapUsed=0%;100;100
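The memory percentage check_mem reports is used / total * 100 from the `Mem:` row of `free -m`, rounded to a whole number. A sketch of that arithmetic in awk; `mem_percent` is a hypothetical helper that reads free's output on stdin and matches the row by its `Mem:` label, so it works with both the older procps layout and the procps-ng layout on CentOS 7:

```shell
# Hypothetical helper: print round(used / total * 100) from the "Mem:" row
# of `free -m` output read on stdin; header and other rows are ignored.
mem_percent() {
    awk '$1 == "Mem:" { printf "%.0f\n", $3 / $2 * 100; exit }'
}
```

Usage: `LC_ALL=C free -m | mem_percent` (LC_ALL=C keeps the row labels in English).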

3. On the client, add the memory check command to /usr/local/nagios/etc/nrpe.cfg

vim /usr/local/nagios/etc/nrpe.cfg
command[check_mem]=/usr/local/nagios/libexec/check_mem -w 80 -c 90 

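4. Restart NRPE on the client (it was started as a daemon earlier, so stop it and start it again):

pkill nrpe
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

5. On the server, edit commands.cfg. This check_mem command is only needed for running check_mem locally on the server; the NRPE-based service in the next step calls it through check_nrpe.

vi /usr/local/nagios/etc/objects/commands.cfg
#add the following:
define command{
        command_name    check_mem
        command_line    $USER1$/check_mem -w $ARG1$ -c $ARG2$
        }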
        }

6. On the server, add a memory service for the client host linhost to linhost.cfg:

vi /usr/local/nagios/etc/objects/linhost.cfg

define  service {
        use                     generic-service
        host_name               linhost
        service_description     memory
        check_command           check_nrpe!check_mem
        }

7. Restart the Nagios service

service nagios restart