Nagios Log Server – notes

I have been using Nagios Log Server and keep running into issues with it. Most of them seem to be related to running out of disk space where the Elasticsearch indexes are stored. I highly recommend that you do not allow that to happen. Clear and concise documentation is hard to find for this flexible and powerful centralized log server, so I have decided to post a few notes on things I have stumbled onto that help me manage the process better.

I created several alerts, and I could get them to work by manually executing the query in the Alerting tab. However, none of them seemed to be firing at the Check Interval I had specified. While trying to resolve this issue, I discovered a couple of troubleshooting tips to note for future reference. Note: In my case, none of these revealed the cause of my issue. At least, I don’t think they did. Nonetheless, here they are.

Check the poller:

[nagios]$ /usr/bin/php /var/www/html/nagioslogserver/www/index.php poller
Updating Cluster Hosts File
Updating Elasticsearch with instance…
Updating Cluster Hosts File
Updating Elasticsearch with instance…
Updating Cluster Hosts File
Updating Elasticsearch with instance…
Updating Cluster Hosts File
Updating Elasticsearch with instance…
Finished Polling.

Check the jobs:

[nagios]$ /usr/bin/php /var/www/html/nagioslogserver/www/index.php jobs
Processed 0 node jobs.
Processed 0 global jobs.

Look for an error in the cron log:

# grep ERROR /var/log/cron

What fixed my issue was going into the Administration tab and selecting “Command Subsystem” on the left side. From there, clicking “Reset All Jobs” resolved my issue.

Also, in the Administration tab, if you select “Audit Reports” you can get some verification that the alerts are running. Before resetting all the jobs, it was clear that they were not running. After resetting, I see several regularly scheduled entries regarding the returned messages from the alerts.

Another thing I was able to put together was a report. In particular, I wanted a daily report of all IP addresses that attempted to log in to an externally facing server. I did this by creating an Elasticsearch alert query and then tweaking the time range. I discovered that I could copy the query and execute it in a shell script using the curl command. I had found references to people doing this, but they were using the curl switch -XGET. That never worked for me, but -XPOST did and has for quite a while. Once you have copied the query from the dashboard, you just need to paste it into a file and change -XGET to -XPOST. Make the file executable, and then run it to get the text output. I wrapped some bash code around the query and formatted the output to create a report, which could be very useful.
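As a rough illustration of the "wrap some bash around the query" idea, here is a minimal sketch. It assumes the JSON body of the dashboard query has been saved to a file on its own; ES_URL, QUERY_FILE, and the function names are placeholders I made up for the example, and the address extraction is a plain pattern match on the response text rather than anything Nagios Log Server provides.

```shell
#!/bin/bash
# Sketch only: ES_URL and QUERY_FILE are assumed names, not part of Nagios Log Server.
ES_URL="${ES_URL:-http://localhost:9200}"
QUERY_FILE="${QUERY_FILE:-./login_query.json}"

# Read text on stdin, extract anything shaped like an IPv4 address,
# and print "count address" lines, most frequent first.
count_ips() {
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }' \
        | sort -k1,1nr -k2,2
}

# POST the saved query body to Elasticsearch and summarize the addresses found.
run_report() {
    echo "Login attempts by address, $(date +%Y-%m-%d)"
    curl -s -XPOST "${ES_URL}/_search?pretty" \
        -H 'Content-Type: application/json' \
        -d @"${QUERY_FILE}" | count_ips
}
```

Calling run_report from a cron script and mailing the output would give a daily report. The pattern will also match any other dotted-quad strings in the response, so the query itself should be narrow enough that this does not matter.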

This one really frustrated me, and I am still not sure I am doing it right, but it seems to be working. As I said earlier, I kept running out of space. My indexes had grown too large, so I just wanted to purge the old ones to conserve space. Nothing in the UI seemed to work. The repositories in the Administration tab under “Backup & Maintenance” seem finicky at best. Eventually I found some information, again not easy to locate, about the curator command for Elasticsearch. I used it with some parameters to effectively manage my index retention. I run these commands as user nagios as a nightly job in cron:

To create a snapshot and save to your backup repository:

curator snapshot --repository nameofbackuprepository indices --older-than numberofdaystokeep --time-unit days --timestring %Y.%m.%d

To close the indices:

curator close indices --older-than numberofdaystokeep --time-unit days --timestring %Y.%m.%d

To delete the indices:

curator delete indices --older-than numberofdaystokeep --time-unit days --timestring %Y.%m.%d
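The three steps can be collected into one script for cron to call. This is only a sketch of how I would lay it out: REPO, KEEP_DAYS, DRY_RUN, and the function names are placeholders for the example, and it defaults to printing the commands rather than running them so the retention settings can be checked first.

```shell
#!/bin/bash
# Sketch of a nightly retention job. REPO and KEEP_DAYS are placeholder values.
REPO="${REPO:-nameofbackuprepository}"
KEEP_DAYS="${KEEP_DAYS:-30}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print or run one command, depending on DRY_RUN.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Snapshot, then close, then delete; stop at the first step that fails
# so indices are not deleted when the snapshot did not succeed.
retention_job() {
    run_step curator snapshot --repository "$REPO" indices \
        --older-than "$KEEP_DAYS" --time-unit days --timestring '%Y.%m.%d' || return 1
    run_step curator close indices \
        --older-than "$KEEP_DAYS" --time-unit days --timestring '%Y.%m.%d' || return 1
    run_step curator delete indices \
        --older-than "$KEEP_DAYS" --time-unit days --timestring '%Y.%m.%d'
}
```

Keeping the commands in a script also avoids a crontab quirk: an unescaped % inside a crontab entry is treated as a newline, so the timestring would have to be written with backslashes if the curator commands were placed directly in the crontab.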

To list your repositories via command line:

curl -XGET "localhost:9200/_snapshot?pretty"

To force the backup to run and create a snapshot:

curator snapshot --repository "RepositoryName" indices --all-indices

There are switches to the curator command that you can use to get more verbose output and send that output to a log file:

Verbosity:

--loglevel level

Level options available, found on the Elastic site (https://www.elastic.co/guide/en/elasticsearch/client/curator/current/configfile.html):

CRITICAL will only display critical messages.
ERROR will only display error and critical messages.
WARN will display error, warning, and critical messages.
INFO will display informational, error, warning, and critical messages.
DEBUG will display debug messages, in addition to all of the above.

Capture output:

--logfile /tmp/test_backup.txt

Nagios log date conversion

To convert the nagios date timestamp in the nagios.log to a standard time format, use:

perl -pe 's/(\d+)/localtime($1)/e' /var/log/nagios3/nagios.log

Nagios installation procedures used on CentOS5.

Ensure that you have the following installed:
httpd, gcc, glibc, glibc-common, gd, gd-devel

Create accounts and groups.
useradd -m nagios
passwd nagios
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache

Build nagios from the source.
cd /usr/local/src
Download nagios and nagios-plugins from http://www.nagios.org/download/ to /usr/local/src.
tar -zxvf nagios-3.0.5.tar.gz
cd nagios-3.0.5
./configure --with-command-group=nagcmd
make all 2>&1 | tee MAKEALL.log
make install 2>&1 | tee MAKEINSTALL.log
make install-init 2>&1 | tee MAKEINSTALLINIT.log
make install-config 2>&1 | tee MAKEINSTALLCONFIG.log
make install-commandmode 2>&1 | tee MAKEINSTALLCOMMANDMODE.log
cd /usr/local/nagios/etc/objects/
cp -rp contacts.cfg contacts.cfg.orig
vi contacts.cfg
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
cd /usr/local/src/nagios-3.0.5
make install-webconf 2>&1 | tee MAKEINSTALLWEBCONF.log
service httpd restart
cd ..
tar -zxvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make 2>&1 | tee MAKE.log
make install 2>&1 | tee MAKEINSTALL.log
chkconfig --add nagios
chkconfig nagios on

This is a great way to debug errors in your configuration files:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

service nagios start

I created configuration files based on the template.cfg provided in the distribution in the objects directory.
cd /usr/local/nagios/etc/objects/

You have to make changes to your nagios.cfg file based on any new configuration files you created above.
vi ../nagios.cfg

Check your configuration:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

You have to restart nagios whenever you make a change to the configuration files.
service nagios restart
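Since the check and the restart always go together, they can be combined so that a broken configuration never takes the running nagios down. A minimal sketch, where the variable names and safe_restart are my own placeholders and the paths are the ones used in this install:

```shell
#!/bin/bash
# Sketch: restart nagios only when the configuration check passes.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-service nagios restart}"

safe_restart() {
    # "nagios -v" exits non-zero when the configuration has errors.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        $RESTART_CMD
    else
        echo "Configuration check failed; nagios was not restarted." >&2
        return 1
    fi
}
```

When the check fails, run the -v command by hand to see the full error output, which this sketch discards.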

Installed NSClient on a Windows 2003 server to monitor it in nagios.
http://files.nsclient.org/x-0.3.x/NSClient%2B%2B-Win32-0.3.5.msi
Edit nsc.ini. The file is pretty well documented.

The following is how I installed nrpe on a linux system to allow nagios to monitor it.
Install nrpe to allow nagios access to system status
cd /usr/local/src
wget http://internap.dl.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
useradd nagios
passwd nagios
wget http://superb-east.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.13.tar.gz
tar zxvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure 2>&1 | tee CONFIGURE.log
make 2>&1 | tee MAKE.log
make install 2>&1 | tee MAKEINSTALL.log
chown nagios:nagios /usr/local/nagios
chown -R nagios:nagios /usr/local/nagios/libexec/
cd ..
tar zxvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure 2>&1 | tee CONFIGURE.log
make all 2>&1 | tee MAKEALL.log
make install-plugin 2>&1 | tee MAKEPLUGIN.log
make install-daemon 2>&1 | tee MAKEDAEMON.log
make install-daemon-config 2>&1 | tee MAKEDAEMONCONFIG.log
make install-xinetd 2>&1 | tee MAKEXINETD.log
vi /etc/xinetd.d/nrpe

only_from = 127.0.0.1 192.168.0.3

vi /etc/services

nrpe 5666/tcp # NRPE

yum install xinetd
service xinetd start
netstat -at | grep nrpe

Verify nrpe is working:
/usr/local/nagios/libexec/check_nrpe -H localhost
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load

Add customized commands to support the machine
vi /usr/local/nagios/etc/nrpe.cfg

# Customized for this machine
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_hda2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda2
command[check_hdd1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hdd1
command[check_hdd2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hdd2
command[check_hdd5]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hdd5
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_mailq_timeout]=/usr/local/nagios/libexec/check_mailq -M postfix -w 5 -c 15
command[check_mailq]=/usr/local/nagios/libexec/check_mailq -w 10 -c 20
command[check_procs_named]=/usr/local/nagios/libexec/check_procs -C named -t 3 -w 1:1
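After adding custom commands, it can be handy to run every one of them through check_nrpe in one pass instead of typing each name. The following is a sketch under the assumption that the paths match this install; NRPE_CFG, CHECK_NRPE, NRPE_HOST, and the function names are placeholders for the example.

```shell
#!/bin/bash
# Sketch: run every command defined in nrpe.cfg through check_nrpe.
NRPE_CFG="${NRPE_CFG:-/usr/local/nagios/etc/nrpe.cfg}"
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"
NRPE_HOST="${NRPE_HOST:-localhost}"

# Print the name of each "command[name]=..." line in the given file.
# Commented-out lines do not match because of the ^ anchor.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)\]=.*/\1/p' "$1"
}

# Call check_nrpe once per defined command and label each result.
check_all() {
    for name in $(list_nrpe_commands "$NRPE_CFG"); do
        printf '%s: ' "$name"
        "$CHECK_NRPE" -H "$NRPE_HOST" -c "$name"
    done
}
```

Run check_all after restarting xinetd so that the new nrpe.cfg entries are picked up.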
