I have been using Nagios Log Server and have run across various issues with it. Most of them seem to be related to running out of disk space where the elasticsearch indexes are stored; I highly recommend that you do not let that happen. Clear and concise documentation for this flexible and powerful centralized log server is sketchy, so I have decided to post a few notes on things I have stumbled onto that help me manage the process better.

I created several alerts, and I could get them to work by manually executing the query in the Alerting tab. However, none of them seemed to fire at the Check Interval I had specified. While trying to resolve this issue, I discovered a couple of troubleshooting tips to note for future reference. Note: in my case, none of these revealed the cause of my issue. At least, I don’t think they did. Nonetheless, here they are.

Check the poller:

[nagios]$ /usr/bin/php /var/www/html/nagioslogserver/www/index.php poller
Updating Cluster Hosts File
Updating Elasticsearch with instance…
Updating Cluster Hosts File
Updating Elasticsearch with instance…
Updating Cluster Hosts File
Updating Elasticsearch with instance…
Updating Cluster Hosts File
Updating Elasticsearch with instance…
Finished Polling.

Check the jobs:

[nagios]$ /usr/bin/php /var/www/html/nagioslogserver/www/index.php jobs
Processed 0 node jobs.
Processed 0 global jobs.

Look for an error in the cron log:

# grep ERROR /var/log/cron

What fixed my issue was going into the Administration tab, selecting “Command Subsystem” on the left side, and then clicking “Reset All Jobs”.

Also, in the Administration tab, selecting “Audit Reports” gives you some verification that the alerts are running. Before resetting all the jobs, it was clear that they were not running. After resetting, I see regular scheduled entries showing the messages returned by the alerts.

Another thing I was able to put together was a report. In particular, I wanted a daily report of all IP addresses that attempted to log in to an externally facing server. I did this by creating an elasticsearch alert query and then tweaking the time range. I discovered that I could copy the query and execute it in a shell script using the curl command. I had found references to people doing this with a curl switch of -XGET. That never worked for me, but -XPOST did and has for quite a while. Once you have copied the query from the dashboard, just paste it into a file and change the -XGET to -XPOST. Make the file executable, then run it to get the text output. I wrapped some bash code around the query and formatted the output to create a report. Could be very useful.
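A rough sketch of such a report script, assuming a hypothetical setup: the host, daily index name, `src_ip` field, and the query body are all placeholders here — substitute the query you copied from your own dashboard.

```shell
#!/bin/bash
# Sketch of a daily login-attempt report. The index pattern, field name,
# and query below are assumptions -- paste in your own copied query.

ES_HOST="localhost:9200"
INDEX="logstash-$(date +%Y.%m.%d)"   # one index per day

# Turn raw _search JSON into "count  ip" lines, highest count first.
summarize_ips() {
  grep -o '"src_ip"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort | uniq -c | sort -rn
}

# -XPOST rather than -XGET: -XGET never returned results in my setup.
# (Newer Elasticsearch versions also require the Content-Type header.)
curl -s -XPOST "http://$ES_HOST/$INDEX/_search" \
  -H 'Content-Type: application/json' -d '{
  "size": 500,
  "query": { "query_string": { "query": "message:\"Failed password\"" } }
}' | summarize_ips
```

Run from cron, the sorted "count  ip" output drops straight into a daily email.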

This one really frustrated me, and I am still not sure I am doing it right, but it seems to be working. As I said earlier, I kept running out of space. My index space was too large, so I wanted to purge/delete the old indexes to conserve space. Nothing in the UI seemed to work; the repositories in the Administration tab under “Backup & Maintenance” seem finicky and sensitive at best. Again not easy to find, I discovered some information about the curator command for elasticsearch and used it with some parameters to effectively manage my index retention. I run these commands as user nagios in a nightly cron job:

To create a snapshot and save it to your backup repository:

curator snapshot --repository nameofbackuprepository indices --older-than numberofdaystokeep --time-unit days --timestring %Y.%m.%d

To close the indices:

curator close indices --older-than numberofdaystokeep --time-unit days --timestring %Y.%m.%d

To delete the indices:

curator delete indices --older-than numberofdaystokeep --time-unit days --timestring %Y.%m.%d
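Tied together, the three commands above can run as a nightly job from the nagios user's crontab. A sketch, where the run times, repository name, and the 30-day retention are all assumptions to adjust:

```
# Edit with "crontab -e" as the nagios user.
# Snapshot old indexes first, then close them, then delete them.
30 1 * * * curator snapshot --repository nameofbackuprepository indices --older-than 30 --time-unit days --timestring %Y.%m.%d
40 1 * * * curator close indices --older-than 30 --time-unit days --timestring %Y.%m.%d
50 1 * * * curator delete indices --older-than 30 --time-unit days --timestring %Y.%m.%d
```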

To list your repositories via command line:

curl -XGET "localhost:9200/_snapshot?pretty"

To force the backup to run and create a snapshot:

curator snapshot --repository "RepositoryName" indices --all-indices

There are switches to the curator command that you can use to get more verbose output and send that output to a log file:

--loglevel level

The available levels, as documented on the Elastic site (https://www.elastic.co/guide/en/elasticsearch/client/curator/current/configfile.html):

CRITICAL will only display critical messages.
ERROR will only display error and critical messages.
WARN will display error, warning, and critical messages.
INFO will display informational, error, warning, and critical messages.
DEBUG will display debug messages, in addition to all of the above.

Capture output:

--logfile /tmp/test_backup.txt
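In the older curator syntax used above, these logging switches are global options and go before the subcommand. For example (the repository name is a placeholder):

```
curator --loglevel DEBUG --logfile /tmp/test_backup.txt snapshot --repository "RepositoryName" indices --all-indices
```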