OpenLDAP monitoring

[Nagios] Supervision of OpenLDAP’s replication status

Forget the SUN java console...

Introduction

The script slurpd_status.pl analyses the files used by slurpd, OpenLDAP’s replication daemon. A few explanations of this process are needed to understand how the Perl monitoring script works:

The replication mechanism propagates modifications from one directory, the "master", to another directory, the "slave". The master directory is the reference and receives all write operations; these modifications then reach one or more slave directories.

- First step: a write operation is performed on the master directory (1A). The slapd process saves the modification in its database (1B) and records it in its replog file (1C).

- Second step: at regular intervals, the slurpd process scans slapd’s replog file (2A) to detect pending modifications. When it finds entries, it moves them to its own replog file (2B), which removes them from slapd’s replog file (2A).

- Third step: the slurpd process reads its own replog file and tries to propagate the modifications to the target slave directory. If the remote directory cannot be reached, or if the user/password pair of the replication agreement is incorrect, the modification stays in slurpd’s replog file and no further action is taken. If the connection to the slave works (3A), the entry is sent. If the remote directory accepts the modification (3B), the status file is updated with the time and sequence number of the entry (3C). Otherwise, the entry and the error message returned by the slave are written to the reject file, which is specific to the target slave directory (3E), and the status file is still updated with the time and sequence number of the entry (3D).

- Regularly: the slurpd process reads the time and sequence number written in the status file and attempts to send to the slave directory every entry of its replog file whose time and sequence number are greater than those found in the status file. Most of the time these are entries that could not be transmitted because the remote directory refused the connection. Because the status file is also updated on a rejection, a rejected entry (present in the reject file) is never propagated again by slurpd. In addition, older entries (already sent and accepted) are deleted from slurpd’s replog file.

[Figure: OpenLDAP's replication operation]

In summary, four files must be analysed to determine the replication status:

- slapd’s replog file: it contains the modifications saved on the master directory but not yet picked up by slurpd.

- slurpd’s replog file: it contains the modifications that were in slapd’s replog file and have been picked up by slurpd.

- status file: each line of this file contains the address and port of a slave directory, followed by the time and sequence number of the last accepted or rejected entry. These fields are separated by colons (:).

- reject file: this file is specific to a slave (so there are as many reject files as slaves). It contains the rejected replication entries.
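As an illustration of the status file layout described above, here is a minimal Python sketch that splits one line into its four fields. It is not taken from slurpd_status.pl (which is written in Perl); the names StatusEntry and parse_status_line are hypothetical, and the field order host, port, time, sequence number ("range") with integer values is an assumption based on the description above.

```python
from typing import NamedTuple


class StatusEntry(NamedTuple):
    host: str
    port: int
    time: int  # time of the last accepted or rejected entry
    seq: int   # sequence number ("range") of that entry


def parse_status_line(line: str) -> StatusEntry:
    # Split from the right so that only the last three ":" act as separators.
    host, port, time, seq = line.strip().rsplit(":", 3)
    return StatusEntry(host, int(port), int(time), int(seq))


# Hypothetical line, not copied from a real status file.
print(parse_status_line("slave.linagora.com:389:1111111111:0"))
```

Splitting from the right keeps the sketch tolerant of host names that themselves contain a colon.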

Modifications (also called replication entries) can be in four different states:

- in transition: entries are in slapd’s replog file, so in transition between slapd and slurpd.

- waiting: entries are in slurpd’s replog file and have a time and sequence number higher than those written in the status file. It means either that slurpd has not yet had time to send these entries to the slave, or that the slave is unreachable.

- rejected: entries are in the reject file. The error is written at the beginning of each entry.

- propagated: entries were accepted by the slave. They stay temporarily in slurpd’s replog file until slurpd erases them.

The monitoring script slurpd_status.pl counts the entries that are in one of the first three states. OK, Warning or Critical alerts are sent according to the configured thresholds.
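The counting rule can be sketched as follows. This is only an illustration in Python, not the logic of the Perl script itself: the function count_states is hypothetical, and it assumes that each entry has already been reduced to a (time, sequence number) pair; parsing the replog and reject files is not shown.

```python
def count_states(slapd_entries, slurpd_entries, rejected_entries, last_sent):
    """Count entries in the three monitored states.

    Each entry is a hypothetical (time, seq) pair; last_sent is the pair
    read from the status file for the slave being checked.
    """
    in_transition = len(slapd_entries)
    # Tuples compare element by element: time first, then seq.
    waiting = sum(1 for entry in slurpd_entries if entry > last_sent)
    rejected = len(rejected_entries)
    return {"in_transition": in_transition, "rejected": rejected, "waiting": waiting}


counts = count_states(
    slapd_entries=[(1005, 0)],
    slurpd_entries=[(1000, 0), (1000, 1), (1002, 0)],
    rejected_entries=[],
    last_sent=(1000, 0),
)
print(counts)  # {'in_transition': 1, 'rejected': 0, 'waiting': 2}
```

In this example the entry (1000, 0) is not counted as waiting because it is not greater than the pair found in the status file: it is in the propagated state.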

Requirements

The monitoring script uses the following Perl modules, which must be installed on the server that runs it:
- The module Getopt::Std
- The function max() of the module List::Util

The user running the monitoring script must have read access to the files described in the previous section. This can be checked by running the script in debug mode.

Warning: read access on the files themselves may not be enough. The user must also be allowed to traverse every directory that leads to them (the execute bit must be set on these directories).

Usage

Here is the script usage:


./slurpd_status.pl -w warning_level -c critical_level [-h hostname] [-p hostport] [-v]

Mandatory parameters are:
- warning_level and critical_level: the values above which a Warning or Critical alert is sent. There are two ways to set them:

  • a list of three comma-separated integers (e.g. 100,3,60). The first applies to entries in transition, the second to rejected entries and the last to waiting entries.
  • a single integer, used as the limit for all three entry states, i.e. if the number of entries in transition, rejected entries or waiting entries exceeds this value, an alert is sent.
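The two threshold formats can be illustrated with a short Python sketch. The functions parse_levels and evaluate are hypothetical names, and the strict "greater than" comparison is an assumption drawn from the wording "values above which"; the actual Perl script may differ in detail.

```python
def parse_levels(spec: str) -> tuple:
    """Return (in_transition, rejected, waiting) thresholds."""
    values = [int(v) for v in spec.split(",")]
    if len(values) == 1:
        # A single integer applies to all three states.
        return (values[0],) * 3
    if len(values) == 3:
        return tuple(values)
    raise ValueError("expected one integer or three comma-separated integers")


def evaluate(counts, warning: str, critical: str) -> str:
    """counts is (in_transition, rejected, waiting)."""
    warn, crit = parse_levels(warning), parse_levels(critical)
    if any(c > limit for c, limit in zip(counts, crit)):
        return "CRITICAL"
    if any(c > limit for c, limit in zip(counts, warn)):
        return "WARNING"
    return "OK"


print(evaluate((12, 0, 15), "30,2,10", "100,5,30"))  # WARNING
```

Here 15 waiting entries exceed the warning limit of 10 but not the critical limit of 30, so the result is WARNING.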

Optional parameters are:
- hostname: IP address or name of the slave, as written on the replica line of the OpenLDAP configuration file (slapd.conf by default). If not defined, "localhost" is used.
- hostport: port of the slave, as written on the replica line of the OpenLDAP configuration file (slapd.conf by default). If not defined, "0" is used: when slurpd sees that no port is defined for the slave, it writes the value "0" in its own files.
- -v: enables debug mode. It prints many messages, so it must not be set when the script is called by Nagios.
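For readers who want to reproduce this command-line interface in another language, here is a minimal Python sketch using argparse. It only mirrors the options and defaults listed above; the original script uses Perl's Getopt::Std, and the attribute names (warning, critical, hostname, hostport, verbose) are illustrative choices.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # add_help=False frees -h, which the script uses for the hostname.
    parser = argparse.ArgumentParser(prog="slurpd_status", add_help=False)
    parser.add_argument("-w", dest="warning", required=True)
    parser.add_argument("-c", dest="critical", required=True)
    parser.add_argument("-h", dest="hostname", default="localhost")
    # The port stays a string: "0" is what slurpd writes when no port is set.
    parser.add_argument("-p", dest="hostport", default="0")
    parser.add_argument("-v", dest="verbose", action="store_true")
    return parser


args = build_parser().parse_args(["-w", "30,2,10", "-c", "100,5,30"])
print(args.hostname, args.hostport, args.verbose)  # localhost 0 False
```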

Example:


./slurpd_status.pl -w 30,2,10 -c 100,5,30 -h slave.linagora.com -p 389

Note: for this example, the OpenLDAP configuration file would contain:


replica host=slave.linagora.com:389
       binddn="uid=replicateur,o=linagora,c=com"
       bindmethod=simple
       credentials=secret

Internal configuration

The script cannot run properly unless its internal parameters are adjusted. Edit the script to change them:
- $slurpd_tempdir: directory in which slurpd saves its files. In a default OpenLDAP installation, it is /var/openldap-slurp/replica/.
- $slapd_replog_file: the slapd replog file, as written on the replogfile line of the OpenLDAP configuration file. In a default OpenLDAP installation, it is /var/replog.
- %code: hash table of Nagios exit codes. Don’t touch it unless Nagios changes its mind.
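For reference, the conventional Nagios plugin exit codes are 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The Python sketch below shows an equivalent table and how a plugin typically uses it; the exact contents of the script's %code hash are not reproduced here, and the report function and its message format are hypothetical.

```python
CODE = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def report(state: str, message: str) -> int:
    """Print a one-line plugin result and return the matching exit code."""
    print(f"{state} - {message}")
    return CODE[state]


exit_code = report("WARNING", "15 waiting entries")
# A real plugin would end with sys.exit(exit_code).
```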

Integration in Nagios

A tutorial describes the steps needed to configure a script in Nagios.

One service per slave server must be defined in Nagios.

In order to run this script on remote hosts, use the command check_by_ssh or NRPE.

2005 (c) Clément OUDOT

Perl script - replace the .txt extension with .pl after download


Last updated: 28/03/2008