Attachment 'check_replication.pl'

Download

#!/usr/bin/perl -w
use strict;
use Getopt::Long;
use DBI;

# check_replication.pl - from http://www.nagiosexchange.org/
#
# Changelog:
#  20070122 - Oliver Hookins, Anchor Systems Pty Ltd
#             Add code to check extra paranoia replication data between the
#             slave and master as per section 7.5.4.6 of High Performance
#             MySQL (0596003064). Removed the unwise --check-random-database
#             option which is useless.
#  20070112 - Oliver Hookins, Anchor Systems Pty Ltd
#             Change position delay behind master to be absolute number of
#             positions based on maximum size of binlog. Adjust crit and warn
#             parameters accordingly.
#           - Add DBI connect timeouts so we don't hang the Nagios process.
#
#  20050304 - Skip the row count on InnoDB tables, becuase the SHOW TABLE STATUS 
#             is just an ESTIMATE that varies wildly. Broke the check for 
#             this out to its own subroutine to clean up compare_status().
#  20050303 - Correct master-port handling (I dont use this now as I get the 
#             master figured out from the slave by doing 'show slave status').
#           - Added the master and slave ports to the printed output (I run 
#             many slaves on a slave host, one for each master I am slaving 
#             from; sometimes I have separate masters on the same host so that 
#             each one can be stopped, started independantly with their own 
#             keycache).
#             Added an option --check-random-database", which looks at the 
#             slave and sees which databases are being replicated, randomly 
#             picks one of these and does a 'show table status' ot get back the 
#             row_count, and if it differes by more than 
#             'table-rows-diff-absolute-crit' it adds this to the status line. 
#             If your replication is good, then the number of row between the 
#             master and slave should be about the same, and the update time on 
#             each table within a replicated database should be about the same.
#             The 'about' is because the time difference in checking the slave and 
#             then the master, and the delay in replication in the other direction.
#           - If there are table differences in the random check, then show a warning.
#
#  20050217 - Fix a type in the comments
#           - Convert the Seconds_Behind_Master to hours and seconds if it is large
#
#  20040120 - Make it find the master automatically, so you only specify a slave
#           - Update the output to show second behind for MySQL 4.1 slaves
#           
#  20041102 - Support MySQl 4.1 Exec_master_log_pos -> Exec_Master_Log_Pos case change

our $VERSION=0.03;

# $Id$

my $options = { 'slave-port' => 3306, 'slave' => 'slavehost', 'crit' => 200000, 'warn' => 100000 , 'slave-user' => 'repl', 'slave-pass' => 'password', 'debug' => 0, 'time-diff-absolute-crit' => 130, 'time-diff-absolute-warn' => 65};
GetOptions($options, "slave=s", "slave-user=s", "slave-pass=s", "master=s", "master-port=i", "master-user=s", "master-pass=s", "crit=i", "warn=i", "help", "slave-port=i", "debug=i", "version", "time-diff-absolute-crit=i", "time-diff-absolute-warn=i");
my $max_binlog;

if (defined $options->{'help'}) {
        print <<FOO;
$0: check replication between MySQL database instances

 check_replication.pl [ --slave <host> ] [ --slave-pass <pass> ] 
 [ --slave-port <d> ] [ --slave-user <user> ] [ --master <host> ] 
 [ --master-pass <pass> ] [ --master-port <port> ] [ --master-user <user> ] 
 [ --crit <positions> ] [ --warn <positions> ]
 [ --time-diff-absolute-crit <number> ]
 [ --time-diff-absolute-warn <number> ]

  --slave <host>        - MySQL instance running as a slave server
  --slave-port <d>      - port for the slave
  --slave-user <user>   - Username with File/Process/Super privs
  --slave-pass <pass>   - Password for above user
  --master <host>       - MySQL instance running as server (override)
  --master-port <d>     - port for the master (override)
  --master-user <user>  - Username for master (override)
  --master-pass <pass>  - Password for master
  --crit <positions>    - Number of positions delay behind master for critical state
  --warn <positions>    - Number of positions delay behind master for warning state
  --time-diff-absolute-crit <number> - Ensure that the change in row count
                        between master and slave in the 'replication' database
                        is below this threshold (seconds), and go critical if not
  --time-diff-absolute-warn <number> - Ensure that the change in row count
                        between master and slave in the 'replication' database
                        is below this threshold (seconds), and go warning if not
  --help             - This help page


By default, you should use your configured replication user, as you will 
then only need to specify the user and password once, and this script will 
find the master from the slave's running configuration.

Critical and warning values are measured as number of positions difference
behind the master. Unfortunately this means assuming that the binlogs on all
servers make it to the maximum size to handle situations where the master
rotates its logs and ends up on the next numbered binlog file for a short
period.

MySQL 3: GRANT File, Process on *.* TO repl\@192.168.0.% IDENTIFIED BY <pass>
MySQL 4: GRANT Super, Replication_client on *.* TO repl\@192.168.0.% IDE...

Note: Any mysqldump tables (for backups) may lock large tables for a long 
time. If you dump from your slave for this, then your master will gallop 
away from your slave, and the difference will become large. The trick is to 
set crit above this differnce and warn below.

(c) 2005 Fotango. James Bromberger <jbromberger\@fotango.com>.
FOO
exit;
} elsif (defined $options->{version}) {
        printf "%s %s\n", $0, $VERSION;
        exit;
}


sub debug {
        my $level = shift;
        my $message = shift;
        return if $level > $options->{debug};
        my $caller = (caller(1))[3];
        print $caller . ":" . $message . "\n";
}

sub get_status {
        my $host = shift;
        my $port = shift;

        debug(1, "Connecting to slave $host:$port as user " . $options->{'slave-user'});
        my $dbh = DBI->connect("DBI:mysql:host=$host:port=$port;mysql_connect_timeout=7", $options->{'slave-user'}, $options->{'slave-pass'});
        if (not $dbh) {
                print "UNKNOWN: cannot connect to $host";
                exit 3;
        }
        my $sql = "show variables";
        my $sth = $dbh->prepare($sql);
        if (not $sth) {
                print "UNKNOWN: cannot prepare $sql";
                exit 3;
        }
        debug(2, "Getting slave variables");
        my $res = $sth->execute;
        my $slave_data;
        while (my $ref = $sth->fetchrow_hashref) {
                $slave_data->{$ref->{'Variable_name'}} = $ref->{'Value'};
        }
        $sth->finish;
        $sql = "show slave status";
        $sth = $dbh->prepare($sql);
        if (not $sth) {
                print "UNKNOWN: cannot prepare $sql";
                exit 3;
        }
        debug(2, "Getting slave replication status");
        $res = $sth->execute();
        $slave_data->{replication} = $sth->fetchrow_hashref;
        $sth->finish;

        debug(2, "Checking status of replication database");

        $sql = "USE replication";
        $sth = $dbh->prepare($sql) || die "Cannot prepare: $!";
        $res = $sth->execute();
        $sth->finish;
        $sql = 'SELECT unix_time, db_time FROM heartbeat ORDER BY unix_time DESC LIMIT 1';
        $sth = $dbh->prepare($sql);
        $res = $sth->execute();

        while (my $ref = $sth->fetchrow_hashref) {
                $slave_data->{replication}->{data} = $ref;
        }

        $dbh->disconnect;

        # Now connect to the master...
        $host = $options->{'master'} || $slave_data->{replication}->{Master_Host};
        $port = $options->{'master-port'} || $slave_data->{replication}->{Master_Port};
        my $user = $options->{'master-user'} || $slave_data->{replication}->{Master_User};
        my $pass = $options->{'master-pass'} || $options->{'slave-pass'};
        debug(1, "Connecting to master $host:$port as user $user");
        $dbh = DBI->connect("DBI:mysql:host=$host:port=$port;mysql_connect_timeout=7", $user, $pass);

        if (not $dbh) {
                print STDERR "ERROR: $DBI::errstr, $DBI::err\n";
                print "UNKNOWN: Cannot connect to master $host:$port\n";
                die 3;
        }
        $sql = "show variables";
        $sth = $dbh->prepare($sql);
        debug(1, "Getting master variables");
        $res = $sth->execute;
        my $master_data;
        while (my $ref = $sth->fetchrow_hashref) {
                $master_data->{$ref->{'Variable_name'}} = $ref->{'Value'};
        }
        $sth->finish;
        $sql = "show master status";
        $sth = $dbh->prepare($sql);
        debug(2, "Getting master replication status");
        $res = $sth->execute;
        $master_data->{replication} = $sth->fetchrow_hashref;
        $sth->finish;

        $dbh->disconnect;

        #use Data::Dumper;
        #print Dumper($slave_data->{replication});
        #print Dumper($master_data->{replication});
        compare_status($master_data, $slave_data);
}

sub compare_status {
        my ($master, $slave) = @_;


        # Step one; are both the SQL and I/O slave threads running (critical if not)
        if (lc($slave->{replication}->{'Slave_SQL_Running'}) ne lc('yes')) {
                print "CRITICAL: Slave SQL thread not running\n";
                exit 2;
        }
        if (lc($slave->{replication}->{'Slave_IO_Running'}) ne lc('yes')) {
                print "CRITICAL: Slave IO thread not running\n";
                exit 2;
        }

        # Step two; compare the positions between the master and slave

        # Pattern match the BINLOG number...
        $master->{replication}->{'File_No'} = $1 if ($master->{replication}->{'File'} =~ /(\d+)$/);
        $slave->{replication}->{'File_No'} = $1 if ($slave->{replication}->{'Relay_Master_Log_File'} =~ /(\d+)$/);

        # Get the slave position it is executing, being careful of the 
        # key name change in MySQL 4.1 (case change)
        $slave->{replication}->{'Position'} = $slave->{replication}->{'Exec_Master_Log_Pos'} || $slave->{replication}->{'Exec_Master_log_pos'};
        #use Data::Dumper;
        #debug(4, Dumper($slave->{replication}));

        debug(3, " Master: " . $master->{replication}->{'File'} . ":" . $master->{replication}->{'Position'});
        debug(3, " Slave:  " . $slave->{replication}->{'Master_Log_File'} . ":" . $slave->{replication}->{'Position'});

        # Make an assumption that all of the log files involved reached their maximum size.
        # This will be incorrect when either the slave or master has rotated the log early (mysqld restart, mysqldump etc).
        my $diff = $master->{max_binlog_size} * ($master->{replication}->{'File_No'} - $slave->{replication}->{'File_No'}) + $master->{replication}->{'Position'} - $slave->{replication}->{'Position'};

        debug(1, "diff: $diff ");

        # Check 'time' difference between SQL threads
        my $time_diff = "";
        if (defined $slave->{'replication'}->{Seconds_Behind_Master}) {
                if ($slave->{'replication'}->{Seconds_Behind_Master}> 3600) {
                        $time_diff = int($slave->{'replication'}->{Seconds_Behind_Master} / 3600) . "h " . ($slave->{'replication'}->{Seconds_Behind_Master} % 3600) . " secs";
                } else {
                        $time_diff = $slave->{'replication'}->{Seconds_Behind_Master} . " secs between SQL threads";
                }
        }

        # Check timestamp from replication database against current time
        my $delay = time() -  $slave->{'replication'}->{'data'}->{'unix_time'};

        my $state = sprintf "%d positions behind master", $diff;
        $state.= ", $time_diff" if defined($slave->{'replication'}->{Seconds_Behind_Master});
        $state.= ", $delay secs delay from master";
        $state.= ", " .  ($options->{'master'} || $slave->{replication}->{Master_Host}) . ":" . ($options->{'master-port'} || $slave->{replication}->{Master_Port}) .  " (" . $master->{version} . ") -> " . $options->{slave} . ":" . $options->{'slave-port'} . " (" . $slave->{version} . ")";
        $state.= "\n";

        if ($diff >= $options->{'crit'} || $delay >= $options->{'time-diff-absolute-crit'} ) {
                print "CRITICAL: $state";
                exit 2;
        } elsif ($diff >= $options->{'warn'} || $delay >= $options->{'time-diff-absolute-warn'} ) {
                print "WARN: $state";
                exit 1;
        }
        print "OK: $state";
        exit 0;
}

get_status($options->{'slave'}, $options->{'slave-port'});

Attached Files

To refer to attachments on a page, use attachment:filename, as shown below in the list of files. Do NOT use the URL of the [get] link, since this is subject to change and can break easily.
  • [get | view] (2013-05-31 11:14:57, 8.0 KB) [[attachment:IPaddr2_vlan.sh]]
  • [get | view] (2013-05-31 11:14:58, 361.3 KB) [[attachment:LCA2008-talk.odp]]
  • [get | view] (2013-05-31 11:14:58, 832.2 KB) [[attachment:LCA2008-talk.pdf]]
  • [get | view] (2013-05-31 11:14:57, 3.7 KB) [[attachment:alter_mysql_slave]]
  • [get | view] (2013-05-31 11:14:57, 3.7 KB) [[attachment:alter_mysql_slave.sh]]
  • [get | view] (2013-05-31 11:14:57, 1.5 KB) [[attachment:authkeys]]
  • [get | view] (2013-05-31 11:14:57, 1.5 KB) [[attachment:authkeys.txt]]
  • [get | view] (2013-05-31 11:14:58, 1.1 KB) [[attachment:check-drbd.sh]]
  • [get | view] (2013-05-31 11:14:57, 11.4 KB) [[attachment:check_replication.pl]]
  • [get | view] (2013-05-31 11:14:57, 6.1 KB) [[attachment:cib.xml.template]]
  • [get | view] (2013-05-31 11:14:58, 8.0 KB) [[attachment:db-export-mysqlstandby.pl]]
  • [get | view] (2013-05-31 11:14:57, 3.8 KB) [[attachment:drbd.conf.txt]]
  • [get | view] (2013-05-31 11:14:58, 1.3 KB) [[attachment:drbddisk.sh]]
  • [get | view] (2013-05-31 11:14:57, 1.5 KB) [[attachment:ha.cf.txt]]
  • [get | view] (2013-05-31 11:14:58, 9.0 KB) [[attachment:init_mysql_slave.pl]]
  • [get | view] (2013-05-31 11:14:58, 1.2 KB) [[attachment:logd.cf.txt]]
  • [get | view] (2013-05-31 11:14:58, 2.2 KB) [[attachment:lvs-helper.sh]]
  • [get | view] (2013-05-31 11:14:57, 1.8 KB) [[attachment:my.cnf.txt]]
  • [get | view] (2013-05-31 11:14:57, 2.2 KB) [[attachment:myslave.cnf.txt]]
  • [get | view] (2013-05-31 11:14:57, 5.5 KB) [[attachment:mysqld.sh]]
  • [get | view] (2013-05-31 11:14:58, 2.0 KB) [[attachment:mysqlslave.cnf.txt]]
  • [get | view] (2013-05-31 11:14:57, 5.7 KB) [[attachment:mysqlslaved.sh]]
  • [get | view] (2013-05-31 11:14:57, 3.5 KB) [[attachment:write_heartbeat.pl]]

You are not allowed to attach a file to this page.