Labels

Java (8) MySQL (5) linux (4) INNODB_LOCKS (2) INNODB_TRX (2) exclusive (2) innoDB (2) mountrakis (2) processlist (2) transactions (2) x-locks (2) xlocks (2) ERROR 1045 (28000) (1) J2EE (1) Jboss (1) Jboss5.0 (1) Jconsole (1) Master (1) Slave (1) Synchronization (1) VPN (1) bash (1) binlogs (1) command line (1) cook-book (1) cookbook (1) date (1) for (1) handy commands (1) loop (1) michael mountrakis (1) replication (1) sh (1) shell (1) unix epoch (1) unix timestamp (1) while (1)
Showing posts with label Synchronization. Show all posts
Showing posts with label Synchronization. Show all posts

Sunday, 24 November 2013

MySQL Master Slave Synchronization

In MySQL when Master/slave synchronization brakes, we can start the debugging with show slave status command like the following example illustrates:

[root@my-db-slave ~]# echo "show slave status\G" | mysql -u root -ppass
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: my-db-master.mydomain.com
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000522
          Read_Master_Log_Pos: 50224067
               Relay_Log_File: relay.000005
                Relay_Log_Pos: 10328614
        Relay_Master_Log_File: binlog.000520
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table mydatabase.example_table; Can't find record in 'example_table', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000520, end_log_pos 10330060
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 10328471
              Relay_Log_Space: 259941828
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Delete_rows event on table mydatabase.example_table; Can't find record in 'example_table', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000520, end_log_pos 10330060
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 49


On Slave MYSQL_HOME/logs/mysql.err this can be shown as:

131108 18:22:55 [Note] Slave I/O thread: connected to master 'replication@my-db-master.mydomain.com:3306',replication started in log 'binlog.000522' at position 50224067
131109  2:26:19 [ERROR] Could not execute Delete_rows event on table mydatabase.example_table; Can't find record in 'example_table', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000520, end_log_pos 10330060, Error_code: 1032
131109  2:26:19 [Warning] Slave: Can't find record in 'alf_node' Error_code: 1032
131109  2:26:19 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000522' at position 50224067

There are several treatments for this issue that depend on what is the state of binlogs on master and how long the slave has been out of synchronization.
The following were tested on  MySQL 5.5.31 /  RedHat  2.6.32-358.2.1.el6.x86_64

The simple case : Restart Slave 


When synchronization fails between MASTER/SLAVE due to some error, normally the slave retries to reconnect after MASTER_CONNECT_RETRY seconds.
Slave will try to reconnect MASTER_RETRY_COUNT times.

In some cases, when synchronization lost within a small time frame and not many updates have been to the Master, then a simple "stop slave" / "start slave" may restore sync:

On Slave:
[root@my-db-slave ~]# echo "stop slave" | mysql -u root -ppass
[root@my-db-slave ~]# echo "start slave" | mysql -u root -ppass
[root@my-db-slave ~]# echo "show slave status\G" | mysql -u root -ppass



Resetting the Slave (Ignore Master Updates )

Another treatment is to reset the Slave to follow the master in the future: doing this you are lossing all updates from master and you just want slave to ignore all changes happened to the master while slave was out of sync!

On Master
[root@my-db-master ~]#echo "SHOW MASTER STATUS" | mysql -uroot -ppass 

On Slave:
[root@my-db-slave ~]#Take from master the binlog and pos
[root@my-db-slave ~]#echo "slave stop"| mysql -uroot -ppass
[root@my-db-slave ~]#echo "CHANGE MASTER TO MASTER_HOST='my-db-master',MASTER_USER='replication', MASTER_PASSWORD='apassword', MASTER_LOG_FILE='binlog.001006', MASTER_LOG_POS=  102018129;"| mysql -uroot -ppass
[root@my-db-slave ~]#echo "start slave"| mysql -uroot -ppass
[root@my-db-slave ~]#echo "SHOW SLAVE STATUS\G" | mysql -uroot -ppass  


Forcing Slave to get binlog files from Master

Sometimes, it might help to force slave get the binlogs again from the master. This will work if master has not rotate its binlogs: in other words if master have not started writting the binlogs again. That may happened because for example someone issued a RESET MASTER  on master. To force SLAVE retrieve the binlogs from start do the following steps:

On Slave:
Stop slave & Database
[root@my-db-slave ~]#echo "slave stop"| mysql -uroot -ppass
[root@my-db-slave ~]#/etc/init.d/mysql stop

Be sure that you have enough space when you do the next step:
[root@my-db-slave ~]#rm -rf $MYSQL-HOME/relaylogs/* 
[root@my-db-slave ~]#rm -rf $MYSQL-HOME/mysqld-relay-bin.*  $MYSQL-HOME/mysqld-relay-bin.index  $MYSQL-HOME/relay-log.info
[root@my-db-slave ~]#/etc/init.d/mysql start
[root@my-db-slave ~]#echo "stop slave" | mysql -u root -ppass
[root@my-db-slave ~]#echo "reset slave" | mysql -u root -ppass
[root@my-db-slave ~]#echo "start slave" | mysql -u root -ppass



Restore Slave from clear export from Master

Use this when you cannot restore the slave with any of the above ways.


On Master:
I need only mydatabase database, do not care about the others. Even if mydatabase db is in usage from application clients:

Take the dump of current Master database
[root@my-db-master ~]# sudo -i
[root@my-db-master ~]# cd /dbexport/
[root@my-db-master ~]# nohup mysqldump --master-data  -u root  -ppass mydatabase > mydatabase.dump &

With the previous command mysql dumps mydatabase database to the mydatabase.dump by:
Reserving the master binlog  status at the moment of backup transaction
Locking each backuped table in mydatabase, so it corresponds to the binlog state.

Note that: with the previous command I export only one database instance : mydatabase
In case your server hosts more that one databases and those are replicated on different slaves, you should consider taking an export for them also.

Check the last command output and also the dump status:
[root@my-db-master ~]# head mydatabase.dump
[root@my-db-master ~]# tail mydatabase.dump

Put a new password for trasport the dump
[root@my-db-master ~]# passwd

On Slave:
[root@my-db-slave ~]# sudo -i
[root@my-db-slave ~]# cd /monsoon/dbexport/
[root@my-db-slave ~]# scp root@my-db-master:/dbexport/mydatabase.dump .

[root@my-db-slave ~]# echo "show slave status\G" |  mysql -u root -ppass | egrep "Slave_IO_Running|Slave_SQL_Running"

             Slave_IO_Running: No
            Slave_SQL_Running: No

[root@my-db-slave ~]# nohup mysql  -u root  -ppass mydatabase < mydatabase.dump &


[root@my-db-slave ~]#  cat nohup.out

--there must be nothing here--

[root@my-db-slave ~]# echo "start slave" |  mysql -u root -ppass

[root@my-db-slave ~]# echo "show slave status\G" |  mysql -u root -ppass | egrep "Slave_IO_Running|Slave_SQL_Running"
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes