Crash-safe Replication
If you're familiar with mysql-replication, you know that the replication information is stored in two files:
master.info
and relay-log.info
. master.info
relay-log.info
The update of these files are arranged so that they are updated after the transaction had been applied. This means that if you have a crash between the transaction commit and the update of the files, the replication progress information would be wrong.
Crash-safe masters
Two problems related to crash-safe replication has been fixed in the master, both of which could cause some annoyance when the master recovered.
- If the master crashed when a binary log was rotated, it was possible that some orphan binlog files ended up in the binary log index file. This was fixed in 5.1 but is also a piece in the puzzle of having crash-safe replication.
- Writing to the binary log is not an atomic operation, and if a crash occurred while writing to the binary log, there were a possibility of a partial event at the end of the binary log.Now, the master recovers from this by truncating the binary log to the last known good position, removing the partially written transaction and rolling back the outstanding transactions in the storage engines.
Crash-safe slaves
Several different solutions for implementing crash-safety—or transactional replication, as it is sometimes known as—have been proposed. The MySQL replication team decided to implement crash-safety by moving the replication progress information into system tables. This is a more flexible solution and has several advantages compared to storing the positions in the InnoDB transaction log:- If the replication information and data is stored in the same storage engine, it will allow both the data and the replication position to be updated as a single transaction, which means that it is crash-safe.
- If the replication information and data is stored in different storage engines, but both support XA(eXtended Architecture[http://www.percona.com/live/mysql-conference-2013/sites/default/files/slides/XA_final.pdf]), they can still be committed as a single transaction.
- The replication information is flushed to disk together with the transaction data. Hence writing the replication information directly to the InnoDB redo log does not offer a speed advantage, but does not prevent the user from reading the replication progress information easily.
- The tables can be read from a normal session using SQL commands, which also means that it can be incorporated into such things as stored procedures and stored functions.
In order to make the solution flexible, we(MySQL) introduced a general API for adding replication information repositories. This means that we can support multiple types of repositories for replication information. In order to select what type of repository to use, two new options were added. These options are also available as server variables.
- master_info_repository
- The type of repository to use for the master info data.
- relay_log_info_repository
- The type of repository to use for the relay log info.
- If sync_master_info = 0
- In this case, the
slave_master_info
table is just updated when the slave starts or stops (for any reason, including errors), if the relay log is rotated, or if you execute aCHANGE MASTER
command. - If sync_master_info > 0
- Then the
slave_master_info
table will be updated every sync_master_info event.
FILE
or TABLE
. If the variable is set to TABLE
the new table-based system will be used and if it is set to FILE
, the old file-based system will be used. The default is FILE
, so make sure to set the value if you want to use the table-based system.
Selecting replication repository engine
As you know MyISAM is not very transactional, so it is necessary to set this to use InnoDB instead if you really want crash-safety. To change the engine for these tables you can just use a normal ALTER TABLE.
slave> ALTER TABLE mysql.slave_master_info ENGINE = InnoDB; slave> ALTER TABLE mysql.slave_relay_log_info ENGINE = InnoDB;
Event processing
This implementation of crash-safe slaves work naturally with both statement-based and row-based replication and there is nothing special that needs to be done in the normal cases. However, these tables interleave with the normal processing in a little different ways.To understand how transactions are processed by the SQL thread, let us consider the following example transaction:START TRANSACTION; INSERT INTO articles(user, title, body) VALUE (4711, 'Taming the Higgs Boson using Clicker Training', '....'); UPDATE users SET articles = articles + 1 WHERE user_id = 4711; COMMIT;This transaction will be written to the binary log and then sent over to the slave and written to the relay log in the usual way. Once it is read from the relay log for execution, it will be executed as if an update statement where added to the end of the transaction, before the commit:START TRANSACTION; INSERT INTO articles(user, title, body) VALUE (4711, 'Taming the Higgs Boson using Clicker Training', '....'); UPDATE users SET articles = articles + 1 WHERE user_id = 4711; UPDATE mysql.slave_relay_log_info SET Master_log_pos = @@Exec_Master_Log_Pos, Master_log_name = @@Relay_Master_Log_File, Relay_log_name = @@Relay_Log_File, Relay_log_pos = @@Relay_Log_Pos COMMIT;
In this example, there is a number of pseudo-server variables (that is, they don't exist for real) that have the same name as the corresponding field in the result set fromSHOW SLAVE STATUS
. As you can see, the update of the position information is now inside the transcation and will be committed with the transaction, so if botharticles
andmysql.slave_relay_log_info
are in the same transactional engine, they will be committed as a unit.
Since a commit to the table is expensive—in the same way as syncing a file to disk is expensive when using files as replication information repository—the updates of the slave_master_info
table is not updated with each processed event. Depending on the value of sync_master_info there are a few alternatives.