Sunday, January 4, 2015

Crash-safe Replication


If you're familiar with mysql-replication, you know that the replication information is stored in two files: master.info and relay-log.info

master.info
This file contain information about the connection to the master—such as hostname, user, and password—but also information about how much of the binary log that has been transferred to the slave.
relay-log.info
This file contain information about the current state of replication, that is, how much of the relay log that has been applied.

The update of these files are arranged so that they are updated after the transaction had been applied. This means that if you have a crash between the transaction commit and the update of the files, the replication progress information would be wrong.

Crash-safe masters

Two problems related to crash-safe replication has been fixed in the master, both of which could cause some annoyance when the master recovered.
  • If the master crashed when a binary log was rotated, it was possible that some orphan binlog files ended up in the binary log index file. This was fixed in 5.1 but is also a piece in the puzzle of having crash-safe replication.
  • Writing to the binary log is not an atomic operation, and if a crash occurred while writing to the binary log, there were a possibility of a partial event at the end of the binary log.Now, the master recovers from this by truncating the binary log to the last known good position, removing the partially written transaction and rolling back the outstanding transactions in the storage engines.

Crash-safe slaves

Several different solutions for implementing crash-safety—or transactional replication, as it is sometimes known as—have been proposed. The MySQL replication team decided to implement crash-safety by moving the replication progress information into system tables. This is a more flexible solution and has several advantages compared to storing the positions in the InnoDB transaction log:
  • If the replication information and data is stored in the same storage engine, it will allow both the data and the replication position to be updated as a single transaction, which means that it is crash-safe.
  • If the replication information and data is stored in different storage engines, but both support XA(eXtended Architecture[http://www.percona.com/live/mysql-conference-2013/sites/default/files/slides/XA_final.pdf]), they can still be committed as a single transaction.
  • The replication information is flushed to disk together with the transaction data. Hence writing the replication information directly to the InnoDB redo log does not offer a speed advantage, but does not prevent the user from reading the replication progress information easily.
  • The tables can be read from a normal session using SQL commands, which also means that it can be incorporated into such things as stored procedures and stored functions.

In order to make the solution flexible, we(MySQL) introduced a general API for adding replication information repositories. This means that we can support multiple types of repositories for replication information. In order to select what type of repository to use, two new options were added. These options are also available as server variables.
master_info_repository
The type of repository to use for the master info data.
relay_log_info_repository
The type of repository to use for the relay log info.

**Both of the variables can be set to either FILE or TABLE. If the variable is set to TABLE the new table-based system will be used and if it is set to FILE, the old file-based system will be used. The default is FILE, so make sure to set the value if you want to use the table-based system.


Selecting replication repository engine

As you know MyISAM is not very transactional, so it is necessary to set this to use InnoDB instead if you really want crash-safety. To change the engine for these tables you can just use a normal ALTER TABLE.
slave> ALTER TABLE mysql.slave_master_info ENGINE = InnoDB;
slave> ALTER TABLE mysql.slave_relay_log_info ENGINE = InnoDB;

Event processing

This implementation of crash-safe slaves work naturally with both statement-based and row-based replication and there is nothing special that needs to be done in the normal cases. However, these tables interleave with the normal processing in a little different ways.
To understand how transactions are processed by the SQL thread, let us consider the following example transaction:
START TRANSACTION;
INSERT INTO articles(user, title, body)
      VALUE (4711, 'Taming the Higgs Boson using Clicker Training', '....');
UPDATE users SET articles = articles + 1 WHERE user_id = 4711;
COMMIT;
This transaction will be written to the binary log and then sent over to the slave and written to the relay log in the usual way. Once it is read from the relay log for execution, it will be executed as if an update statement where added to the end of the transaction, before the commit:
START TRANSACTION;
INSERT INTO articles(user, title, body)
      VALUE (4711, 'Taming the Higgs Boson using Clicker Training', '....');
UPDATE users SET articles = articles + 1 WHERE user_id = 4711;
UPDATE mysql.slave_relay_log_info
   SET Master_log_pos = @@Exec_Master_Log_Pos,
       Master_log_name = @@Relay_Master_Log_File,
       Relay_log_name = @@Relay_Log_File,
       Relay_log_pos = @@Relay_Log_Pos
COMMIT;
In this example, there is a number of pseudo-server variables (that is, they don't exist for real) that have the same name as the corresponding field in the result set from SHOW SLAVE STATUS. As you can see, the update of the position information is now inside the transcation and will be committed with the transaction, so if both articles and mysql.slave_relay_log_info are in the same transactional engine, they will be committed as a unit.

Since a commit to the table is expensive—in the same way as syncing a file to disk is expensive when using files as replication information repository—the updates of the slave_master_info table is not updated with each processed event. Depending on the value of sync_master_info there are a few alternatives.

If sync_master_info = 0
In this case, the slave_master_info table is just updated when the slave starts or stops (for any reason, including errors), if the relay log is rotated, or if you execute a CHANGE MASTER command.
If sync_master_info > 0
Then the slave_master_info table will be updated every sync_master_info event.
MySQL UUID (Universal Unique Identifier)-

Structure of UUID

MySQL uses UUID version 1 which is a 128-bit number represented by a utf8 string of five hexadecimal numbers
  • The first three numbers are generated from a timestamp.
  • The fourth number preserves temporal uniqueness in case the timestamp value loses monotonicity (for example, due to daylight saving time).
  • The fifth number is an IEEE 802 node number that provides spatial uniqueness. A random number is substituted if the latter is not available (for example, because the host computer has no Ethernet card, or we do not know how to find the hardware address of an interface on your operating system). In this case, spatial uniqueness cannot be guaranteed. Nevertheless, a collision should have very low probability.
The timestamp is mapped as follows:
When the timestamp has the (60 bit) hexadecimal value: 1d8eebc58e0a7d7. The following parts of the UUID are set:: 58e0a7d7-eebc-11d8-9669-0800200c9a66. The 1 before the most significant digits (in 11d8) of the timestamp indicates the UUID version, for time-based UUIDs this is 1.
Fourth and Fifth parts would be mostly constant if it is generated from a single server. First three numbers are based on timestamp, so they will be monotonically increasing. Lets rearrange the total sequence making the UUID closer to sequential. This makes the inserts and recent data look up faster. Dashes (‘-‘) make no sense, so lets remove them.
58e0a7d7-eebc-11d8-9669-0800200c9a66 => 11d8eebc58e0a7d796690800200c9a66