Replication with TokuDB
TokuDB uses the replication protocols provided by MySQL and MariaDB, so MySQL replication just works with TokuDB. However, TokuDB has made changes to the replication implementation to address common replication performance problems.
Suppose that the throughput of write-intensive applications on the master does not scale with the number of clients. MySQL uses a two-phase commit (2PC) algorithm to synchronize the state of its binary log with the state of the tables that were written, and fsyncs are used to ensure that the data is persistent. Prior to MySQL 5.6, the fsync of the binary log did NOT use a group commit algorithm to amortize the cost of the fsync over many transactions. Since fsyncs are slow, the throughput of a master running MySQL 5.5 is limited by the fsync throughput on the binary log.
MySQL 5.6 (and MariaDB 5.5 and 10) use a group commit algorithm for the binary log. The storage engines involved in 2PC transactions need to support this algorithm, and the changes to TokuDB to support the binary log group commit algorithm are described in TokuDB's binary log group commit wiki.
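As a rough illustration, binary log group commit can be tuned on MariaDB 10 with the `binlog_commit_wait_*` variables; the specific values below are illustrative assumptions, not recommendations from this page:

```ini
# my.cnf sketch (MariaDB 10): tune binary log group commit.
# The binlog_commit_wait_* options let the server wait briefly so that
# more concurrent transactions can share a single fsync of the binary log.
[mysqld]
log_bin                  = mysql-bin
sync_binlog              = 1      # fsync the binlog at every (group) commit
binlog_commit_wait_count = 20     # group up to 20 transactions per commit...
binlog_commit_wait_usec  = 10000  # ...but wait no longer than 10 ms
```

On MySQL 5.6 the binary log group commit is automatic and needs no extra configuration.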
This problem is solved with TokuDB 7.5.4.
Suppose that the slave lags behind the master because the slave hits an I/O bottleneck on its storage system. The slave replication code does point queries for write, delete, and update replication events even when the entire row image is in the binary log replication event. This is fine for update-in-place data structures like a B-tree. However, fractal tree messaging on TokuDB tables can handle write, delete, and update replication events without these hidden point queries. We call this feature Read Free Replication.
This problem is solved with TokuDB 7.5.0.
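A minimal configuration sketch for a slave using Read Free Replication, using variable names from the TokuDB documentation (treat the exact requirements as version-dependent):

```ini
# my.cnf sketch for a TokuDB slave using Read Free Replication.
[mysqld]
read_only                = 1   # RFR requires a read-only slave
tokudb_rpl_lookup_rows   = 0   # skip the hidden point query per row event
tokudb_rpl_unique_checks = 0   # skip uniqueness checks on replicated inserts
# The master must use row-based binary logging (binlog_format = ROW)
# so the full row image is available in each replication event.
```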
Suppose that the slave still lags behind the master even when Read Free Replication is enabled. A lot of fsyncs remain to support durability. Here are a couple of options to address them.
- Run the slave with durability OFF. Turn off all of the fsyncs. If the slave crashes, then restore the slave's data from a backup and resume replication from the binlog position at the time of the backup. This may make sense if the probability of a crash is low and the slave's load can be picked up by other nodes while the slave is being restored.
- Run the slave with durability ON. There are extra fsyncs to maintain the replication metadata (master.info and relay-log.info). The metadata should be implemented as a table in a transactional storage engine. The scalability of this technique needs to be investigated.
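The two options above can be sketched as server settings. The `tokudb_*` names are from the TokuDB documentation and the repository options are a MySQL 5.6 feature; the values are illustrative assumptions:

```ini
# Option 1 sketch: run the slave with durability OFF.
[mysqld]
sync_binlog            = 0     # no fsync of the binlog per commit
tokudb_commit_sync     = 0     # no fsync of the TokuDB recovery log per commit
tokudb_fsync_log_period = 1000 # instead, fsync the recovery log ~once a second

# Option 2 sketch (MySQL 5.6+): keep durability ON, but store the
# replication metadata in tables instead of the fsync'd
# master.info / relay-log.info files.
# master_info_repository    = TABLE
# relay_log_info_repository = TABLE
```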
MySQL and MariaDB continue to improve the performance of the replication software. In MySQL 5.5, slave lags are sometimes caused by the slave's single threaded implementation. While the master can execute a lot of concurrent statements, all of these statements are funneled onto a single thread for execution on the slave. The slave can run out of CPU cycles. To address this problem, MySQL and MariaDB have made changes that allow multiple threads on the slave to process the incoming replication events.
The MySQL 5.6 multi-thread slave replication feature (MTS) executes events in parallel on the slave for different databases. Unfortunately, the database schema may have to change to accommodate this algorithm.
MariaDB's parallel replication is more intelligent. It schedules binlog events that were group committed together on the master to execute in parallel on the slave. The group commit on the master implies that the transactions were running in parallel on the master, so they can safely run in parallel on the slave as well.
See MySQL 5.7 parallel replication for a description of where MySQL is taking replication.
Finally, TokuDB's read free replication works with parallel replication, so one can build really fast MySQL or MariaDB slaves.
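As a sketch, a multi-threaded slave is enabled with one server variable; the names differ between servers, and the thread count below is an illustrative assumption:

```ini
# my.cnf sketch: enable a multi-threaded slave.
[mysqld]
# MariaDB 10: apply transactions that group committed together in parallel.
slave_parallel_threads = 8
# MySQL 5.6 equivalent (schedules per-database, see MTS above):
# slave_parallel_workers = 8
```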
Semisynchronous replication delays the transaction commit until at least one slave has executed its binlog events. Since this is outside of the scope of any storage engine, it should just work with TokuDB.
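For completeness, a sketch of enabling the semisynchronous replication plugin; the plugin library names below are the stock MySQL 5.5/5.6 ones, so check your server's documentation for the exact names on your build:

```sql
-- On the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- On each slave:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```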
The following changes are needed to support TokuDB with Galera.
- TokuDB must log its writes, updates, and deletes with Galera. This is relatively simple, as there is an example in MariaDB 10.
- TokuDB must support priority transactions so that the Galera applier gets precedence over local transactions. This feature needs to be designed.