Skip to content
This repository has been archived by the owner on Jun 12, 2020. It is now read-only.

Multiple Clustering Indexes

prohaska edited this page Sep 24, 2014 · 46 revisions

Feature Summary

Covering indexes can result in orders of magnitude performance improvements for queries. Bradley's presentation on covering indexes describes what a covering key is, how it can effect performance, and why it works. A secondary index in MySQL typically includes the key columns and the primary key columns. A query that uses a secondary index of this type is a covering query if all of the columns needed by the query are in the secondary or primary key. Otherwise, it is an uncovered query. The performance of covered queries is much better than uncovered queries because covered queries never need to do hidden lookups on the primary index to get the uncovered columns.

The definition of covering indexes can get cumbersome since MySQL limits the number of columns in a key to 16 (32 on MariaDB). Tokutek introduced multiple clustering indexes into MySQL to address these problems. A clustering index is an index where all of the columns for a row are located close to the key for the row. So, when one retrieves the key, one also gets the other columns. Hidden lookups on the primary index are not necessary since all of the columns are clustered with the key.

Zardosht describes the multiple clustering indexes feature and how clustering indexes differ from covering indexes. Zardosht also describes the query versus update tradeoffs that exist with clustering indexes.

Tokutek's user interface for clustering indexes

Tokutek added the clustering attribute to the key definition and extended the MySQL grammar to use it. Tokutek also tweaked the query optimizer to understand that clustering indexes cover all of the columns defined in the table.

Here is how a TokuDB table is created with a secondary clustering index.

mysql> create table t (a int, b int, c int, key(a), clustering key(b)) engine=tokudb;

mysql> show create table t;
CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  KEY `a` (`a`),543
  CLUSTERING KEY `b` (`b`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1

Here is how a clustering index is added to an existing TokuDB table.

mysql> create clustering index 'c' on t(c);543543

mysql> show create table t;
CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  KEY `a` (`a`),
  CLUSTERING KEY `b` (`b`),
  CLUSTERING KEY `c` (`c`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1

Percona Server's user interface for clustering indexes

The clustering key user interface in Percona Server has the same grammar as Tokutek's so the user interface is the same. However, Percona decided to not allow secondary clustering indexes to be created on InnoDB, which does not support them yet. In contrast, Tokutek's implementation silently ignores the clustering keywords for InnoDB. In fact the examples in the previous section were captured on Percona Server 5.6.

Percona Server will not create an InnoDB table with a secondary clustering index.

mysql> create table t (a int, b int, c int, key(a), clustering key(b)) engine=innodb;
ERROR 1478 (HY000): Table storage engine 'InnoDB' does not support the create option 'CLUSTERING'

Percona Server will not add a secondary clustering index to an InnoDB table because InnoDB does not support it.

mysql> create table t (a int, b int, c int, key(a)) engine=innodb;
Query OK, 0 rows affected (0.13 sec)

mysql> create clustering index b on t(b);
ERROR 1478 (HY000): Table storage engine 'InnoDB' does not support the create option 'CLUSTERING'

MariaDB's user interface for clustering indexes

When the MariaDB engineers ported TokuDB to MariaDB, they decided not to extend the grammar to support the clustering attribute, but rather to have TokuDB define a clustering index attribute. Index attributes is a feature specific to MariaDB. As a consequence, the user interface for clustering indexes in MariaDB's port of TokuDB is different than Tokutek's.

Here is how a TokuDB table is created with a clustering index on 'b'.

MariaDB [test]> create table t (a int, b int, c int, key(a), key(b) clustering=yes) engine=tokudb;

Note that the clustering attribute for the index on 'b'.

MariaDB [test]> show create table t;
CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  KEY `a` (`a`),
  KEY `b` (`b`) `clustering`=yes
) ENGINE=TokuDB DEFAULT CHARSET=latin1

Here is how a clustering index on 'c' is added to an existing table.

MariaDB [test]> create index c on t(c) clustering=yes;

MariaDB [test]> show create table t;                                                                                                                                                                                                                              
CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  KEY `a` (`a`),
  KEY `b` (`b`) `clustering`=yes,
  KEY `c` (`c`) `clustering`=yes
) ENGINE=TokuDB DEFAULT CHARSET=latin1 `compression`='tokudb_zlib'

The parser accepts 'clustering=yes|no|1|0', but does not accept 'clustering=y|n|true|false'.

MariaDB [test]> create index c on t(c) clustering=y;
ERROR 1912 (HY000): Incorrect value 'y' for option 'clustering'

Portability issues

The meta-data to store Tokutek's implementation of multiple clustering indexes and Maria's implementation is different. This makes portability between the two systems problematic without a conversion tool. I like Maria's implementation of index and table options as it allows the storage engine to define new capabilities without hacks to the common MySQL or MariaDB code. It would be nice if MySQL would also support this feature.

Blogs about clustering indexes

Clone this wiki locally