MySQL performance, using case insensitive columns

Fast_train_(4712207733)When it comes to matching in queries, if you don't need to do case insensitive comparison (and not search) you should definitely consider using binary columns. They're much faster. Let's just jump right into benchmarks, then I'll talk more about why this is good and scenarios where it can be done (using MySQL 5.6.17 x64 Linux).

In this example, I use utf8mb4, which is what you probably should use if you wish to have better Unicode support (learn more here). Keep in mind though that the character set can be essentially anything, and these results are always in favor of comparing against case sensitive (or binary) columns.

Case Sensitive
SELECT BENCHMARK(500000000, _utf8mb4'foo' < _utf8mb4'zot' COLLATE utf8mb4_bin);

Result

1 row in set (5.88 sec)
Case Insensitive
SELECT BENCHMARK(500000000, _utf8mb4'foo' < _utf8mb4'zot' COLLATE utf8mb4_unicode_ci);

Result

1 row in set (16.27 sec)

As you can see, though, case sensitive was nearly three (2.77) times faster than case insensitive.

My suggestion to you is that unless you need the column to be case insensitive for comparison or search, you should certainly consider using the case insensitive character set, for example utf8_unicode_ci would be utf8_bin and utf8mb4_unicode_ci would be utf8mb4_bin.

For CHAR and VARCHAR other character sets also have case insensitive versions, and if not you can alter the column to be VARCHAR BINARY:

ALTER TABLE `mytable1` CHANGE `mycolumn` `mycolumn` VARCHAR(254) BINARY NOT NULL;

However, I don't think that's ideal, instead use VARBINARY, and in fact here's a little chart for which to change to:

CHAR BINARY
VARCHAR VARBINARY
TEXT BLOB
MEDIUMTEXT MEDIUMBLOB
LONGTEXT LONGBLOB

If you're not using fulltext search on MyISAM you might just want to consider switching your TEXT columns to BLOB anyway to avoid character set issues with Unicode.

Leave a Reply