MySQL performance, using case insensitive columns

Fast_train_(4712207733)When it comes to matching in queries, if you don't need to do case insensitive comparison (and not search) you should definitely consider using binary columns. They're much faster. Let's just jump right into benchmarks, then I'll talk more about why this is good and scenarios where it can be done (using MySQL 5.6.17 x64 Linux).

In this example, I use utf8mb4, which is what you probably should use if you wish to have better Unicode support (learn more here). Keep in mind though that the character set can be essentially anything, and these results are always in favor of comparing against case sensitive (or binary) columns.

Case Sensitive
SELECT BENCHMARK(500000000, _utf8mb4'foo' < _utf8mb4'zot' COLLATE utf8mb4_bin);

Result

1 row in set (5.88 sec)
Case Insensitive
SELECT BENCHMARK(500000000, _utf8mb4'foo' < _utf8mb4'zot' COLLATE utf8mb4_unicode_ci);

Result

1 row in set (16.27 sec)

As you can see, though, case sensitive was nearly three (2.77) times faster than case insensitive.

My suggestion to you is that unless you need the column to be case insensitive for comparison or search, you should certainly consider using the case insensitive character set, for example utf8_unicode_ci would be utf8_bin and utf8mb4_unicode_ci would be utf8mb4_bin.

For CHAR and VARCHAR other character sets also have case insensitive versions, and if not you can alter the column to be VARCHAR BINARY:

ALTER TABLE `mytable1` CHANGE `mycolumn` `mycolumn` VARCHAR(254) BINARY NOT NULL;

However, I don't think that's ideal, instead use VARBINARY, and in fact here's a little chart for which to change to:

CHAR BINARY
VARCHAR VARBINARY
TEXT BLOB
MEDIUMTEXT MEDIUMBLOB
LONGTEXT LONGBLOB

If you're not using fulltext search on MyISAM you might just want to consider switching your TEXT columns to BLOB anyway to avoid character set issues with Unicode.

Properly escaping MySQL queries in PHP

I'm on various boards and such and from time to time people run into issues where they're trying to insert something into MySQL via a raw query and they inevitably run into that pesky apostrophe and the query dies.

Then almost always someone comes along to tell them that they need to use addslashes().

This is wrong.

Ideally you really want to use prepared statements (mysqli and PDO extensions), but let's assume for now you're throwing caution to the wind and you're going to do it the old fashioned way.

If you're using the mysql extention, you should use mysql_real_escape_string() around all of your variables which are not cast as integers. But actually, you shouldn't be using this function because mysql_* is deprecated, way deprecated. Instead you should be using…

mysqli which is faster, better, sexier, everything you want in a wom… extension. In this case we have the more logical name mysqli_escape_string() or you can use the back-to-goofiness-again method in the mysqli class $mysqli::real_escape_string() and it works the same way.

One issue is that with both of the above functions you have to actually be connected to the server to use them, that's because it escapes based on your connection chartype and some other stuff.

However assuming you're not too worried about potential unicode issues (I've yet to have any, supporting Serbian and Hungarian) you can always make your own function to escape based on what MySQL requires:

function escape($string) {
    return str_replace(array('\\', "\0", "\n", "\r", "'", '"', "\x1a"),
                       array('\\\\', '\\0', '\\n', '\\r', "\\'", '\\"', '\\Z'), $string);
}

But there's always a potential danger in doing things yourself and I actually don't have proof the above is faster than the connection required escape functions, so just use prepared statements ideally.

What are HIPAA's encryption requirements?

There's a lot of assumptions about what HIPAA states when it comes to encryption, be it over the wire, files, whatever. The fact is that HIPAA makes absolutely no requirements for encryption*, just that if there's reasonable risk, it must have encryption. What kind of encryption? What sort of strength? It does not specify*.

So to break it down:

  • Does HIPAA require encryption? No, unless there's a reasonable risk something could be read, as in over a network or what have you
  • What sort of encryption does HIPAA require? Essentially anything.

My suggestions though are:

  • You should use encryption in as many places as possible, especially if devices are storing information, almost all HIPAA data violations come from people losing laptops or whatever and the drives aren't encrypted. You can use something like TrueCrypt or even Windows EFS.
  • I suggest PGP since it's so widely implemented and available, and SSL for networks, etc since again, implementation is widely available. Where not available you can tunnel over things such as encrypted VPN connections as well.

* Source: HIPAA 45 CFR § 164.312(a)(2)(iv) and (e)(2)(ii).

By the way: IANAL/TINLA

So, what the hell is type casting anyway?

Casting is a way to take a liquid and mold it int… oh yeah

So casting is just a fancy way to refer to type conversion, that is where you change the "type" of a variable from one thing to another. For example changing a string to an integer.

How about some examples? Is that what you want?

OK, fine, you talked me into it. Here are some PHP examples:

$var = '1000';
var_dump($var); // Returns string(4) "1000"

$var = '1000';
$var = (int)$var; // Here is where we cast it
var_dump($var); // Returns int(1000)

So, who cares? What's the point?

Well, depending on what you're wanting to do, it's important to change the type, and this is especially true in languages where there is no dynamic typing (like C#) and it's still useful in languages with dynamic typing like PHP, because it allows for one to avoid potential issues with mathematics, concatenation, etc. Aside from math related things, in PHP I use (int) a lot to clean up variables for SQL queries for both safety and also so MySQL doesn't have to convert the types itself.

You can learn more about type casting in PHP specifically and why it's a great way to do certain things here: Casting int faster than intval in PHP.

A terribly uninformative guide to cross connect boxes, RTUs, SLCs

For some reason beyond me, in all of the years I've messed around with phones, be it working with them or doing things as a hobby, I can never remember the name for this:

MVC-904Fb

So I wanted to create this post so in the future when I go brain dead I don't have to go search around for it for 10 minutes. It breaks down like this:

  • Cross Box or Cross-Connect Box – Most common name, but in Ma Bell terms it usually refers to ones slightly smaller which contain jumpers from customer to the central office. These aren't to be confused with VRAD which are similar, but smaller, and often sometimes next to cross boxes or remote terminals.
  • Remote Terminal – Also sometimes called this, and so-called because they were like a tiny remote central office. These, unlike regular cross boxes, have slots for cards of what kind of circuit was being installed. For example a SDN line would have a card that takes two slots. Another sign of a Remote Terminal Unit versus a regular Cross Box is that RTUs have batteries in them. Some RTUs are bigger and are buried underground with sump-pumps and air conditioners and fancy stuff.

    Everyone loves slots
    Slots inside a Remote Terminal, a more modern one, smaller, at an industrial site.
  • SLC or SLC96 – Sometimes referred to as this, but these contained pair gain multiplex equipment in them. SLC itself stands for "subscriber loop carrier" or "subscriber line carrier". The 96 though refers to the fact it broke down into 96 lines.

And there we go, things I can never remember.

Thanks a lot to the phone woman for some extra insights into this post.