PHP 7 replacement for xdebug tracing

I started working with PHP 7.0.0-dev and, at least at the time I wrote this, xdebug and Zend Debugger were not compatible with it because of the extension API changes. If you need something better than a simple NOTICE or ERROR, then this will probably work for you.

The code is fairly clean, but could use some improvements, and if you have any suggestions or insights, please comment.

Some Notes

  • Some really nice debugging classes and so forth exist (like kint), but they're huge and do way more than I need them to, and also most won't easily mimic xdebug, if at all.
  • It does not work as well or provide as much detail as xdebug's stack tracing, but it's pretty close and may work for most people.
  • It doesn't provide remote debugging or anything like that, obviously
  • The code, ironically, does not take advantage of any improvements made within PHP 7, so it can be used with older versions, but I've only tested it with 5.5 and 5.6

Instructions

I put the code at the top of my entry point (index.php for me) before any other code (autoloader, etc). Of course if you don't have an entry point, you can probably put it in some other global file if you want, such as a config file.

I also put it in the condition of "if (PHP_MAJOR_VERSION == 7) {…}", just in case I test my app with other versions as well.

PHP scalar type hinting takes a massive blow

One of the biggest criticisms of PHP (aside from syntax) is the lack of any sort of scalar typing. Weak or strong doesn't really matter; it simply doesn't exist. A push in the right direction was the call for "scalar type hinting," which was laid out in this PHP RFC:

https://wiki.php.net/rfc/scalar_type_hints

This topic, believe it or not, is a sensitive one, with some people being so against it that… well, I can't really think of an analogy, I don't know why the hell you'd be against it. Though some were against it just because they didn't like how this specific RFC defined how the PHP interpreter would know whether or not to do the actual type hinting.

Yes, they wanted to sink the idea because of a slightly related syntactical issue, instead of dealing with it later and implementing a very important thing.

The issue apparently caused so much grief that the major promoter, @AndreaFaulds, has left PHP*:

http://news.php.net/php.internals/82750

This really sucks, and I find it to be truly disappointing. I think if we want to have PHP be taken more seriously by the broader programming world, we need to implement things that more "serious" languages have. I'm even more disappointed because I honestly thought that if this RFC did not pass, it may be years before anything close to type hinting on scalars is implemented in PHP, because it would create an untouchable issue like other things.

So, in the unlikely event that other PHP developers are reading this, please keep pushing for scalar type hinting or something at least approaching that, and if you're a developer in PHP, keep asking for it. I know I will.

If no RFC is submitted for scalar typing in PHP 7, I'm probably going to switch languages, maybe Go or something, I don't know. I've been using PHP since 2002, and I've been waiting too damn long.

*Furthermore, I think Andrea Faulds leaving PHP is sad because she promoted really good ideas and defined them very well in her RFCs. I think this is a setback for the language. There are still a lot of great people on the PHP team, but I have to be honest and say I was really wanting to see all of her recent RFCs pass; they were all things I was also heavily interested in.

How Evan Doorbell Became a Phone Phreak

A semi-well known early phreaker known as Evan Doorbell, who is so well known in the normal world he doesn't even have a Wikipedia article, made a series of recordings (podcasts as the kids say) about how he became a phreaker.

These recordings don't just contain his personal history, but are full of great records of old busy signals, rings, phone company error recordings, and tons of insight into how a lot of the old stuff worked. They're extremely well edited as well, making them even more entertaining.

If you're interested in this kind of thing, I suggest you check these out (mp3s taken from Phone Trips to save their bandwidth):

Episode 1 [ mp3 (00:25:24) @ 128kbps ]

Evan recounts how he first got curious about phones due to what he felt was an error message given in a slightly sexy manner. He reviews different error messages and error codes, and goes into how he speculated about what the different error codes meant. I think the most interesting aspect is how he talks about dialing special codes such as 660 (a "party line" out of Long Island, New York), figuring out how tones were different between the phones and phone company equipment, and some other insights.

Episode 2 [ mp3 (00:27:00) @ 128kbps ]

This begins in the summer of 1970, when he's at summer camp and begins to broaden his experimentation with the phone network. It covers: how dialing 1 before an area code did not work where he lived, yet the phone company said to do it this way; how the phone company changed their dial tone in 1965 (in his city); test circuits; and figuring out how all of these things had to do with the type of switching equipment. He also explores phone prefixes, phone intercepts, and more.

Episode 3 [ mp3 (00:20:00) @ 128kbps ]

In this one Evan goes down to Atlanta, Georgia on a family trip and discovers more differences between Atlanta and Long Island.

Episode 4 [ mp3 (00:30:25) @ 128kbps ]

Interesting codes (prefixes, half numbers, etc) out of Long Island, NY in the late summer of 1970.

Episode 5 [ mp3 (00:32:42) @ 128kbps ]

Evan gets his own phone line as a teenager on a new prefix and tons of new things open up.

Episode 6 [ mp3 (00:34:50) @ 128kbps ]

An overview of the old billing "message" units, as used in the New York area in the 1970s. Even more recordings and information into party lines, phone switches, etc.

Phone Phreaking and Party Lines (Incomplete)

When talking about phreaking and party lines, sometimes there's slight confusion when it comes to younger people or non-phreaks, because there's the official term Party Line, which you can read about on Wikipedia, and the more colloquial term used among phone phreaks and also just regular people screwing around on the phone in the 1970s.

Party Lines are basically like conference calls and phone companies had official ways of doing them, mostly through special dial ins and things like that. Everyone would dial a special phone number and all be connected together (one manner), and you can read more about that on Wikipedia.

At other times when people referred to party lines (especially in the phreaking community) what they were referring to were things such as:

  • Dialing test circuits, which would play a constant 500Hz tone, then flashing the phone (a fast hang up on old telephones), which would stop the tone but keep you on the line. These old test circuits no longer exist; they have been replaced with ring-backs.
  • On many switches years ago, error messages such as a phone being disconnected would be played from one place, so if people made the same "mistake" in dialing, they would be able to speak to each other.

There were other ways too. If you know of another way this was done, please detail it in a comment and I'll add it.

Geonames: The only terrible choice we have

Table of Contents

  1. Loading your data
  2. Associating your data
  3. The mysterious admin1Codes.txt

When it comes to getting an accurate, up to date database of cities, towns, and what have you, really the best we can hope for is Geonames. They manage to keep things updated fairly regularly, however there also seems to be a slight, just a slight, massive tidal wave of garbage all mixed in.

As they say on the BBC: I'm on a journey… to figure out how to get the cities, associated provinces, and postal codes from these data dumps, and associate them together. You'd think it'd be easy, you'd think the CSVs have all of the proper association information there, easy to manage, and of course that's what a sane person would think.

Well, I've got a surprise for you.

I'm not going to go super in-depth here all tutorial-style, but rather just outline some things I had to deal with and the solutions I came up with, in hopes they will help you, because they sure as hell were not easy to find.

Loading your data

Loading it is fairly straightforward for a lot of people, but for some people it isn't quite clear. A few things need to be considered:

  1. Regular utf8 columns in MySQL will not allow you to import alternative names and some other stuff
  2. In some cases even utf8mb4 columns aren't "good enough" to import some alternative names because the alternative names themselves aren't "good enough" and are malformed Unicode.
  3. Most alternative names are either identical, misspellings, or bizarre stuff like airport codes, misspellings in other languages, and what seems to be some sort of sub-typing (like one for Moscow was "wsa     MOW"; what the hell that means, I have no idea). You'd think the city in the native language/script would come first, but it usually doesn't, if it's there at all. Though I'm sure you'll enjoy all of the names in Gothic Unicode script, which are absolutely necessary since we live sometime in the Early Middle Ages.
  4. My suggestion is to just ignore the alternative names completely, but if you insist on importing them, use varbinary or blob.
  5. There are tons and tons of duplicates, with our Moscow example alone there are currently 7 versions of the exact same city, in the same country, and yes the feature type is the same.
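
If you want to know ahead of time which names will give you trouble, a rough sketch like the following can sort them into buckets before you import anything. It relies on two facts: MySQL's plain utf8 stores at most 3 bytes per character (so nothing above U+FFFF, which includes the Gothic script), and malformed byte sequences fail a strict UTF-8 decode. The function name classify_name and the sample values are mine, purely for illustration.

```python
def classify_name(raw: bytes) -> str:
    """Say which kind of MySQL column could hold this raw name field."""
    try:
        text = raw.decode("utf-8")  # strict decode: malformed bytes raise
    except UnicodeDecodeError:
        return "malformed"          # only safe in varbinary/blob
    if any(ord(ch) > 0xFFFF for ch in text):
        return "utf8mb4"            # contains 4-byte characters
    return "utf8"                   # fits in MySQL's 3-byte utf8


# Illustrative samples only, not taken from a real dump.
samples = [
    "Moscow".encode("utf-8"),
    "Москва".encode("utf-8"),
    "\U0001033C\U00010349".encode("utf-8"),  # two Gothic letters
    b"Mosc\xff\xfeow",                       # deliberately broken bytes
]
for raw in samples:
    print(classify_name(raw), raw)
```

Anything that comes back as "malformed" is the stuff that even utf8mb4 will refuse, which is why varbinary or blob is the fallback.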

The easiest way to do this is create a table in MySQL identical to the schema they lay out for whatever dump you have, but maybe varbinary instead of varchar (as mentioned above). In some cases you may want to add a little on the end too, some columns go over the documented width.

From your MySQL client:

This general idea works with all of the dumps so long as the table is exactly the same.

Whoops, hold on, for some reason there are needless backslashes escaping the tabs from time to time, so you have to go in and replace those otherwise MySQL will freak out. Are these backslashes a part of the names of some cities or places, perhaps a part of a strange alphabet?

LOL, of course they're not, they're just randomly there.
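
One way to get rid of them before importing, as a minimal sketch: rewrite the dump and drop any backslash that sits directly in front of a tab. It works on raw bytes on purpose, so the malformed Unicode mentioned earlier passes through untouched. The function names and file paths are placeholders I made up, and it assumes a backslash right before a tab is never legitimate data, which matches what I saw but isn't guaranteed.

```python
def strip_stray_backslashes(line: bytes) -> bytes:
    """Remove a backslash that directly precedes a tab separator."""
    return line.replace(b"\\\t", b"\t")


def clean_dump(src_path: str, dst_path: str) -> None:
    """Copy a dump file line by line, fixing the escaped tabs on the way."""
    # Binary mode: no decoding, so broken byte sequences survive as-is.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(strip_stray_backslashes(line))


# Example call (hypothetical file names):
# clean_dump("allCountries.txt", "allCountries.clean.txt")
```

Then load the cleaned file instead of the original.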

Associating your data

With the locations, you'd think with a name like geonameid that would mean other dumps like postal would match up with it in some way, or they'd easily reference administrative divisions by ID… come on, you know they wouldn't do something that logical.

Geonames has the most horrendously, implausibly baffling, most terribly undocumented method of associating data I've seen in years; it's a wonder anyone uses it. It's made worse by the fact that those who have figured it out (which to me seems on par with deciphering Egyptian Hieroglyphs) essentially keep it a secret, as if posting about it on a forum will cause the Stasi to burst in their door and take their families away.

How the data are associated is pretty damn goofy and poorly thought out:

  • Countries are referenced by their ISO-3166-1 code; at least this is somewhat consistent. GB is used rather than UK, so don't confuse these with the almost identical country top-level domains. XK is also in use for Kosovo, which as of writing this does not have an official ISO code yet (South Sudan, the other new nation, is SS; SX belongs to Sint Maarten).
  • Cities and states/provinces: You'd think they'd follow the logical geonameid integer logic, but at this point I guess I don't need to say it's far more moronic than that. Cities are associated with provinces by the country and admin1 columns.  So what are these admin1 values? In some cases like US states they make sense, they're the ISO standard abbreviations, in some cases they're area codes, and in some cases they're various other things. (See bottom of this post about where to get the admin1 values from).
  • Postal codes are matched primarily by country, admin1, admin2 (when they're properly filled, sometimes not), and the "place_name". However, if you're expecting place_name to match the spelling of the city in the other dump (such as O'Brien in the locations dump also being O'Brien in the postal codes dump), you are sorely mistaken, yet again. Instead, spelling changes are haphazard, so it can be "O Brien" or "OBrien" but never the expected "O'Brien." I mean jeez, it's almost like they're running a psychological experiment on us.
  • As for other stuff, I don't have any information, my only concern was getting associated cities, provinces/states, countries, and postal codes, but it doesn't take a massive leap of faith to guess they're all messed up too.
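
The matching described above can be sketched roughly like this: build a key from country, admin1, and a squashed version of the place name, then look postal rows up by that key. Squashing here means case-folding and throwing away everything that isn't a letter or digit (which also drops accents after NFKD decomposition); that is my own choice, not anything Geonames documents, and it can produce false matches. The name match_key and all of the sample rows are made up for illustration.

```python
import unicodedata


def match_key(country: str, admin1: str, place_name: str) -> tuple:
    """Build a lookup key that ignores spacing, punctuation, case, accents."""
    folded = unicodedata.normalize("NFKD", place_name).casefold()
    # Combining accents and punctuation are not alphanumeric, so they drop out.
    squashed = "".join(ch for ch in folded if ch.isalnum())
    return (country.upper(), admin1.upper(), squashed)


# Placeholder rows: (country, admin1, name, geonameid). The id is fake.
cities = [("US", "TX", "O'Brien", 1)]
index = {match_key(c, a, n): gid for c, a, n, gid in cities}

# Placeholder postal rows: (country, admin1, place_name, postal_code).
postal_rows = [
    ("US", "TX", "O Brien", "00000"),
    ("US", "TX", "OBrien", "00000"),
]
for country, admin1, place, code in postal_rows:
    print(code, place, "->", index.get(match_key(country, admin1, place)))
```

When admin2 is actually filled in on both sides, adding it to the key should cut down on collisions between same-named places in one state.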

The mysterious admin1Codes.txt

So you want to associate your cities with the appropriate province/state? "lol, kiss my ass" say the administrators of Geonames, they make it basically impossible.

How, you say?

So where do they keep the states/provinces? Nowhere, they deleted them. Apparently they felt it was "confusing for a lot of users" because of various issues, so instead of fixing those, they simply removed admin1Codes.txt from their web site, but continue to reference it in the official readme.txt for how to import the data.

Brilliant! That's exactly how not to confuse people, reference a non-existent file which contains important information on which cities belong to which provinces/states in countries, just delete it and replace it with nothing, because having anything at all would be confusing.

I don't recommend using the one linked by "marc," it's missing a lot and it's badly encoded.

I went ahead and recreated a new admin1Codes.txt. I based it on combining several versions of admin1Codes.txt I found, being certain to actually use the same encoding through the whole file and making sure all of the cities listed have a code.

admin1Codes.txt with names in "plain-text" Unicode
admin1Codes.txt with names in hex (for easy transport, but they're still Unicode bytes)

If you import this, be certain to use utf8mb4_unicode_ci columns, as I explain at great length in my post: Better Unicode support for MySQL (including emoji).
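
If you'd rather read the file from a script first, here is a rough sketch of parsing a line. It assumes each line is a "COUNTRY.ADMIN1" code, a tab, then the name, and that the hex version stores the name as the hex digits of its UTF-8 bytes; check that against the file you actually downloaded, since I'm describing the layout from memory here. The function name parse_admin1_line is just for illustration.

```python
def parse_admin1_line(line: str, hex_names: bool = False):
    """Split one admin1Codes line into ((country, admin1), name)."""
    # Assumed layout: "CC.ADMIN1<TAB>name[<TAB>anything else]"
    code, name = line.rstrip("\r\n").split("\t")[:2]
    country, _, admin1 = code.partition(".")
    if hex_names:
        # Hex variant: the name is hex-encoded UTF-8 bytes.
        name = bytes.fromhex(name).decode("utf-8")
    return (country, admin1), name


# Illustrative lines, not copied from the real file.
print(parse_admin1_line("US.TX\tTexas\n"))
hex_line = "CA.10\t" + "Québec".encode("utf-8").hex() + "\n"
print(parse_admin1_line(hex_line, hex_names=True))
```

The (country, admin1) pair is the same pair the city rows carry, so a dict built from these lines is enough to put a province name on each city.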

Disclaimer: I can't vouch for the accuracy of the names or whether or not obsolete ones still exist, but hell, it's better than providing you with nothing.

In case you are interested in contributing to this list, an example of the missing codes can be found here: missing.txt, if you manage to match them up, let me know. A few I started to manually do, but since I spent so much time on this already, I stopped. I'm also considering starting an API service like Geonames that uses proper association. I realize the job of getting accurate data isn't easy, but if they've got people manually entering things to properly set it (and they do) then why in the hell don't they maintain a logical, static association?