For those of you who follow our blog, Brett does a lot of articles about PowerShell and how great it is. So the other day when I need to do a mysql dump of some data on a Windows 2008 server, I fired it up. The deep blue background touched my inner California beach bum, and the verbose bright red error messages made my inner programmer smile.
“This is pretty neat,” I thought. “Maybe Windows has finally made something to compete with Unix’s shell in a real and meaningful way.”
The dump finished and immediately my Common Sense began tingling…
The resulting SQL file was 22 Gb, about twice what I expected it to be. At that size, there are very few text editors that will handle it, since it needs to read it in chunks whereas most like to at least go over the file once to calculate lines, sizes, syntax, etc. I have to admit, I don’t usually do this sort of thing on Windows and with a production server down at the time, and our Windows admins gone for the day, I didn’t have time to play around. I rared the file (dropping the size to around 1.3 Gb) and moved it to a more comfortable linux environment. I extracted the monstrous 22 Gb back out and ran it on the mysql server… only to get errors.
Errors on every single character.
Starting with the – of the initial “– this is a mysql dump file” line.
I stopped execution, since it wasnt executing anyway and opened it with a hex editor
-bash-3.2# hexedit dump.sql
00000000 2D 00 2D 00 20 00 20 00 4D 00 79 00 53 00 51 00 4C 00 20 00 64 00 75 00 6D 00 70 00 -.-. . .M.y.S.Q.L. .d.u.m.p.
0000001C 20 00 31 00 30 00 2E 00 31 00 33 00 20 00 20 00 44 00 69 00 73 00 74 00 72 00 69 00 .1.0...1.3. . .D.i.s.t.r.i.
00000038 62 00 20 00 35 00 2E 00 35 00 2E 00 35 00 2D 00 6D 00 33 00 2C 00 20 00 66 00 6F 00 b. .5...5...5.-.m.3.,. .f.o.
00000054 72 00 20 00 57 00 69 00 6E 00 36 00 34 00 20 00 28 00 78 00 38 00 36 00 29 00 0D 00 r. .W.i.n.6.4. .(.x.8.6.)...
00000070 0A 00 2D 00 2D 00 0D 00 0A 00 2D 00 2D 00 20 00 48 00 6F 00 73 00 74 00 3A 00 20 00 ..-.-.....-.-. .H.o.s.t.:. .
0000008C 6C 00 6F 00 63 00 61 00 6C 00 68 00 6F 00 73 00 74 00 20 00 20 00 20 00 20 00 44 00 l.o.c.a.l.h.o.s.t. . . . .D.
000000A8 61 00 74 00 61 00 62 00 61 00 73 00 65 00 3A 00 20 00 0D 00 0A 00 2D 00 2D 00 20 00 a.t.a.b.a.s.e.:. .....-.-. .
Well now I know why it was giving errors on every character: null bytes immediately following each and everyone. That could only mean one thing.
16 bit Unicode.
Alright, fair enough. Let’s look at the MySQL documentation and find out what to specify as far as charset to get it to read 16 bit Unicode. Turns out there are really only two specifications for Unicode in MySQL: UTF8, the default. and UCS2, which didn’t work. So I assume it is a Windows MySQL thing. I try running it on the Windows server it came from.
Same errors on every character.
Ok, so a SQL dump created by a MySQL server can’t be run again on that same server? That makes no sense. MySQL wouldn’t do something that blatantly stupid.
I hope.
Further investigations lead me to this site: Files created by Powershell are over 2X bigger than files created by text editor or cmd.exe. Seems I am not the only person seeing this. And Powershell is the culprit.
To be sure, I did a smaller dump in command prompt (cmd) and it worked fine.
I ran this on the linux server (since there shouldn’t be null bites in a mysql dump anyway and it was faster than doing a new dump):
perl -pi.bak -e 's/x00//g' < dump2.sql > dump3.sql
After the null bytes were stripped out, it ran fine.
PowerShell is good, and PowerShell is great, but don’t mix and match PowerShell generated files with linux unless you are sure it will handle 16 bit Unicode.
Leave a Reply
You must be logged in to post a comment.