Fix PHP cURL: parser error: Document labelled UTF-16 but has UTF-8 content
This is an article with notes for me on how to convert some received XML encoded in UTF-16 to some JSON in UTF-8. If it were entirely in UTF-8, I would simply load the received XML with SimpleXML and use the built-in PHP json_encode function. I ran into the following errors:
Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 1: parser error : Document labelled UTF-16 but has UTF-8 content in /public_html/.../.../my_script.php on line ###
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : Document labelled UTF-16 but has UTF-8 content in /public_html/.../.../my_script.php on line ###
Why?
So I've googled, binged and yahoo'd for this, and although there are some solutions that deal with loading UTF-16 content into SimpleXMLElement or simplexml_load_string, they don't solve my problem. I'm receiving XML data within a cURL result, but I get the above error when using either "SimpleXMLElement" or "simplexml_load_string". Returning the XML with cURL isn't a problem, but I want to convert it to JSON, and I usually use a PHP function to load the data into an XML array and then use the built-in PHP function "json_encode".
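One way to get past this error, sketched below, is to correct the encoding label in the XML declaration before parsing. This is a minimal sketch, not a definitive fix: it assumes the bytes returned by cURL really are UTF-8 and only the declaration says UTF-16, and the function name `xml_string_to_json` is made up for illustration.

```php
<?php
// Hypothetical helper: fix a wrong "UTF-16" label, then convert the XML to JSON.
// Assumption: the content is actually UTF-8; only the declaration is mislabelled.
function xml_string_to_json(string $xml): string
{
    // Rewrite encoding="utf-16" (either quote style, any case) to UTF-8,
    // but only inside the leading <?xml ... ?> declaration.
    $fixed = preg_replace(
        '/^(\s*<\?xml\b[^>]*?\bencoding\s*=\s*)(["\'])utf-16\2/i',
        '${1}${2}UTF-8${2}',
        $xml,
        1
    );
    if ($fixed === null) {
        $fixed = $xml; // preg_replace failed; fall back to the original string
    }

    $element = simplexml_load_string($fixed);
    if ($element === false) {
        throw new RuntimeException('The XML could not be parsed.');
    }

    // json_encode() serialises the SimpleXMLElement's child elements as properties.
    return json_encode($element);
}

// Usage with a mislabelled document (in practice this would be the cURL result):
$received = '<?xml version="1.0" encoding="utf-16"?><root><name>John</name></root>';
echo xml_string_to_json($received), "\n";
```

If the content were genuinely UTF-16, it would need converting to UTF-8 first (for example with mb_convert_encoding, if the mbstring extension is available) rather than relabelling.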
A note for myself on some code to convert a string of two names into a string made up of the first name followed by the initial of the second name.
- -- What I have
- John Smith
- -- What I want
- John S.
- Fred B.
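The conversion above could be sketched as follows. The function name `abbreviate_surname` is hypothetical, and the sketch assumes names separated by whitespace with an ASCII initial (substr and strtoupper work on bytes, so accented initials would need the mbstring functions instead).

```php
<?php
// Hypothetical helper: "John Smith" becomes "John S.".
// Assumption: the second name starts with an ASCII letter.
function abbreviate_surname(string $fullName): string
{
    // Split into at most two parts: the first name and everything after it.
    $parts = preg_split('/\s+/', trim($fullName), 2);

    // A single name (or an empty string) is returned unchanged.
    if (count($parts) < 2 || $parts[1] === '') {
        return $parts[0];
    }

    // Keep the first name and reduce the second name to its initial.
    return $parts[0] . ' ' . strtoupper(substr($parts[1], 0, 1)) . '.';
}

echo abbreviate_surname('John Smith'), "\n";
```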
So, seriously, this is not going to stop all the attacks out there; it is simply a base function to build upon. This is a PHP function I got off the PHP forums years ago and can't find anymore, so I'm posting it here. Now people can see how I safeguard my sites, but then obscurity should never be a recommended security measure anyway.
This function will check a posted string for suspicious activity and email the specified system administrator. You should modify it to suit your site's needs (e.g. if you need to accept URL-encoded values). It needs some work, I know, but it's a start. I'd advise using PHP sessions as well though (see after).
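To give an idea of the shape of such a check, here is a rough sketch. It is not the original forum function (which isn't reproduced here); the function names, the pattern list and the email subject are all assumptions for illustration, and the patterns are nowhere near complete.

```php
<?php
// Hypothetical sketch; NOT the original forum function.

// Return the labels of any suspicious patterns found in a posted string.
// The pattern list is an illustrative assumption, not a complete filter.
function find_suspicious_patterns(string $input): array
{
    // Decode URL-encoded input so "%3Cscript%3E" is checked as "<script>".
    $decoded = rawurldecode($input);

    $patterns = [
        'script tag'          => '/<\s*script\b/i',
        'SQL UNION SELECT'    => '/\bunion\b.+\bselect\b/is',
        'directory traversal' => '/\.\.[\/\\\\]/',
    ];

    $hits = [];
    foreach ($patterns as $label => $regex) {
        if (preg_match($regex, $decoded) === 1) {
            $hits[] = $label;
        }
    }
    return $hits;
}

// Email the administrator when something suspicious is found.
// Returns true if the input was flagged, false otherwise.
function report_suspicious_input(string $input, string $adminEmail): bool
{
    $hits = find_suspicious_patterns($input);
    if ($hits === []) {
        return false;
    }

    $body = 'Matched: ' . implode(', ', $hits) . "\n"
          . 'Input: ' . $input . "\n";

    // mail() needs a working mail setup on the server; its result is ignored here.
    mail($adminEmail, 'Suspicious input detected', $body);
    return true;
}
```

Keeping the detection separate from the emailing makes it easier to adjust the patterns for a particular site and to test them without sending any mail.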
For those of you who use preg_replace: it is a PHP function that uses regular expressions to search for and replace parts of a string.
Because my understanding of regular expressions is shaky and the syntax varies from language to language, I've written this article as a quick reference point.
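As a quick reminder of the basic call, here are two small examples of my own (not taken from any particular project): one collapses runs of whitespace, the other reorders a date using backreferences.

```php
<?php
// preg_replace(pattern, replacement, subject) returns the modified string.

// 1. Collapse any run of whitespace (spaces, tabs, newlines) into one space.
$collapsed = preg_replace('/\s+/', ' ', "a   b\n c");

// 2. Reorder a date with backreferences: $1, $2 and $3 refer to the
//    first, second and third parenthesised groups in the pattern.
$reordered = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', '2024-01-31');

echo $collapsed, "\n";
echo $reordered, "\n";
```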
Just a quick note on how to format a given file size and reduce the display output to a short string, e.g.:
- 196 bytes : displays as => "196 bytes"
- 12945 bytes : displays as => "12 Kb"
- 1478515 bytes : displays as => "1 Mb"
- 8798745455 bytes : displays as => "8 Gb"
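The outputs above can be reproduced with a sketch like the following. The function name `format_filesize` is hypothetical, and the sketch assumes 1024-based units rounded down, PHP 7 or later (for intdiv) and a 64-bit build so that sizes above 2 GB fit in an integer.

```php
<?php
// Hypothetical helper: reduce a byte count to a short display string.
// Assumptions: 1024-based units, values rounded down, non-negative input.
function format_filesize(int $bytes): string
{
    $units = ['bytes', 'Kb', 'Mb', 'Gb', 'Tb'];
    $index = 0;
    $value = $bytes;

    // Divide by 1024 until the value fits the current unit or units run out.
    while ($value >= 1024 && $index < count($units) - 1) {
        $value = intdiv($value, 1024);
        $index++;
    }

    return $value . ' ' . $units[$index];
}

foreach ([196, 12945, 1478515, 8798745455] as $size) {
    echo $size, ' bytes : displays as => "', format_filesize($size), '"', "\n";
}
```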