PHP Issue: simplexml_load_string parser error : Input is not proper UTF-8, indicate encoding !

What?
A quick article to stop me running into this issue again. It covers loading XML that contains characters from a non-UTF-8 character set into PHP with the function simplexml_load_string(). The error I get looks something like this:

PHP Warning:
simplexml_load_string(): Entity: line #: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA0 0x3C 0x2F 0x73 in /home/public_html/my_folder/my_xml_processing_script.php on line 160
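
The bytes in that warning are a clue: 0xA0 is a non-breaking space in ISO-8859-1, and 0x3C 0x2F 0x73 is "</s", but a lone 0xA0 is not valid UTF-8. To see what the parser is complaining about without digging through the logs, something like the following sketch might help. The sample string and the function name describe_xml_problem() are made up for illustration, and it assumes the mbstring and SimpleXML extensions are loaded.

```php
<?php
// Hypothetical sample: the "\xA0" byte is a non-breaking space in ISO-8859-1,
// but on its own it is not a valid UTF-8 sequence.
$sample = "<root><s>text\xA0</s></root>";

// Hypothetical helper: collects libxml errors instead of emitting PHP warnings.
function describe_xml_problem(string $xml): array
{
    $report = [
        'valid_utf8' => mb_check_encoding($xml, 'UTF-8'),
        'errors'     => [],
    ];

    // Switch libxml to internal error handling and remember the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);

    foreach (libxml_get_errors() as $error) {
        $report['errors'][] = trim($error->message) . ' (line ' . $error->line . ')';
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $report['parsed'] = ($doc !== false);
    return $report;
}

$report = describe_xml_problem($sample);
print_r($report);
```

If 'valid_utf8' comes back false, the download is probably in some other encoding (or mangled on the way), which is what the rest of this article deals with.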


Why?
I'm downloading an XML feed to our servers and then loading the downloaded file into memory with simplexml_load_string(). I get the above error when it attempts to load an XML feed that is mostly in Spanish; it breaks at the following XML node:

<baños>2</baños>

-> yields issue: PHP Warning:  simplexml_load_string():     <baños>2</baños> in /home/public_html/my_folder/my_xml_processing_script.php on line 160

should read

<baños>2</baños>
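
To make the difference visible, here is a small sketch that prints the bytes behind "baños" in UTF-8 and in ISO-8859-1. I am assuming ISO-8859-1 as the "other" encoding purely for illustration (the feed's real encoding may differ), and the sketch needs the mbstring extension.

```php
<?php
// "baños" written with explicit bytes so the example does not depend on
// how this source file itself is saved.
$utf8 = "ba\xC3\xB1os";                                       // ñ is two bytes in UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');  // ñ is one byte (0xF1)

echo bin2hex($utf8), "\n";    // 6261c3b16f73
echo bin2hex($latin1), "\n";  // 6261f16f73

// The single-byte form is not valid UTF-8, which is what trips up the parser.
var_dump(mb_check_encoding($latin1, 'UTF-8'));  // bool(false)
```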


How?

It's a two-step process. My issue was with how the file was downloaded with cURL, so the fix decodes the data when downloading it and encodes it again before parsing. The XML node should be baños.

The initial command using cURL was:

function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
}
$file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
$file_xml = simplexml_load_string( $file_content );  // doesn't work and returns a load of parser errors


The tweaked command using cURL is:

function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        // note the utf8_decode function applied here
        // (utf8_decode() and utf8_encode() are deprecated as of PHP 8.2)
        $data = utf8_decode(curl_exec($ch));
        curl_close($ch);
        return $data;
}
$file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
$file_xml = simplexml_load_string( utf8_encode( $file_content ) );  // works!  DONE! Stop reading any further and tell your boss it was always in hand.
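
For comparison, here is a sketch of a variation using mbstring. It is not what I ran above. The function name to_utf8() and the ISO-8859-1 fallback are assumptions, so swap in whatever encoding your feed really uses. As far as I understand the PHP manual, utf8_decode() replaces bytes that are not valid UTF-8 with "?", so whichever route you take it is worth checking that the ñ survives in the parsed output.

```php
<?php
// Hypothetical helper: returns the input untouched if it is already valid
// UTF-8, otherwise converts it from an assumed fallback encoding.
function to_utf8(string $raw, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

// Made-up stand-in for the downloaded feed: ñ stored as the single byte 0xF1.
$raw = "<casa><ba\xF1os>2</ba\xF1os></casa>";

$xml = simplexml_load_string(to_utf8($raw));

// The element name is now UTF-8, so it is addressed with the UTF-8 bytes for ñ.
echo (string) $xml->{"ba\xC3\xB1os"}, "\n";
```

The check before converting matters: running an ISO-8859-1 to UTF-8 conversion over text that is already UTF-8 would double-encode the accented characters.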



Other things I tried to no avail
The solution above was as easy as that. Here are a number of other things I tried first:

  • mysql_set_charset(): No
  • iconv(): No
  • htmlentities(): No
  • preg_replace_callback(): No
  • sxe(): No
  • $xml = simplexml_load_string( utf8_encode($rss) );: No. Oh wait, yes! Sort of: don't forget the utf8_decode() when downloading the XML.
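
One thing that is not in the list above: the parser message itself says "indicate encoding", and an XML declaration is the usual place to do that. The sketch below is untested against my real feed. The function name declare_encoding() is made up, and ISO-8859-1 is only a guess at what the feed uses. It prepends a declaration when the document has none and lets the parser do the conversion.

```php
<?php
// Hypothetical helper: adds an XML declaration naming the (assumed) encoding,
// unless the document already starts with one.
function declare_encoding(string $xml, string $encoding = 'ISO-8859-1'): string
{
    if (strncmp(ltrim($xml), '<?xml', 5) === 0) {
        return $xml;
    }
    return '<?xml version="1.0" encoding="' . $encoding . '"?>' . "\n" . $xml;
}

// Made-up stand-in for the downloaded feed: ñ stored as the single byte 0xF1.
$raw = "<casa><ba\xF1os>2</ba\xF1os></casa>";

$xml = simplexml_load_string(declare_encoding($raw));

// SimpleXML hands back UTF-8, so the element is addressed with UTF-8 bytes.
echo (string) $xml->{"ba\xC3\xB1os"}, "\n";
```

This only helps when the declared encoding matches the bytes actually in the file; a wrong guess gives you a document that parses but contains the wrong characters.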

© 2021 Joel Lipman .com. All Rights Reserved.