What?
A quick article so that I don't run into this issue again. It covers importing characters from an XML feed in a different language character set and loading it in PHP with the function simplexml_load_string(). The error I get is something similar to:
PHP Warning:
simplexml_load_string(): Entity: line #: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA0 0x3C 0x2F 0x73 in /home/public_html/my_folder/my_xml_processing_script.php on line 160
 Why?
I'm downloading an XML feed to our servers, and then loading the downloaded file into memory with simplexml_load_string(). I get the above error when loading an XML feed that is mostly in Spanish; parsing breaks at the following XML node:
<baños>2</baños>
-> yields issue: PHP Warning: simplexml_load_string(): <baños>2</baños> in /home/public_html/my_folder/my_xml_processing_script.php on line 160
-> should read: <baños>2</baños>
 
 How?
A two-step process. My issue was with how the file was downloaded with cURL: decode the data when downloading it, then encode it again when loading it. The XML node should be baños.
 
 The initial command using cURL was:
function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
}
$file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
$file_xml = simplexml_load_string( $file_content );  // doesn't work and returns a load of parser errors
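Before changing the download code, it can help to confirm which lines of the payload are not valid UTF-8. The following is only a minimal sketch, assuming the mbstring extension is available; find_invalid_utf8_lines() and the sample payload are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: returns the 1-based numbers of the lines that are
// not valid UTF-8. Splitting on "\n" is safe because a multi-byte UTF-8
// sequence never contains the byte 0x0A.
function find_invalid_utf8_lines(string $raw): array
{
    $bad = [];
    foreach (explode("\n", $raw) as $i => $line) {
        if (!mb_check_encoding($line, 'UTF-8')) {
            $bad[] = $i + 1;
        }
    }
    return $bad;
}

// Made-up sample: line 2 contains a stray 0xA0 byte followed by "</s",
// the same byte pattern reported in the parser error above.
$sample = "<casa>\n<sitio>Madrid\xA0</sitio>\n</casa>";
echo implode(', ', find_invalid_utf8_lines($sample)), "\n"; // prints "2"
```

Running this against the downloaded content (the $file_content variable above) should point at the same node the parser complains about.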
 
 The tweaked command using cURL is:
function get_data($url) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $data = utf8_decode(curl_exec($ch));  // note the utf8_decode function applied here
        curl_close($ch);
        return $data;
}
$file_content = get_data( "http://joellipman.com/xml_feeds/my_XML_url.xml" );
$file_xml = simplexml_load_string( utf8_encode( $file_content ) );  // works!  DONE! Stop reading any further and tell your boss it was always in hand.
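A note for newer PHP versions: utf8_decode() and utf8_encode() are deprecated as of PHP 8.2. As far as I can tell, the round trip above works because utf8_decode() replaces anything that is not valid UTF-8 (or not representable in ISO-8859-1) with a "?", so the re-encoded string is clean. A rough equivalent, as a sketch only, is mb_scrub() from the mbstring extension (PHP 7.2 or later), which replaces invalid byte sequences but keeps characters outside ISO-8859-1. The helper name load_xml_lenient() and the sample feed are assumptions for illustration.

```php
<?php
// Hypothetical helper: scrub invalid UTF-8 bytes before parsing.
// Returns a SimpleXMLElement on success or false on failure.
function load_xml_lenient(string $raw)
{
    if (!mb_check_encoding($raw, 'UTF-8')) {
        // Invalid byte sequences are replaced with the mbstring
        // substitute character; valid characters are left untouched.
        $raw = mb_scrub($raw, 'UTF-8');
    }
    return simplexml_load_string($raw);
}

// Made-up sample: "baños" is valid UTF-8, the 0xA0 byte is not.
$sample = "<casa><ba\xC3\xB1os>2</ba\xC3\xB1os><sitio>Madrid\xA0</sitio></casa>";
$xml = load_xml_lenient($sample);
echo $xml->{"ba\xC3\xB1os"}, "\n"; // prints "2"
```

If the whole feed is in ISO-8859-1 rather than UTF-8 with a few stray bytes, scrubbing would throw away the accented characters; in that case converting the full string with mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1') is probably the better fit.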
 
 
 Other things I tried but to no avail
 The solution above was as easy as that. Here are a number of other things I tried first:
- mysql_set_charset(): No
 - iconv(): No
 - htmlentities(): No
 - preg_replace_callback(): No
 - sxe(): No
 - $xml = simplexml_load_string( utf8_encode($rss) );: No. Oh wait, yes! sorta, don't forget the decode when downloading the XML.
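When working through a list of attempts like this, it is easier to compare them if the parser errors are collected instead of being written to the log as PHP warnings. Here is a small sketch using libxml_use_internal_errors() and libxml_get_errors(); the function name try_parse() and the test strings are made up for illustration.

```php
<?php
// Hypothetical helper: parse a string and return the document (or false)
// together with any libxml error messages, without emitting PHP warnings.
function try_parse(string $xml): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Reset libxml state so later calls are not affected.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$doc, $messages];
}

// 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
[$doc, $messages] = try_parse("<a>caf\xE9</a>");
var_dump($doc === false);
foreach ($messages as $message) {
    echo $message, "\n";
}
```

Each candidate fix can then be wrapped around the input and passed to try_parse() to see whether the message list comes back empty.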
 