PHP - Encoding issue when saving to XML file using SimpleXml

I am struggling with encoding issues in a PHP app that:

  1. Reads an XML file and parses it according to some rules
  2. Calls the Google Translate API and uses the result to populate a database that is later used to display data in the browser (that part works well)
  3. Saves that data to an XML file (it saves but there's something wrong with the encoding).

The data comes back from Google Translate encoded in UTF-8, and in the browser it displays fine in any language, provided the page declares UTF-8 in its headers.

Here's the Google Translate function:

    function mt($text, $lang) {
        // Assumption: the API key is defined in the global scope; without this
        // line $apiKey is undefined inside the function.
        global $apiKey;

        $url = 'https://www.googleapis.com/language/translate/v2?key=' . $apiKey . '&q=' . rawurlencode($text) . '&source=en&target=' . $lang;
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false); // disables certificate verification
        $response = curl_exec($handle);
        // The second argument of json_decode() is the "associative array" switch;
        // JSON_UNESCAPED_UNICODE is a json_encode() flag and does not belong here.
        $responseDecoded = json_decode($response, true);
        $responseCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);

        curl_close($handle);

        if ($responseCode != 200) {
            $resultxt = 'not200result';
        } else {
            $resultxt = $responseDecoded['data']['translations'][0]['translatedText'];
        }
        return $resultxt;
    }

I'm using SimpleXML to load an XML file, modify its contents and save it with asXML(). The generated XML file does not seem to be UTF-8: the translated text shows up as numeric character references, like this:

    <value>&#x3088;&#x3046;&#x3053;&#x305D;&#xFF05;0 ST&#x6570;&#x5B66;</value>
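For reference, those `&#x...;` sequences are XML numeric character references, not a different byte encoding, so an XML parser should turn them back into the original characters. A minimal sketch, assuming the fragment above is parsed on its own (the variable names are only for illustration):

```php
<?php
// The <value> element exactly as it appears in the saved file.
$fragment = '<value>&#x3088;&#x3046;&#x3053;&#x305D;&#xFF05;0 ST&#x6570;&#x5B66;</value>';

// The parser resolves numeric character references while reading the XML.
$value = simplexml_load_string($fragment);

// SimpleXML hands text back as a UTF-8 string.
$text = (string) $value;

echo $text, PHP_EOL; // the Japanese text as literal UTF-8 characters
```

So the saved file is still valid XML carrying the right characters; what differs from the expected output is only how they are written out.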

Here's the code that assigns the translation to the XML node and saves it.

    $xml = simplexml_load_file('myfile.xml'); // Load source XML file
    // Note: this adds an encoding="UTF-8" attribute to the root element;
    // it does not touch the <?xml ...?> declaration.
    $xml->addAttribute('encoding', 'UTF-8');
    $xmlFile = 'translation.xml'; // File that will be saved
    // Here I call the mt() function above and write its result into the XML as-is.
    $xml->asXML($xmlFile); // Save translated XML file

I've tried using htmlentities() and experimented with utf8_encode() and utf8_decode(), but can't make it work.

I've tried everything and looked at many other posts. For the life of me, I can't figure this one out. Any help is appreciated.
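One thing that may be worth checking is whether the source XML declares an encoding at all. As far as I understand, when the loaded document has no encoding set, libxml escapes non-ASCII characters as numeric references on output. The sketch below is only an illustration under that assumption: the inline `$source` string stands in for myfile.xml, the hard-coded Japanese string stands in for the result of mt(), and the encoding is set through the underlying DOM document before serializing.

```php
<?php
// Stand-in for myfile.xml: the declaration carries no encoding.
$source = '<?xml version="1.0"?><root><value>Welcome</value></root>';
$xml = simplexml_load_string($source);

// Stand-in for the translated string returned by mt().
$xml->value = "\u{3088}\u{3046}\u{3053}\u{305D}";

// With no document encoding set, non-ASCII text is escaped on output.
$escaped = $xml->asXML();

// Set the encoding on the DOM document that backs the SimpleXML object.
$dom = dom_import_simplexml($xml)->ownerDocument;
$dom->encoding = 'UTF-8';

// Serialize again; $dom->save('translation.xml') would write a file instead.
$literal = $dom->saveXML();

echo $escaped, PHP_EOL, $literal, PHP_EOL;
```

If the assumption holds, the second output carries `encoding="UTF-8"` in its declaration and the characters themselves rather than `&#x...;` references. Adding `encoding="UTF-8"` to the declaration of the source file should have a similar effect.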
