In PHP, I need to replace all non-UTF-8 characters in a string. However, I don't want to replace them with some equivalent (as the iconv function does with //TRANSLIT) but with a chosen character (like "_" or "*", for example).

Typically, I want the user to be able to see the positions where the invalid characters were found.

I didn't find any function that does this, so I was going to:

  • use iconv with //IGNORE
  • do a diff on the two strings and insert the wanted character where the non-UTF-8 ones were

Do you see a better way to do that? Are there functions in PHP that can be combined to get this behavior?

Here are 2 functions to help you achieve something close to what you want:

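//reject overly long 2 byte sequences, as well as characters at or above U+10000 and replace with ?
$some_string = preg_replace('/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'. // most ASCII control characters
 '|[\x00-\x7F][\x80-\xBF]+'. // ASCII byte followed by stray continuation bytes (the ASCII byte is replaced too)
 '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'. // overlong 2 byte leads and all 4 byte (or longer) leads
 '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'. // 2 byte lead with no continuation byte, or with two or more
 '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S', // 3 byte lead with fewer than two continuation bytes, or with three or more
 '?', $some_string );

//reject overly long 3 byte sequences and UTF-16 surrogates and replace with ?
$some_string = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]'.
 '|\xED[\xA0-\xBF][\x80-\xBF]/S','?', $some_string );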
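
If you want both passes in one place, something like the following could work. It is only a sketch: the function name replace_invalid_utf8 and its parameters are made up for illustration, and the first pattern is a best-effort spelling of the cases the comments describe (one alternation per kind of bad sequence), not a complete UTF-8 validator. It uses preg_replace_callback rather than preg_replace so that a replacement containing "$" or "\" is inserted literally.

```php
<?php
// Hypothetical helper: the name, signature and default value are illustrative only.
function replace_invalid_utf8(string $input, string $replacement = '?'): string
{
    $patterns = [
        // Pass 1: most ASCII control characters, stray continuation bytes after
        // an ASCII byte, overlong 2 byte leads, 4 byte leads, and 2 or 3 byte
        // leads followed by the wrong number of continuation bytes.
        '/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'
        . '|[\x00-\x7F][\x80-\xBF]+'
        . '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'
        . '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'
        . '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S',
        // Pass 2: overlong 3 byte sequences and UTF-16 surrogates.
        '/\xE0[\x80-\x9F][\x80-\xBF]|\xED[\xA0-\xBF][\x80-\xBF]/S',
    ];

    foreach ($patterns as $pattern) {
        // The callback returns the replacement verbatim, so "$" and "\"
        // in it are not treated as backreferences.
        $result = preg_replace_callback(
            $pattern,
            function (array $match) use ($replacement): string {
                return $replacement;
            },
            $input
        );
        // preg_replace_callback returns null on a PCRE error; keep the
        // string from the previous step in that case.
        if ($result !== null) {
            $input = $result;
        }
    }

    return $input;
}

// Usage: a lone 0xC3 lead byte is marked with "_".
echo replace_invalid_utf8("caf\xC3 au lait", '_'), "\n"; // prints "caf_ au lait"
```

Keep in mind that each match is replaced by a single replacement character, whatever its length in bytes, and that a stray continuation byte right after an ASCII character takes that ASCII character with it.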

Note that you can change the replacement (which currently is '?') to anything else by changing the second argument of the call: preg_replace('blablabla', '?', $some_string)
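
A different route, not taken from the snippets above, is to let the mbstring extension do the substitution. The sketch below assumes mbstring is loaded and PHP 7.2 or later (for mb_ord); the function name substitute_invalid_utf8 is made up for illustration. Converting from UTF-8 to UTF-8 makes mb_convert_encoding swap bytes it cannot decode for the configured substitute character. How many substitute characters you get per broken sequence can differ between PHP versions, so check it on yours.

```php
<?php
// Hypothetical helper; assumes the mbstring extension is available.
function substitute_invalid_utf8(string $input, string $replacement = '_'): string
{
    // Remember the current setting so it can be restored afterwards.
    $previous = mb_substitute_character();

    // The substitute character is given as a Unicode code point.
    mb_substitute_character(mb_ord($replacement, 'UTF-8'));

    // UTF-8 to UTF-8: valid sequences pass through, undecodable bytes
    // are replaced by the substitute character.
    $clean = mb_convert_encoding($input, 'UTF-8', 'UTF-8');

    mb_substitute_character($previous);

    // Older PHP versions may return false on failure; fall back to the input.
    return $clean === false ? $input : $clean;
}

// Usage: mark the broken byte with "*".
echo substitute_invalid_utf8("caf\xC3 au lait", '*'), "\n";
```

Unlike the regular expressions above, this leaves control characters and valid 4 byte characters alone, since those are well-formed UTF-8.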

the original article :

