regex - word boundary on non-Latin characters in PHP
This example works fine:
echo preg_replace("/\bI\b/u", 'we', "I can"); // we can
This one, where Russian letters are used, does not work even though I use the "u" modifier:
echo preg_replace("/\bЯ\b/u", 'мы', 'Я могу'); // still "Я могу"
So the question is what should I do to fix this?
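In PCRE (the library used by PHP's preg_* functions), \b by default refers only to word boundaries in an ASCII sense, i.e., only [a-zA-Z0-9_] are word characters. This changes only when PCRE's Unicode-properties option (PCRE_UCP) is in effect, which depends on the PHP version and build.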
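If you want to match a Я character that has no letters, digits or _ immediately before or after, you can use lookaround assertions in place of \b, for example a negative lookbehind and a negative lookahead on Unicode property classes such as \p{L} and \p{N}.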
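A minimal sketch of such a pattern, assuming negative lookarounds over the Unicode property classes \p{L} (letters) and \p{N} (numbers) plus the underscore. The variable names $before, $after and $pattern are illustrative only:

```php
<?php
// Sketch: emulate a Unicode-aware word boundary with lookarounds.
// $before, $after and $pattern are illustrative names (assumptions).
$before  = '(?<![\p{L}\p{N}_])'; // no letter, digit or underscore right before
$after   = '(?![\p{L}\p{N}_])';  // no letter, digit or underscore right after
$pattern = '/' . $before . 'Я' . $after . '/u';

echo preg_replace($pattern, 'мы', 'Я могу'), "\n"; // мы могу
echo preg_replace($pattern, 'мы', 'Яблоко'), "\n"; // Яблоко (Я is part of a word)
```

The lookarounds consume no characters, so only the Я itself is replaced, which mirrors what \b does for ASCII text.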
You still have to use the "u" modifier, so that the pattern and the subject are treated as UTF-8.
Word boundaries are often counter-intuitive.