regex - word boundary on non-Latin characters in PHP


This example works fine:

echo preg_replace("/\bI\b/u", 'we', "I can"); // we can

This one, where Russian letters are used, does not work even though I use the "u" modifier:

echo preg_replace("/\bЯ\b/u", 'мы', 'Я могу'); // still "Я могу"

So the question is: what should I do to fix this?
Thanks.


Answer

Solution:

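In PCRE (the library used by preg_replace), \b by default refers only to word boundaries in an ASCII sense, i.e., only [a-zA-Z0-9_] are word characters, even in UTF-8 mode. Unicode-aware \b and \w require the PCRE_UCP option; newer PHP versions turn it on together with the u modifier, but older ones do not, which is why \bЯ\b fails there.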

If you want to match a Я character that has no letters, digits or _ immediately before or after it, you can use:

(?<![\p{L}0-9_])Я(?![\p{L}0-9_])

You still have to use the u modifier.
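A minimal sketch of the lookaround pattern in use, assuming PHP 7.4+ (for the arrow function) and a UTF-8 encoded source file. The helper replace_whole_word is hypothetical, not a PHP built-in; it wraps an arbitrary word in the same lookarounds, escaping it with preg_quote.

```php
<?php
// The u modifier makes PCRE treat pattern and subject as UTF-8,
// so \p{L} matches any Unicode letter (Cyrillic included).

// The pattern from the answer, applied to the question's example:
echo preg_replace('/(?<![\p{L}0-9_])Я(?![\p{L}0-9_])/u', 'мы', 'Я могу'), "\n"; // мы могу

// Hypothetical helper: replace $word only where it is not preceded or
// followed by a letter, an ASCII digit, or an underscore.
// Returns null if PCRE reports an error (e.g. invalid UTF-8 in $subject).
function replace_whole_word(string $word, string $replacement, string $subject): ?string
{
    // Escape regex metacharacters in $word, plus the '/' delimiter.
    $quoted = preg_quote($word, '/');
    $pattern = '/(?<![\p{L}0-9_])' . $quoted . '(?![\p{L}0-9_])/u';

    // Using a callback inserts $replacement verbatim, so "$1" or "\" in it
    // are not interpreted as backreferences.
    return preg_replace_callback($pattern, fn(array $m): string => $replacement, $subject);
}

// "Ясно" is left alone because Я is followed by a letter there.
echo replace_whole_word('Я', 'мы', 'Я могу, Ясно'), "\n"; // мы могу, Ясно
```

Note that 0-9 only covers ASCII digits; \p{N} could be used instead if digits from other scripts should also count as word characters.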


Answer

Solution:

Word boundaries are often counter-intuitive.
