Why is this regex not greedy in PHP?

This regex should match lists just like in Markdown:

/((?:(?:(?:^[\+\*\-] )(?:[^\r\n]+))(?:\r|\n?))+)/m

It works in JavaScript (with the g flag added), but I have problems porting it to PHP: it does not behave greedily. Here's my example code:

$string = preg_replace_callback('`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m', array(&$this, 'bullet_list'), $string);

function bullet_list($matches) { var_dump($matches); }

When I feed it a list of three lines, it displays this:

array(2) { [0]=> string(6) "* one " [1]=> string(6) "* one " }
array(2) { [0]=> string(6) "* two " [1]=> string(6) "* two " }
array(2) { [0]=> string(8) "* three " [1]=> string(8) "* three " }

Apparently var_dump is being called three times instead of just once, as I would expect, since the regex is greedy and should match as many lines as possible. I have tested it on regex101.com. How do I make it work properly?

Answer

Solution:

Your regex can be reduced to:

(?:^[+*-] [^\r\n]+\R*)+

There is no need for all these groups.
\R matches any kind of line break: \n, \r or \r\n.

Edit: \R loses its special meaning inside a character class; [\R] means a literal R.
Thanks to HamZa.
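
As a rough illustration, the reduced pattern could be tried from PHP as sketched below. This assumes the m flag is kept so that ^ anchors at the start of every line; the sample input and variable names are made up for the example.

```php
<?php
// Hypothetical sample input using Windows-style \r\n line breaks.
$input = "intro\r\n* one\r\n* two\r\n* three\r\nend";

// Same pattern as above, with delimiters and the m (multiline) flag added.
$pattern = '/(?:^[+*-] [^\r\n]+\R*)+/m';

$count = preg_match_all($pattern, $input, $matches);

var_dump($count);          // number of separate list blocks found
var_dump($matches[0][0]);  // the first list block, as a single match
```

Because \R treats \r\n as one line break, the next repetition starts right after the \n, where ^ can match again, so the three bullet lines end up in one match.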

Answer

Solution:

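This regex won't work correctly if you have \r\n newlines in your input text.

The part (?:\r|\n?) matches either an \r or an \n, but not both. With \r\n line endings it consumes only the \r; the next repetition then needs ^ to match, but in multiline mode ^ by default matches only right after an \n, so the repetition stops and each line becomes its own match. (regex101 treats newlines as \n only, so it works there.)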
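
The difference can be reproduced with a small sketch like the one below. The two input strings are hypothetical, and the behaviour shown assumes PCRE's default newline convention (\n), which is what the output in the question suggests.

```php
<?php
// The pattern from the question's PHP code (only "*" bullets).
$pattern = '`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m';

// Hypothetical inputs: the same list with Unix and Windows line breaks.
$unix    = "* one\n* two\n* three\n";
$windows = "* one\r\n* two\r\n* three\r\n";

$unixCount    = preg_match_all($pattern, $unix, $unixMatches);
$windowsCount = preg_match_all($pattern, $windows, $windowsMatches);

var_dump($unixCount);     // \n input: the whole list is one match
var_dump($windowsCount);  // \r\n input: every line is matched separately
```

With the \r\n input, each match ends with the \r, which accounts for the string lengths of 6, 6 and 8 in the var_dump output above.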

Does the following work?

/(?:(?:(?:^[+*-] )(?:[^\r\n]+))[\r\n]*)+/m

(or, after removal of all the unnecessary non-capturing groups - thanks @M42!)

/(?:^[+*-] [^\r\n]+[\r\n]*)+/m
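
A minimal sketch of how the simplified pattern might be plugged into preg_replace_callback follows. The closure is only an illustrative stand-in for the question's bullet_list method, and the [list] markers and the $calls counter are assumptions made for the example.

```php
<?php
// Hypothetical input with \r\n line breaks, as in the failing case.
$string = "* one\r\n* two\r\n* three\r\n";

$calls = 0;

$result = preg_replace_callback(
    '/(?:^[+*-] [^\r\n]+[\r\n]*)+/m',
    function (array $matches) use (&$calls) {
        $calls++;
        // The callback must return the replacement text;
        // here the matched list is simply wrapped in markers.
        return "[list]\n" . $matches[0] . "[/list]\n";
    },
    $string
);

var_dump($calls);  // how many times the callback ran
echo $result;
```

Note that [\r\n]* also consumes blank lines, so two lists separated only by empty lines would be treated as a single match.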

Answer

Solution:

This will match all bulleted lines until it gets to the first line that is not bulleted.

(?<=^|\R)\*[\s\S]+?(?=$|\R[^*])
  • \* match a bullet where:
    • (?<=^|\R) it is preceded by the start of the string or a newline.
  • [\s\S]+? match any character non-greedily where:
    • (?=$|\R[^*]) the matched sequence is followed by the end of the string, or by a line break followed by a character other than *. Essentially, this means that the match is complete when a non-bullet line or the end of the string is reached.

Results:

The resulting matches are shown in the RegexBuddy output below (Regex 101 can't handle it):

[Image: regex result]
