Why is this regex not greedy in PHP?
This regex should match lists just like in Markdown:
/((?:(?:(?:^[\+\*\-] )(?:[^\r\n]+))(?:\r|\n?))+)/m
It works in JavaScript (with the g flag added), but I have problems porting it to PHP: it does not behave greedily. Here's my example code:
$string = preg_replace_callback('`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m', array(&$this, 'bullet_list'), $string);
function bullet_list($matches) { var_dump($matches); }
When I feed it a list of three lines, it displays this:
array(2) { [0]=> string(6) "* one " [1]=> string(6) "* one " }
array(2) { [0]=> string(6) "* two " [1]=> string(6) "* two " }
array(2) { [0]=> string(8) "* three " [1]=> string(8) "* three " }
Apparently var_dump is being called three times instead of just once, which is what I would expect, since the regex is greedy and should match as many lines as possible. I have tested it on regex101.com.
How do I make it work properly?
Answer
Solution:
Your regex can be reduced to:
There is no need for all these groups.
\R means any kind of line break: \n, \r, or \r\n.
Edit:
\R loses its special meaning in a character class: [\R] means R.
Thanks to HamZa.
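A minimal sketch of how a pattern built on \R could be used with preg_replace_callback. The pattern, the sample input, and the HTML output are assumptions made for illustration, not the answer's original code.

```php
<?php
// Assumed pattern (not taken from the answer):
//   ^[+*-]     a bullet character at the start of a line (m flag), then a space
//   [^\r\n]+   the rest of the line
//   \R?        an optional line break of any kind (\n, \r or \r\n)
$pattern = '/(?:^[+*-] [^\r\n]+\R?)+/m';

// Sample input with \r\n line endings, followed by a non-list line.
$input = "* one\r\n* two\r\n* three\r\nplain text";

$calls = 0;
$output = preg_replace_callback(
    $pattern,
    function (array $matches) use (&$calls) {
        $calls++;
        // Split the matched block into lines and wrap each one in <li>.
        $lines = preg_split('/\R/', trim($matches[0]));
        $items = '';
        foreach ($lines as $line) {
            $items .= '<li>' . substr($line, 2) . '</li>'; // drop the "* " prefix
        }
        return "<ul>$items</ul>\n";
    },
    $input
);

echo $calls, "\n"; // number of callback invocations: 1 for this input
echo $output;
```

Because \R consumes \r\n as a single unit, the next repetition starts right after the \n, where ^ can match again, so the whole list is handed to the callback in one call.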
Answer
Solution:
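This regex won't work correctly if you have \r\n newlines in your input text. The part (?:\r|\n?) matches either an \r or an \n, but not both. Once the \r has been consumed, the next character is the \n, where ^ cannot match, so the repetition stops after a single line. (regex101 treats newlines as \n only, so it works there.)
Does the following work?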
(or, after removal of all the unnecessary non-capturing groups - thanks @M42!)
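To make the difference concrete, here is a hedged sketch comparing the pattern from the question with one possible fix that treats an optional \r followed by \n as a single line break. The fixed pattern and the sample input are assumptions, not the answer's original code, and the behaviour shown relies on PCRE's default newline convention (\n) for ^ in multiline mode.

```php
<?php
// Sample input with Windows-style \r\n line endings (assumed).
$input = "* one\r\n* two\r\n* three";

// Pattern from the question: (?:\r|\n?) consumes "\r" OR "\n", never both.
$original = '`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m';

// Assumed fix: an optional "\r" followed by "\n" counts as one line break.
$fixed = '`((?:^\* [^\r\n]+(?:\r?\n)?)+)`m';

preg_match_all($original, $input, $before);
preg_match_all($fixed, $input, $after);

echo count($before[0]), "\n"; // 3: one match per line
echo count($after[0]), "\n";  // 1: the whole list in a single match
```

With the original pattern the first match is "* one\r" (6 characters), which is consistent with the string(6) values in the question's var_dump output.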
Answer
Solution:
This will match all bulleted lines until it gets to the first line that is not bulleted.
\* matches a bullet, where:
(?<=^|\R) it is preceded by the start of the string or a newline.
[\s\S]+? matches any character, non-greedily, where:
(?=$|\R[^*]) the matched sequence is followed by the end of the string, or by a newline and then a character other than *.
Essentially, this means that the match is complete when a non-bullet line is found or when the end of the string is reached.
Results:
The resulting matches are shown in the RegexBuddy output below (Regex 101 can't handle it):