How to split a string by repeated characters in PHP?
I'm trying to split a binary string into an array of runs of repeated characters.
For example, the string 10001101
split with this function would be:
$arr[0] = '1';
$arr[1] = '000';
$arr[2] = '11';
$arr[3] = '0';
$arr[4] = '1';
(I tried to make myself clear, but if you still don't understand, my question is the same as this one but for PHP, not Python)
Answer
Solution:
You can use preg_split with a regex that matches the zero-width boundary between two runs of different characters.
The regex:
(.) - match a single character and capture it.
(?!\1|$) - look at the next position and match if it's neither the same character as the one we just found nor the end of the string.
\K - keeps the text matched so far out of the overall regex match, making this match zero-width.
Note: this does not work in PHP versions prior to 5.6.13, as there was a bug involving bump-along behavior with \K.
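As a minimal sketch of how the pieces above fit together, assuming preg_split is the function in use: the first pattern simply concatenates the three parts listed above, while the second is one possible lookbehind-based variant that avoids \K (that exact pattern is an assumption, not taken from the answer). The variable names are illustrative.

```php
<?php
$string = '10001101';

// (.)      capture one character
// (?!\1|$) require that the next character differs and that we are not at the end
// \K       drop the consumed character from the match, so the split point is zero-width
$runs = preg_split('/(.)(?!\1|$)\K/', $string);

// Assumed variant without \K: look behind to capture the previous character,
// then apply the same negative lookahead. The match itself is empty.
$runsLookbehind = preg_split('/(?<=(.))(?!\1|$)/', $string);

print_r($runs);
// Expected for this input: ['1', '000', '11', '0', '1']
```

Both calls should yield the same array for this input; the lookbehind form only matters if \K cannot be relied on in the PHP version at hand.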
An alternative regex that works in earlier versions as well uses a lookbehind rather than \K in order to make the match zero-width.
Answer
Solution:
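A minimal sketch of this approach, assuming preg_match_all with the pattern ((.)\2*) that the explanation below refers to. The variable names $runs and $chars are illustrative.

```php
<?php
$string = '10001101';

// Group 1 wraps the whole run; group 2 is the single repeated character,
// which \2* then extends for as long as the same character continues.
$count = preg_match_all('/((.)\2*)/', $string, $m);

// With the default PREG_PATTERN_ORDER, $m[0] holds the full matches
// and $m[2] holds the contents of the second capture group.
$runs  = $count ? $m[0] : [];
$chars = $count ? $m[2] : [];

print_r($runs);
// Expected for this input: ['1', '000', '11', '0', '1']
```

Checking the return value of preg_match_all before reading $m is one way to guard against an empty result.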
Matches repeated character sequences of 1 or more. The regex stores the subject character in the second capture group ((.), available as $m[2]), while the first capture group contains the entire repeat sequence (((.)\2*), available as $m[1] and identical to the full match in $m[0]). With preg_match_all, it does this globally over the entire string. This can be applied to any string, e.g. 'aabbccddee'. If you want to limit it to just 0 and 1, then use [01] instead of . in the second capture group.
Keep in mind that the result may be empty, so first check that matches exist, e.g. !empty($m[0]), before you use it.
Answer
Solution:
I'm thinking something like this. The code is not tested; I wrote it directly in the comment, so it might have some errors that you can adjust.
Answer
Solution:
I wouldn't bother looking for the end-of-string in the pattern.
Most succinctly, capture the first occurring character, then allow zero or more repetitions of the captured character, then restart the full-string match with \K so that no characters are lost in the explosions.
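A minimal sketch of that description, assuming preg_split with the PREG_SPLIT_NO_EMPTY flag (the flag is an assumption here; it drops the empty element that the final zero-width match at the end of the string would otherwise produce):

```php
<?php
$string = '10001101';

// (.)  capture the first character of a run
// \1*  consume any further repetitions of that character
// \K   restart the match here, so the delimiter is empty and nothing is lost
$runs = preg_split('~(.)\1*\K~', $string, 0, PREG_SPLIT_NO_EMPTY);

print_r($runs);
// Expected for this input: ['1', '000', '11', '0', '1']
```

Because the pattern consumes each whole run before \K, no end-of-string check is needed inside the regex.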
If you don't care for regular expressions, here is a way of iterating through each character, comparing it to the previous one and conditionally concatenating repeated characters to a reference variable.
This produces the same result as the first snippet.
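One way the non-regex loop could look, as a sketch only: each new run is pushed into the result array by reference, and later repeats are appended through that reference. The names $runs, $current and $previous are illustrative.

```php
<?php
$string = '10001101';

$runs = [];
$previous = null;

foreach (str_split($string) as $char) {
    if ($char === $previous) {
        // Same character as before: extend the run through the reference.
        $current .= $char;
    } else {
        // Different character: detach from the old run and start a new one.
        unset($current);
        $current = $char;
        $runs[] = &$current; // the new array element aliases $current
    }
    $previous = $char;
}
unset($current); // drop the last reference so later code cannot alter $runs by accident

print_r($runs);
// Expected for this input: ['1', '000', '11', '0', '1']
```

The unset() before each new run matters: without it, assigning to $current would overwrite the previous array element instead of starting a new one.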