In my previous post, one of my examples of when capturing groups are appropriate demonstrated how to match quoted strings:
To recap, that will match values enclosed in either double or single quotes, while requiring that the same quote type start and end the match. It also allows inner, escaped quotes of the same type as the enclosure.
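As a rough illustration of the behavior recapped above, here is a minimal Python sketch. The pattern is an assumption reconstructed from the description (a captured opening quote, an alternative for a backslash-escaped copy of that quote, a lazy quantifier, and a backreference as the closer). `QUOTED` and `find_quoted` are hypothetical names used only for this example.

```python
import re

# Assumed pattern, reconstructed from the description in the text:
#   (["'])   group 1 captures the opening quote character
#   \\\1     a backslash followed by that same quote (an escaped quote)
#   .        otherwise, any single character (except a newline)
#   *?       repeat as few times as possible
#   \1       the closing quote must equal the opening one
QUOTED = re.compile(r"""(["'])(?:\\\1|.)*?\1""")


def find_quoted(text):
    """Return each quoted string in text, including its enclosing quotes."""
    return [m.group(0) for m in QUOTED.finditer(text)]


print(find_quoted(r"""name="O'Brien" """))   # inner quote of the other type
print(find_quoted(r"""say 'it\'s' now"""))   # inner escaped quote of the same type
print(find_quoted(r"""x = "mixed' end"""))   # mismatched quotes: nothing found
```

In this sketch the escaped-quote alternative is listed before the dot, so a backslash-quote pair is consumed as one unit when the engine first reaches it. At every step the lazy quantifier tries the closing backreference before taking another character, which is why the match ends at the first unescaped quote of the opening type.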
On his blog, Ben Nadel asked:
I do not follow the \\\1 in the middle group. You said that that was an escaped closing of the same type (group 1). I do not follow. Does that mean that the middle group can have quotes in it? If that is the case, how does the reluctant search in the middle (*?) know when to stop if it can have quotes in side of it? What am I missing?
Good question. Following is the response I gave, slightly updated to improve clarity:
If anyone has questions about how other, specific regex patterns work, or why they don't work, let me know, and I can try to make "Regexes in Depth" a regular feature here.
I'm not going to explain that in detail, but the more advanced regex features it uses are a negative lookbehind, a conditional, and Unicode character properties.
Here are some examples of the kinds of quoted strings the above regex adds support for (in addition to preserving support for quotes enclosed with " or ', neither of which is designated as an opening or closing quote character in Unicode).
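The parenthetical claim about " and ' can be checked directly. The following is a small sketch using Python's standard `unicodedata` module. The helper name `quote_category` is hypothetical, and treating the general categories Pi (initial quote punctuation) and Pf (final quote punctuation) as the relevant properties is an assumption about which properties are meant.

```python
import unicodedata


def quote_category(ch):
    """Return the Unicode general category code for a single character."""
    return unicodedata.category(ch)


# Straight quotes are plain "other punctuation" (Po); curly quotes and
# guillemets are classified as initial (Pi) or final (Pf) quote punctuation.
for ch in ['"', "'", "‘", "’", "“", "”", "«", "»"]:
    print(repr(ch), quote_category(ch))
```

A regex flavor with Unicode property support can match on these categories. Python's built-in `re` module does not support `\p{...}` property syntax, so this sketch only inspects the categories.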
Now, it will no longer match ‘test”, and will successfully match things like ‘t‘e“"”s\’t’. Note that I'm using nested conditionals in the above regex to achieve an if-elseif-else construct. Also, now that it no longer relies on Unicode character properties, it will work with regex engines that support both lookbehinds and conditionals (PCRE, PHP, the .NET framework, and possibly others).
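To make the if-elseif-else idea concrete, here is a hedged sketch in Python, whose `re` module also supports conditionals. The pattern is an illustration written for this example, not the regex from the post. It handles escapes by excluding the backslash from the "any character" class instead of using a negative lookbehind, and the names `PAIRED` and `match_quoted` are hypothetical.

```python
import re

# Illustrative pattern (an assumption, not the regex from the post).
PAIRED = re.compile(
    r"""
    (?: (["']) | (‘) | “ )   # opener: group 1 is a straight quote, group 2 is ‘, otherwise “
    (?: \\. | [^\\] )*?      # a backslash-escaped character or any non-backslash character, lazily
    (?(1) \1                 # if group 1 matched: close with the same straight quote
      | (?(2) ’              # else if group 2 matched: close with ’
            | ” ) )          # else: close with ”
    """,
    re.VERBOSE,
)


def match_quoted(text):
    """Return the first quoted string found in text, or None if there is none."""
    m = PAIRED.search(text)
    return m.group(0) if m else None


print(match_quoted("‘test”"))            # mismatched pair
print(match_quoted(r'‘t‘e“"”s\’t’'))     # nested and escaped quotes
print(match_quoted("“hi”"))              # falls through to the final else branch
```

The outer conditional tests group 1 and the inner one tests group 2, so the closing quote is chosen by whichever opener actually participated in the match. Opening with “ sets neither group and lands in the final branch.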