Excited by the fact that I can mimic atomic groups when using most regex libraries which don't support them, I set my sights on another of my most wanted regex features which is commonly lacking: conditionals (which provide an if-then-else construct). Of the libraries I'm familiar with, conditionals are only supported by .NET, PCRE, PHP (when using PCRE via the preg functions), and JGSoft products (including RegexBuddy).
There are two kinds of regex conditionals in those libraries: lookaround-based and capturing-group-based. The functionality of lookaround-based conditionals is very easy to replicate. First, here's what such conditionals look like (this example uses a positive lookahead for the assertion):
To mimic that behavior in languages which don't support conditionals, just add a colon after the initial question mark to turn it into a non-capturing group, like so:
As long as the regex engine you're using supports the specified lookaround type, those patterns do the same thing.
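As a rough sketch of the non-capturing-group form, here is a small Python example. Python's `re` module does not support lookaround-based conditionals, so only the mimicked form is shown. The condition (`\d`), the "then" part (`\d+`), the "else" part (`[a-z]+`), and the helper name `first_token` are all my own placeholders for illustration, not patterns taken from the text above.

```python
import re

# Hypothetical placeholders: if the next character is a digit, match a run of
# digits ("then"); otherwise match a run of lowercase letters ("else").
# In an engine with real conditionals this would be (?(?=\d)\d+|[a-z]+).
# Adding a colon after the first question mark gives a plain non-capturing group:
EMULATED = re.compile(r"(?:(?=\d)\d+|[a-z]+)")

def first_token(text):
    """Return the token matched at the start of text, or None."""
    m = EMULATED.match(text)
    return m.group(0) if m else None
```

Calling `first_token("123abc")` takes the "then" branch, while `first_token("abc123")` falls through to the "else" branch.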
However, mimicking capturing-group-based conditionals proved trickier. Conditionals that use an optional capturing group as their test let you base logic on whether that group has participated in the match so far. Thus...
...matches only "bd" and "abc". That pattern can be expressed as follows:
Here's a comparable pattern I created which doesn't require support for conditionals:
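To make the comparison concrete, here is a minimal Python sketch. Both patterns are assumptions on my part, chosen to be consistent with the description (a real conditional that matches only "bd" and "abc", and a mimic built from a lookahead with two intentionally empty capturing groups). Python's `re` happens to support group-based conditionals, so the two can be checked side by side; other engines may behave differently.

```python
import re

# Assumed real conditional: optional "a", then "b", then "c" if group 1
# participated, otherwise "d".
REAL = re.compile(r"(a)?b(?(1)c|d)")

# Assumed mimic: the lookahead always succeeds. If "a" is present, groups 1 and 2
# participate; otherwise only group 3 does. \2 and \3 then act as guards on the
# "then" and "else" branches.
FAKE = re.compile(r"(?=(a)()|())\1?b(?:\2c|\3d)")

def accepts(pattern, text):
    """True if the whole string is matched by the pattern."""
    return pattern.fullmatch(text) is not None

SAMPLES = ["bd", "abc", "bc", "abd", "b", "ab"]
REAL_RESULTS = [accepts(REAL, s) for s in SAMPLES]
FAKE_RESULTS = [accepts(FAKE, s) for s in SAMPLES]
```

With these sample strings, both lists should agree: only "bd" and "abc" are accepted.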
To use it without an "else" part, you still need to include "\3" at the end, like this:
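A short Python sketch of why the trailing "\3" matters. As before, the patterns are my own assumed reconstruction for illustration, and `WITHOUT_GUARD` is a deliberately broken variant included only for contrast.

```python
import re

# Real conditional with no "else" part: "c" is required only if "a" was matched.
REAL_NO_ELSE = re.compile(r"(a)?b(?(1)c)")

# Assumed mimic: the empty "else" branch is still guarded by \3, so it can only
# be taken when the "a" option in the lookahead did NOT match.
FAKE_NO_ELSE = re.compile(r"(?=(a)()|())\1?b(?:\2c|\3)")

# Contrast case: dropping \3 leaves an unguarded empty alternative, which lets
# the group match nothing even when "a" was present.
WITHOUT_GUARD = re.compile(r"(?=(a)()|())\1?b(?:\2c|)")

def accepts(pattern, text):
    """True if the whole string is matched by the pattern."""
    return pattern.fullmatch(text) is not None
```

The string "ab" is the interesting case: the real conditional and the guarded mimic both reject it, while the unguarded variant accepts it.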
As a brief explanation of how that works: the lookahead at the beginning contains an empty alternation option, which cancels the effect of the lookahead (it always succeeds). At the same time, the intentionally empty capturing groups within the alternation record which option in the lookahead matched, and the then/else part is based on that. However, there are a couple of issues:
- This doesn't work with some regex engines, due to how they handle backreferences for non-participating capturing groups.
- It interacts with backtracking differently than a real conditional (the "a" part is treated as if it were within an optional, atomic group, e.g., (?>(a)?)), so it's best to think of this as a new operator which is similar to a conditional.
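The backtracking difference can be seen in a small Python sketch. The patterns are hypothetical variations of my own (using `a+` as the test so the engine has something to backtrack into), and the result shown relies on Python's `re` not backtracking into a lookahead once it has succeeded.

```python
import re

# Real conditional: (a+) can give back an "a" so that the following "ab" matches.
REAL = re.compile(r"(a+)?ab(?(1)c|d)")

# Mimic: the lookahead captures "aa" once and cannot be re-entered, so \1 is
# either all of "aa" or nothing, much like an optional atomic group.
FAKE = re.compile(r"(?=(a+)()|())\1?ab(?:\2c|\3d)")

REAL_MATCHES = REAL.fullmatch("aabc") is not None
FAKE_MATCHES = FAKE.fullmatch("aabc") is not None
```

Here the real conditional accepts "aabc" (group 1 shrinks to a single "a"), while the mimic rejects it.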
Here are the regex engines I've briefly tested this pattern with:
|Language||Supports "fake conditionals"||Supports real conditionals||Notes|
|.NET||Yes||Yes||Tested using Expresso.|
|ColdFusion||Yes||No||Tested using ColdFusion 7.|
|Java||Yes||No||Tested using Regular Expression Test Applet.|
|JGSoft||Yes||Yes||As of RegexBuddy version 2.3.2, it performs correctly in more cases if you change the two empty capturing groups ("
|PHP||Yes||Yes||Tested using PHP Regex Tester. Performs correctly in more cases if you explicitly state the condition twice, like so: "
If you discover ways to improve this, or find problems not already mentioned, please let me know.