Thursday, April 05, 2007

Faking Conditionals in Regular Expressions

Update: This post contains several major errors. Please view the updated version on my new blog:

Mimic Regular Expression Conditionals.

Excited by the fact that I can mimic atomic groups when using most regex libraries which don't support them, I set my sights on another of my most wanted regex features which is commonly lacking: conditionals (which provide an if-then-else construct). Of the libraries I'm familiar with, conditionals are only supported by .NET, PCRE, PHP (when using PCRE via the preg functions), and JGSoft products (including RegexBuddy).

There are two kinds of regex conditionals in those libraries... lookaround-based and capturing-group-based. The functionality of lookaround-based conditionals is very easy to replicate. First, here's what such conditionals look like (this example uses a positive lookahead for the assertion):


To mimic that behavior in languages which don't support conditionals, just add a colon after the initial question mark to turn it into a non-capturing group, like so:


As long as the regex engine you're using supports the specified lookaround type, those patterns do the same thing.

However, mimicking capturing-group-based conditionals proved to be more tricky. Conditionals which use an optional capturing group as their test allow you to base logic on whether a capturing group has participated in the match so far. Thus...


...matches only "bd" and "abc". That pattern can be expressed as follows:


Here's a comparable pattern I created which doesn't require support for conditionals:


To use it without an "else" part, you still need to include "\3" at the end, like this:


As a brief explanation of how that works, there's an empty alternation option within the lookahead at the beginning which is used to cancel the effect of the lookahead, while at the same time, the intentionally empty capturing groups within the alternation are exploited to base the then/else part on which option in the lookahead matched. However, there are a couple issues:

  • This doesn't work with some regex engines, due to how they handle backreferences for non-participating capturing groups.
  • It interacts with backtracking differently than a real conditonal (the "a" part is treated as if it were within an optional, atomic group... e.g., (?>(a)?) instead of (a)?), so it's best to think of this as a new operator which is similar to a conditional.

Here are the regex engines I've briefly tested this pattern with:

Language Supports "fake conditionals" Supports real conditionals Notes
.NET Yes Yes Tested using Expresso.
ColdFusion Yes No Tested using ColdFusion 7.
Java Yes No Tested using Regular Expression Test Applet.
JavaScript No No JavaScript assigns an empty string, rather than null, to backreferences for non-participating capturing groups. Unfortunately, this pattern depends on the way most other libraries handle non-participating capturing groups.
JGSoft Yes
Yes As of RegexBuddy version 2.3.2, it performs correctly in more cases if you change the two empty capturing groups ("()") to match a zero-length value of no impact, such as "(.{0})" or "(\b|\B)". It also has problems matching values at the end of a string when using an empty else. I've reported both issues to JGSoft, and have been told they will be fixed in the next version of RegexBuddy.
Yes Tested using PHP Regex Tester. Performs correctly in more cases if you explicitly state the condition twice, like so: "(?=(?:a)()|())(?:a)?b(?:\1c| \2d)".

If you discover ways to improve this, or find problems not already mentioned, please let me know.


Gary Fenton said...

I had no idea that was possible! As no one had left a comment and because it was such a great post I thought I should say something. If you ever get a chance to expand this with more examples that would be cool. Thanks.

Steve said...

Thanks, Gary!

I'll try to add better examples for this eventually, but for now note that there are several errors in this post. You can see the updated, corrected version on my new blog at Mimic Regex Conditionals

Anonymous said...


Anonymous said...

You wellness
Personal websites:
شات - دردشه - دردشة - شات صوتي-
دردشه صوتيه - دردشة صوتية -
شات صوتي
- منتديات -
- منتدى - العاب -

Anonymous said...



ball valve球阀
gate valve闸阀
angle valve角阀
Check valve
hot-water heating

Anonymous said...

Making gw gold is the old question : Honestly there is no fast way to make lots of GuildWars Gold . Sadly enough a lot of the people that all of a sudden come to with millions of Guild Wars Gold almost overnight probably duped . Although there are a lot of ways to make lots of GuildWars moneyhere I will tell you all of the ways that I know and what I do to make cheap gw gold.

As a new player , you may need some game guides or information to enhance yourself.
habbo credits is one of the hardest theme for every class at the beginning . You must have a good way to manage yourhabbo gold.If yor are a lucky guy ,you can earn so many habbo coins by yourself . But if you are a not , I just find a nice way to get buy habbo gold. If you need , you can buycheap habbo credits at our website . Go to the related page and check the detailed information . Once you have any question , you can connect our customer service at any time .

Anonymous said...