Thursday, February 01, 2007

reMatch(): Improving ColdFusion's regex support

Update: Please see this post on my new blog, which includes a demo of the REMatch function: REMatch (ColdFusion).

Following are some UDFs I wrote recently to make using regexes in ColdFusion a bit easier. The biggest deal here is my reMatch() function.

reMatch(), in its most basic usage, is similar to JavaScript's String.match() method. Compare getting the first number in a string using reMatch() vs. built-in ColdFusion functions:

  • reMatch():
    <cfset num = reMatch("\d+", string) />
  • reReplace():
    <cfset num = reReplace(string, "\D*(\d+).*", "\1") />
  • reFind():
    <cfset matchInfo = reFind("\d+", string, 1, TRUE) />
    <cfset num = mid(string, matchInfo.pos[1], matchInfo.len[1]) />

All of the above return the same result when the string contains a number. When it doesn't, reMatch() returns an empty string, the reReplace()-based approach returns the original string unchanged (the pattern can't match, so nothing is replaced), and the reFind()-based approach throws an error because mid() is passed a start value of 0. I think it's pretty clear from the above which approach is easiest to use for a situation like this.
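For comparison, here is a rough Java sketch of the same three approaches using java.util.regex, the package the comments below suggest calling from ColdFusion. It is an illustration only, not part of the UDF library, and the class and method names (FirstMatch, viaMatch, viaReplace, viaFind) are hypothetical. It also shows how each approach behaves when there is no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java counterparts of the three ColdFusion approaches above.
public class FirstMatch {

    // Like reMatch("\d+", string): the first match, or "" when there is none.
    static String viaMatch(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        return m.find() ? m.group() : "";
    }

    // Like reReplace(string, "\D*(\d+).*", "\1"): if the pattern cannot
    // match, the input comes back unchanged rather than as "".
    static String viaReplace(String input) {
        return input.replaceFirst("\\D*(\\d+).*", "$1");
    }

    // Like reFind() plus mid(): find positions first, then extract. Without
    // the find() check, m.start() would throw IllegalStateException, much as
    // mid() fails when it is handed a start position of 0.
    static String viaFind(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        if (!m.find()) {
            return "";
        }
        return input.substring(m.start(), m.end());
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Order 42 of 100", "no digits"}) {
            System.out.println(viaMatch(s) + " | " + viaReplace(s) + " | " + viaFind(s));
        }
    }
}
```

With "Order 42 of 100", all three return "42". With "no digits", viaReplace() returns the input unchanged and the other two return "".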

Still, that's just the beginning of what reMatch() can do. Change the scope argument from the default of "ONE" to "ALL" (to follow the convention used by reReplace(), etc.), and the function will return an array of all matches. Finally, set the returnLenPos argument to TRUE and the function will return either a struct or array of structs (based on the value of scope) containing the len, pos, AND value of each match. This is very different from how the returnSubExpressions argument of reFind() works. When using returnSubExpressions, you get back a struct containing arrays of the len and pos (but not value) of each backreference from the first match.
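The scope="ALL" plus returnLenPos=TRUE combination can be sketched the same way in Java. This is a hypothetical analogue: the names AllMatches, MatchInfo, and reMatchAll are illustrative, and the record requires Java 16 or later. Positions are reported 1-based to follow ColdFusion's convention. Unlike reFind()'s returnSubExpressions struct, each entry carries the matched value alongside len and pos.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java analogue of reMatch(regEx, string, 1, "ALL", TRUE):
// value, len, and 1-based pos for every match, in order.
public class AllMatches {

    // One immutable record per match (Java 16+).
    record MatchInfo(String value, int len, int pos) {}

    static List<MatchInfo> reMatchAll(String regex, String input) {
        List<MatchInfo> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // find() resumes after the previous match. After a zero-length match
        // it advances one position on its own, so "x*" cannot loop forever.
        while (m.find()) {
            // m.start() is 0-based, so add 1 to follow ColdFusion's convention.
            matches.add(new MatchInfo(m.group(), m.end() - m.start(), m.start() + 1));
        }
        return matches;
    }

    public static void main(String[] args) {
        for (MatchInfo info : reMatchAll("\\d+", "a1 bb22 ccc333")) {
            System.out.println(info.value() + " len=" + info.len() + " pos=" + info.pos());
        }
    }
}
```

Running main prints three lines: "1 len=1 pos=2", "22 len=2 pos=6", and "333 len=3 pos=12".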

Here's the code, with four additional UDFs (reMatchNoCase(), match(), matchNoCase(), and escapeReChars()) added for good measure:

<!--- UDFs by Steven Levithan --->

<cffunction name="reMatch" output="FALSE">
    <cfargument name="regEx" type="string" required="TRUE" />
    <cfargument name="string" type="string" required="TRUE" />
    <cfargument name="start" type="numeric" required="FALSE" default="1" />
    <cfargument name="scope" type="string" required="FALSE" default="ONE" />
    <cfargument name="returnLenPos" type="boolean" required="FALSE" default="FALSE" />
    <cfargument name="caseSensitive" type="boolean" required="FALSE" default="TRUE" />
    <cfset var thisMatch = "" />
    <cfset var matchInfo = structNew() />
    <cfset var matches = arrayNew(1) />
    <!--- Set the time before entering the loop --->
    <cfset var timeout = now() />
    
    <!--- Build the matches array. Continue looping until additional instances of regEx are not found. If scope is "ONE", the loop will end after the first iteration --->
    <cfloop condition="TRUE">
        <!--- By using returnSubExpressions (the fourth reFind argument), the position and length of the first match is captured in arrays named len and pos --->
        <cfif caseSensitive>
            <cfset thisMatch = reFind(regEx, string, start, TRUE) />
        <cfelse>
            <cfset thisMatch = reFindNoCase(regEx, string, start, TRUE) />
        </cfif>
        
        <!--- If a match was not found, end the loop --->
        <cfif thisMatch.pos[1] EQ 0>
            <cfbreak />
        <!--- If a match was found, and extended info was requested, append a struct containing the value, length, and position of the match to the matches array --->
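        <cfelseif returnLenPos>
            <!--- Create a new struct for each match. Structs are passed by reference, so reusing a single struct would leave every array element pointing at the last match --->
            <cfset matchInfo = structNew() />
            <cfset matchInfo.value = mid(string, thisMatch.pos[1], thisMatch.len[1]) />
            <cfset matchInfo.len = thisMatch.len[1] />
            <cfset matchInfo.pos = thisMatch.pos[1] />
            <cfset arrayAppend(matches, matchInfo) />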
        <!--- Otherwise, just append the match value to the matches array --->
        <cfelse>
            <cfset arrayAppend(matches, mid(string, thisMatch.pos[1], thisMatch.len[1])) />
        </cfif>
        
        <!--- If only the first match was requested, end the loop --->
        <cfif scope IS "ONE">
            <cfbreak />
        <!--- If the match length was greater than zero --->
        <cfelseif thisMatch.len[1] GT 0>
            <!--- Set the start position for the next iteration of the loop to the position immediately after the match --->
            <cfset start = thisMatch.pos[1] + thisMatch.len[1] />
        <!--- If the match was zero length --->
        <cfelse>
            <!--- Advance the start position one character past the match, to avoid infinite iteration and to avoid finding the same zero-length match twice --->
            <cfset start = thisMatch.pos[1] + 1 />
        </cfif>
        
        <!--- If the loop has run for 20 seconds, throw an error to guard against overly long processing. Note, however, that even a single pass with a poorly written regex that triggers catastrophic backtracking could take longer than 20 seconds --->
        <cfif dateDiff("s", timeout, now()) GTE 20>
            <cfthrow message="Processing too long. Optimize regular expression for better performance" />
        </cfif>
    </cfloop>
    
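    <!--- When scope is "ONE", return the first match, or an empty string if none was found. Otherwise, return the full array --->
    <cfif scope IS "ONE">
        <cfif arrayLen(matches)>
            <cfreturn matches[1] />
        <cfelse>
            <cfreturn "" />
        </cfif>
    <cfelse>
        <cfreturn matches />
    </cfif>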
</cffunction>

<cffunction name="reMatchNoCase" output="FALSE">
    <cfargument name="regEx" type="string" required="TRUE" />
    <cfargument name="string" type="string" required="TRUE" />
    <cfargument name="start" type="numeric" required="FALSE" default="1" />
    <cfargument name="scope" type="string" required="FALSE" default="ONE" />
    <cfargument name="returnLenPos" type="boolean" required="FALSE" default="FALSE" />
    <cfreturn reMatch(regEx, string, start, scope, returnLenPos, FALSE) />
</cffunction>

<cffunction name="match" output="FALSE">
    <cfargument name="substring" type="string" required="TRUE" />
    <cfargument name="string" type="string" required="TRUE" />
    <cfargument name="start" type="numeric" required="FALSE" default="1" />
    <cfargument name="scope" type="string" required="FALSE" default="ONE" />
    <cfargument name="returnLenPos" type="boolean" required="FALSE" default="FALSE" />
    <cfreturn reMatch(escapeReChars(substring), string, start, scope, returnLenPos, TRUE) />
</cffunction>

<cffunction name="matchNoCase" output="FALSE">
    <cfargument name="substring" type="string" required="TRUE" />
    <cfargument name="string" type="string" required="TRUE" />
    <cfargument name="start" type="numeric" required="FALSE" default="1" />
    <cfargument name="scope" type="string" required="FALSE" default="ONE" />
    <cfargument name="returnLenPos" type="boolean" required="FALSE" default="FALSE" />
    <cfreturn reMatch(escapeReChars(substring), string, start, scope, returnLenPos, FALSE) />
</cffunction>

<!--- Escape special regular expression characters (.,*,+,?,^,$,{,},(,),|,[,],\) within a string by preceding them with a backslash (\). This allows literal strings to be used safely within regular expressions --->
<cffunction name="escapeReChars" returntype="string" output="FALSE">
    <cfargument name="string" type="string" required="TRUE" />
    <cfreturn reReplace(string, "[.*+?^${}()|[\]\\]", "\\\0", "ALL") />
</cffunction>
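escapeReChars(), and the literal matching that match() and matchNoCase() build on top of it, would look roughly like this in Java. This is a sketch with hypothetical names (LiteralMatch, match). Java also provides Pattern.quote(), which wraps a string in \Q...\E and is usually the simpler choice there. Java's replacement syntax uses $0 for the whole match, where the ColdFusion code above writes \0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralMatch {

    // Same character set as escapeReChars(): . * + ? ^ $ { } ( ) | [ ] \
    // "[" is escaped because an unescaped "[" inside a Java character
    // class would start a nested class.
    static String escapeReChars(String s) {
        return s.replaceAll("[.*+?^${}()|\\[\\]\\\\]", "\\\\$0");
    }

    // Sketch of match() (caseSensitive = true) and matchNoCase() (false):
    // the first literal occurrence of substring, or "" if there is none.
    static String match(String substring, String input, boolean caseSensitive) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        Matcher m = Pattern.compile(escapeReChars(substring), flags).matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(escapeReChars("1+1=2?"));
        System.out.println(match("(A)", "Plan (a) failed", false));
        System.out.println(Pattern.quote("1+1=2?"));
    }
}
```

Here escapeReChars("1+1=2?") yields 1\+1=2\?. The case-insensitive call finds "(a)", and the case-sensitive version of the same call returns "".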

Now that I've got a deeply featured match function, all I need Adobe to add to ColdFusion in the way of regex support is lookbehinds, atomic groups, possessive quantifiers, conditionals, balancing groups, etc., etc. :-)

14 comments:

Anonymous said...

Hey Steve,

After playing with your REMatch method (which has helped me more than once) I made a little change. I added 'SUB' to the scope argument, which will loop over each match and return sub-matches. You can read all about it here. I don't have code snippets in the blog yet, but there is a download available. If you have any suggestions please let me know.

Andrew Duckett said...

Didn't mean to be Anon on that last one!

Steve said...

Andrew,

Glad to hear this helped somebody. That is a potentially quite useful modification you made. ReMatch already had the potential to return tons of information about matches (i.e., the len, pos, and value of every match within a target string), but your modification ultimately results in a function capable of returning more info about matches via one function call than any regex-related function I personally know of in any programming language.

One note: When I wrote this, I wasn't aware that you could use underlying Java regex methods in ColdFusion. If I ever get around to releasing an updated version of ReMatch, it will use the Java methods, which offer faster speed and more powerful regular expression features (e.g., lookbehind). That would be my main suggestion for your CFC... use Java, if possible.

Thanks for posting!

Andrew said...

Hey Steve, me again. I gave the java.util.regex package a shot, and was able to get a basic version working. Check it out here.

Matt said...

Hi Steve.

I'm trying to use CF and a RegEx to replace a part of a word/s in a long string - but only if it's not a url or an email address. Apparently this is very difficult as 'lookbehinds' are not supported.

Is there a use or combination of uses of your reMatch tag that could help?

The rest of my comment was rejected by the comment system, so I have put it here:
www.tandabui.com/steve.txt

Matt
