Thursday, February 01, 2007

parseUri(): Split URLs in ColdFusion

Update: Please view the updated version of this post on my new blog: parseUri: Split URLs in ColdFusion.

Here's a UDF I wrote recently which allows me to show off my regex skillz. parseUri() splits any well-formed URI into its components (all are optional).

The core code is already brief, but I could replace the entire contents of the <cfloop> with one line of code if I didn't have to account for bugs in the reFind() function (tested in CF7). Note that all components are split with a single regex (using backreferences). My favorite part of this UDF is its robust support for splitting the directory path and filename (it supports directories with periods, and without a trailing slash), which I haven't seen matched in other URI parsers.
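For anyone who hasn't used reFind() with returnSubExpressions before, here is a minimal sketch of how the pos and len arrays line up with backreferences. The sample string, the pattern, and the variable names sample and match are made up for illustration and are not part of parseUri().

```cfm
<!--- Hypothetical input and pattern, used only to illustrate the return value --->
<cfset sample = "key=value" />
<!--- The fourth argument (returnSubExpressions) makes reFind() return a struct with pos and len arrays --->
<cfset match = reFind("^(\w+)=(\w*)$", sample, 1, TRUE) />

<cfoutput>
    <!--- Element 1 describes the entire match (backreference 0) --->
    #mid(sample, match.pos[1], match.len[1])#<br />
    <!--- Elements 2 and 3 describe the first and second capturing groups --->
    #mid(sample, match.pos[2], match.len[2])#<br />
    #mid(sample, match.pos[3], match.len[3])#
</cfoutput>
```

Because array element 1 holds the whole match, each capturing group sits one index higher than its backreference number. That is why a single list of key names can be mapped onto the arrays by index.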

Since the function returns a struct, you can access any component directly, e.g., parseUri(someUri).anchor. Check it out:

<!--- By Steven Levithan. Splits any well-formed URI into its components --->
<cffunction name="parseUri" returntype="struct" output="FALSE">
    <cfargument name="sourceUri" type="string" required="TRUE" />
    <!--- Get arrays named len and pos, containing the lengths and positions of each URI component (all are optional) --->
    <cfset var uriPattern = reFind("^(?:([^:/?##.]+):)?(?://)?(([^:/?##]*)(?::(\d*))?)?((/(?:[^?##](?![^?##/]*\.[^?##/.]+(?:[\?##]|$)))*/?)?([^?##/]*))?(?:\?([^##]*))?(?:##(.*))?", sourceUri, 1, TRUE) />
    <!--- Create an array containing the names of each key we will add to the uri struct --->
    <cfset var uriComponentNames = listToArray("source,protocol,authority,domain,port,path,directoryPath,fileName,query,anchor") />
    <cfset var uri = structNew() />
    <cfset var i = 1 />
    
    <!--- Add the following keys to the uri struct:
    • source (when using returnSubExpressions, reFind() returns backreference 0 [i.e., the entire match] as array element 1, so we might as well use it)
    • protocol (scheme)
    • authority (includes both the domain and port)
        • domain (part of the authority component; can be an IP address)
        • port (part of the authority component)
    • path (includes both the directory path and filename)
        • directoryPath (part of the path component; supports directories with periods, and without a trailing slash)
        • fileName (part of the path component)
    • query (does not include the leading question mark)
    • anchor (fragment) --->
    <cfloop index="i" from="1" to="10"><!--- Could also use to="#arrayLen(uriComponentNames)#" --->
        <!--- If the component was found in the source URI...
        • The arrayLen() check is needed to prevent a CF error when sourceUri is empty, because due to an apparent bug, reFind() does not populate backreferences for zero-length capturing groups when run against an empty string (though it does still populate backreference 0)
        • The pos[i] value check is needed to prevent a CF error when mid() is passed a start value of 0, because of the way reFind() considers an optional capturing group that does not match anything to have a pos of 0 --->
        <cfif (arrayLen(uriPattern.pos) GT 1) AND (uriPattern.pos[i] GT 0)>
            <!--- Add the component to its corresponding key in the uri struct --->
            <cfset uri[uriComponentNames[i]] = mid(sourceUri, uriPattern.pos[i], uriPattern.len[i]) />
        <!--- Otherwise, set the key value to an empty string --->
        <cfelse>
            <cfset uri[uriComponentNames[i]] = "" />
        </cfif>
    </cfloop>
    
    <!--- Always end directoryPath with a trailing slash if a directory path was found in the source URI (note that a trailing slash is NOT automatically inserted within or appended to the "path" key) --->
    <cfif len(uri.directoryPath) GT 0>
        <cfset uri.directoryPath = reReplace(uri.directoryPath, "/?$", "/") />
    </cfif>
    
    <cfreturn uri />
</cffunction>
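
A quick usage sketch, assuming the function above is already defined or included on the page. The URL is a made-up sample, and the variable name uri is just for illustration.

```cfm
<!--- Hypothetical sample URL; the doubled ## is how CFML escapes a literal pound sign inside a string --->
<cfset uri = parseUri("http://www.example.com:8080/dir.name/sub/file.html?q=1##top") />

<!--- Read individual components straight off the returned struct --->
<cfoutput>
    protocol: #uri.protocol#<br />
    domain: #uri.domain#<br />
    port: #uri.port#<br />
    directoryPath: #uri.directoryPath#<br />
    fileName: #uri.fileName#<br />
    query: #uri.query#<br />
    anchor: #uri.anchor#
</cfoutput>

<!--- Or inspect all ten keys at once --->
<cfdump var="#uri#" />
```

Every key is always present in the struct, and components missing from the source URI come back as empty strings, so you can test them with len() without a structKeyExists() check first.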

Edit: I've written a JavaScript implementation of the above UDF. See parseUri(): Split URLs in JavaScript.
