Showing posts with label ColdFusion. Show all posts

Saturday, February 03, 2007

More URI-related UDFs

To follow up on my parseUri() function, here are several more UDFs I've written recently to help with URI management:

  • getPageUri()
    Returns a struct containing the relative and absolute URIs of the current page. The difference between getPageUri().relative and CGI.SCRIPT_NAME is that the former will include the query string, if present.
  • matchUri(testUri, [masterUri])
    Returns a Boolean indicating whether or not two URIs are the same, disregarding the following differences:
    • Fragments (page anchors), e.g., "#top".
    • Inclusion of "index.cfm" in paths, e.g., "/dir/" vs. "/dir/index.cfm" (supports trailing query strings).
    If masterUri is not provided, the current page is used for comparison (supports both relative and absolute URIs).
  • replaceUriQueryKey(uri, key, substring)
    Replaces a URI query key and its value with a supplied key=value pair. Works with relative and absolute URIs, as well as standalone query strings (with or without a leading "?"). This is also used to support the following two UDFs:
  • addUriQueryKey(uri, key, value)
    Removes any existing instances of the supplied key, then appends it together with the provided value to the provided URI.
  • removeUriQueryKey(uri, key)
    Removes one or more query keys (comma delimited) and their values from the provided URI.
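The matching rule behind matchUri() comes down to two steps: normalize both URIs, then compare the results. Here's a minimal Python sketch of that idea. It is not a port. It assumes the master URI is always passed explicitly, since there's no "current page" to fall back on outside ColdFusion. The names match_uri and _normalize are hypothetical.

```python
import re

def _normalize(uri: str) -> str:
    # Drop the fragment (page anchor), e.g. "#top"
    uri = re.sub(r"#.*", "", uri, count=1)
    # Treat a path ending in "/index.cfm" (optionally followed by a query) as ending in "/"
    return re.sub(r"/index\.cfm(?=\?|$)", "/", uri, count=1, flags=re.IGNORECASE)

def match_uri(test_uri: str, master_uri: str) -> bool:
    # ColdFusion's IS operator is case-insensitive, so compare lowercased strings
    return _normalize(test_uri).lower() == _normalize(master_uri).lower()
```

With this sketch, "/dir/index.cfm?a=1#top" and "/dir/?a=1" compare as equal, while "/dir/page.cfm" and "/dir/" do not.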

Now that I have these at my disposal, I frequently find myself using them in combination with each other, e.g.:
<a href="<cfoutput>#addUriQueryKey(
    getPageUri().relative,
    "key",
    "value"
)#</cfoutput>">Link</a>

Let me know if you find any of these useful…

<!--- Returns the relative and absolute URIs of the current page --->
<cffunction name="getPageUri" returntype="struct" output="FALSE">
    <cfset var pageProtocol = "http" />
    <cfset var pageQuery = "" />
    <cfset var uri = structNew() />
    
    <!--- Get the protocol of the current page --->
    <cfif CGI.HTTPS IS "ON">
        <cfset pageProtocol = "https" />
    </cfif>
    
    <!--- Get the query of the current page, including the leading question mark if the query is not empty --->
    <cfset pageQuery = reReplace("?" & CGI.QUERY_STRING, "\?$", "") />
    
    <!--- Construct the relative URI of the current page (excludes the protocol and domain) --->
    <cfset uri.relative = CGI.SCRIPT_NAME & pageQuery />
    <!--- Construct the absolute URI of the current page --->
    <cfset uri.absolute = pageProtocol & "://" & CGI.SERVER_NAME & uri.relative />
    
    <cfreturn uri />
</cffunction>

<!--- Returns a Boolean indicating whether or not two URIs are the same, disregarding the following differences:
• Fragments (page anchors), e.g., "#top".
• Inclusion of "index.cfm" in paths, e.g., "/dir/" vs. "/dir/index.cfm" (supports trailing query strings).
If masterUri is not provided, the current page is used for comparison (supports both relative and absolute URIs) --->
<cffunction name="matchUri" returntype="boolean" output="FALSE">
    <cfargument name="testUri" type="string" required="TRUE" />
    <cfargument name="masterUri" type="string" required="FALSE" default="" />
    
    <!--- If a masterUri was not provided --->
    <cfif len(masterUri) EQ 0>
        <!--- If testUri is an absolute URI --->
        <cfif reFindNoCase("^https?://", testUri) EQ 1>
            <cfset masterUri = getPageUri().absolute />
        <cfelse>
            <cfset masterUri = getPageUri().relative />
        </cfif>
    </cfif>
    
    <cfreturn reReplaceNoCase(reReplace(testUri, "##.*", ""), "/index\.cfm(?=\?|$)", "/", "ONE") IS reReplaceNoCase(reReplace(masterUri, "##.*", ""), "/index\.cfm(?=\?|$)", "/", "ONE") />
</cffunction>

<!--- Replace a URI query key and its value with a supplied key=value pair.
Works with relative and absolute URIs, as well as standalone query strings (with or without a leading "?") --->
<cffunction name="replaceUriQueryKey" returntype="string" output="FALSE">
    <cfargument name="uri" type="string" required="TRUE" />
    <cfargument name="key" type="string" required="TRUE" />
    <cfargument name="substring" type="string" required="TRUE" />
    <cfset var preQueryComponents = "" />
    <cfset var currentKey = "" />
    
    <!--- Remove any existing fragment (page anchor) from uri, since it will mess with our processing, and is unlikely to be relevant and/or correct in the new URI --->
    <cfset uri = reReplace(uri, "##.*", "", "ONE") />
    <!--- Store any pre-query URI components. For this to work, the string must start with "protocol:", "//authority", or "/" (path). Otherwise, we will assume the uri is comprised entirely of a query component --->
    <cfset preQueryComponents = reReplace(uri, "^((?:(?:[^:/?.]+:)?//[^/?]+)?(?:/[^?]*)?)?.*", "\1", "ONE") />
    <!--- Remove any pre-query components and the leading question mark from uri --->
    <cfset uri = reReplace(uri, "^(?:(?:[^:/?.]+:)?//[^/?]+)?(?:/[^?]*)?\??(.*)", "\1", "ONE") />
    <!--- Remove any superfluous ampersands in the query (this cleans up the query but is not required, and in any case this function doesn't generate superfluous ampersands) --->
    <cfset uri = reReplace(uri, "&(?=&)|&$", "", "ALL") />
    
    <!--- For each key specified, remove the corresponding key=value pair from uri. Note that key names which contain regex special characters (.,*,+,?,^,$,{,},(,),|,[,],\) which are not percent-encoded may behave unpredictably --->
    <cfloop index="currentKey" list="#key#" delimiters=",">
        <cfif len(currentKey) GT 0>
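            <!--- The (?=&|$) lookahead ensures only whole key names are removed (e.g., removing "a" leaves "ab=1" intact) --->
            <cfset uri = reReplaceNoCase(uri, ("(?:^|&)" & currentKey & "(?:=[^&]*)?(?=&|$)"), "", "ALL") />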
        </cfif>
    </cfloop>
    
    <!--- If we still have a value in uri after the above processing (beyond what we're about to add) --->
    <cfif len(uri) GT 0>
        <!--- Ensure the query is returned with only the necessary separator characters (? and &) --->
        <cfreturn (preQueryComponents & "?" & reReplace(uri, "^&", "") & reReplace("&" & substring, "&$", "")) />
    <cfelse>
        <!--- Append substring, including a leading question mark if substring is not empty --->
        <cfreturn (preQueryComponents & reReplace("?" & substring, "\?$", "")) />
    </cfif>
</cffunction>

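<!--- Removes any existing instances of key from uri, then appends the key=value pair. Only the first key of a comma-delimited list is used --->
<cffunction name="addUriQueryKey" returntype="string" output="FALSE">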
    <cfargument name="uri" type="string" required="TRUE" />
    <cfargument name="key" type="string" required="TRUE" />
    <cfargument name="value" type="string" required="TRUE" />
    
    <!--- Until proper support is included for adding multiple keys with one call, use only the first key --->
    <cfset key = listFirst(key, ",") />
    
    <!--- Remove any existing instances of the key from uri, then add the new key=value pair.
    Do not include the trailing equals sign (=) if we're assigning an empty value to the added key --->
    <cfreturn replaceUriQueryKey(removeUriQueryKey(uri, key), "", (key & reReplace("=" & value, "=$", ""))) />
</cffunction>

<!--- Removes one or more query keys (comma delimited) and their values from uri --->
<cffunction name="removeUriQueryKey" returntype="string" output="FALSE">
    <cfargument name="uri" type="string" required="TRUE" />
    <!--- Use a comma-delimited list to remove multiple keys with one call --->
    <cfargument name="key" type="string" required="TRUE" />
    
    <cfreturn replaceUriQueryKey(uri, key, "") />
</cffunction>
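Here's a minimal Python sketch of how replaceUriQueryKey() and its two wrappers fit together. It reuses the UDF's pre-query regex to split off the scheme, authority and path. The snake_case function names are hypothetical. Two details to note: key names are passed through re.escape(), so regex characters in keys are treated literally, and a (?=&|$) lookahead restricts removal to whole key names, so removing "a" leaves "ab=1" alone.

```python
import re

def replace_uri_query_key(uri: str, key: str, substring: str) -> str:
    # Drop any fragment; it is unlikely to be correct in the new URI
    uri = re.sub(r"#.*", "", uri, count=1)
    # Split into the pre-query part (scheme, authority, path) and the query (minus "?")
    m = re.match(r"((?:(?:[^:/?.]+:)?//[^/?]+)?(?:/[^?]*)?)\??(.*)", uri)
    pre_query, query = m.group(1), m.group(2)
    # Tidy up doubled or trailing ampersands
    query = re.sub(r"&(?=&)|&$", "", query)
    # Remove each comma-delimited key (case-insensitively), with or without a value
    for k in key.split(","):
        if k:
            query = re.sub(r"(?:^|&)" + re.escape(k) + r"(?:=[^&]*)?(?=&|$)", "",
                           query, flags=re.IGNORECASE)
    if query:
        # Keep only the separators that are actually needed
        return pre_query + "?" + re.sub(r"^&", "", query) + ("&" + substring if substring else "")
    return pre_query + ("?" + substring if substring else "")

def remove_uri_query_key(uri: str, key: str) -> str:
    # key may be a comma-delimited list of keys
    return replace_uri_query_key(uri, key, "")

def add_uri_query_key(uri: str, key: str, value: str) -> str:
    # Like the UDF, only the first key of a list is used
    key = key.split(",")[0]
    # Omit the "=" when the value is empty
    pair = key + ("=" + value if value else "")
    return replace_uri_query_key(remove_uri_query_key(uri, key), "", pair)
```

For example, add_uri_query_key("/page.cfm?page=2&sort=asc#results", "page", "3") returns "/page.cfm?sort=asc&page=3". The old key is removed and the new pair is appended last, which is the same ordering the UDFs produce.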

In other news, this cracked me up.

Thursday, February 01, 2007

parseUri(): Split URLs in ColdFusion

Update: Please view the updated version of this post on my new blog:

parseUri: Split URLs in ColdFusion.

Here's a UDF I wrote recently which allows me to show off my regex skillz. parseUri() splits any well-formed URI into its components (all are optional).

The core code is already very brief, but I could replace the entire contents of the <cfloop> with one line of code if I didn't have to account for bugs in the reFind() function (tested in CF7). Note that all components are split with a single regex (using backreferences). My favorite part of this UDF is its robust support for splitting the directory path and filename (it supports directories with periods, and without a trailing slash), which I haven't seen matched in other URI parsers.

Since the function returns a struct, you can do, e.g., parseUri(someUri).anchor, etc. Check it out:

<!--- By Steven Levithan. Splits any well-formed URI into its components --->
<cffunction name="parseUri" returntype="struct" output="FALSE">
    <cfargument name="sourceUri" type="string" required="TRUE" />
    <!--- Get arrays named len and pos, containing the lengths and positions of each URI component (all are optional) --->
    <cfset var uriPattern = reFind("^(?:([^:/?##.]+):)?(?://)?(([^:/?##]*)(?::(\d*))?)?((/(?:[^?##](?![^?##/]*\.[^?##/.]+(?:[\?##]|$)))*/?)?([^?##/]*))?(?:\?([^##]*))?(?:##(.*))?", sourceUri, 1, TRUE) />
    <!--- Create an array containing the names of each key we will add to the uri struct --->
    <cfset var uriComponentNames = listToArray("source,protocol,authority,domain,port,path,directoryPath,fileName,query,anchor") />
    <cfset var uri = structNew() />
    <cfset var i = 1 />
    
    <!--- Add the following keys to the uri struct:
    • source (when using returnSubExpressions, reFind() returns backreference 0 [i.e., the entire match] as array element 1, so we might as well use it)
    • protocol (scheme)
    • authority (includes both the domain and port)
        • domain (part of the authority component; can be an IP address)
        • port (part of the authority component)
    • path (includes both the directory path and filename)
        • directoryPath (part of the path component; supports directories with periods, and without a trailing slash)
        • fileName (part of the path component)
    • query (does not include the leading question mark)
    • anchor (fragment) --->
    <cfloop index="i" from="1" to="10"><!--- Could also use to="#arrayLen(uriComponentNames)#" --->
        <!--- If the component was found in the source URI...
        • The arrayLen() check is needed to prevent a CF error when sourceUri is empty, because due to an apparent bug, reFind() does not populate backreferences for zero-length capturing groups when run against an empty string (though it does still populate backreference 0)
        • The pos[i] value check is needed to prevent a CF error when mid() is passed a start value of 0, because of the way reFind() considers an optional capturing group that does not match anything to have a pos of 0 --->
        <cfif (arrayLen(uriPattern.pos) GT 1) AND (uriPattern.pos[i] GT 0)>
            <!--- Add the component to its corresponding key in the uri struct --->
            <cfset uri[uriComponentNames[i]] = mid(sourceUri, uriPattern.pos[i], uriPattern.len[i]) />
        <!--- Otherwise, set the key value to an empty string --->
        <cfelse>
            <cfset uri[uriComponentNames[i]] = "" />
        </cfif>
    </cfloop>
    
    <!--- Always end directoryPath with a trailing slash if the path component was present in the source URI (note that a trailing slash is NOT automatically inserted within or appended to the "path" key) --->
    <cfif len(uri.directoryPath) GT 0>
        <cfset uri.directoryPath = reReplace(uri.directoryPath, "/?$", "/") />
    </cfif>
    
    <cfreturn uri />
</cffunction>
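To experiment with the regex itself outside ColdFusion, here's a sketch that runs the same pattern through Python's re module, which supports the lookahead and non-capturing groups used here. The name parse_uri is hypothetical. The component names match the UDF's struct keys, and CF's doubled ## is written as a single #. The trailing-slash fix on directoryPath uses an endswith() check rather than a regex replace.

```python
import re

# Same pattern as the parseUri() UDF, with CF's "##" escapes written as "#"
URI_PATTERN = re.compile(
    r"^(?:([^:/?#.]+):)?(?://)?(([^:/?#]*)(?::(\d*))?)?"
    r"((/(?:[^?#](?![^?#/]*\.[^?#/.]+(?:[\?#]|$)))*/?)?([^?#/]*))?"
    r"(?:\?([^#]*))?(?:#(.*))?"
)
# Group 0 is the whole match; groups 1-9 follow the UDF's key order
NAMES = ["source", "protocol", "authority", "domain", "port",
         "path", "directoryPath", "fileName", "query", "anchor"]

def parse_uri(source_uri: str) -> dict:
    # Every part of the pattern is optional, so match() always succeeds
    m = URI_PATTERN.match(source_uri)
    # Groups that did not participate come back as None; store "" instead
    uri = {name: (m.group(i) or "") for i, name in enumerate(NAMES)}
    # Always end a non-empty directoryPath with a slash
    if uri["directoryPath"] and not uri["directoryPath"].endswith("/"):
        uri["directoryPath"] += "/"
    return uri
```

For instance, parse_uri("/dir/sub")["directoryPath"] gives "/dir/sub/" and fileName is empty. Because "sub" has no extension, the pattern treats it as a directory.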

Edit: I've written a JavaScript implementation of the above UDF. See parseUri(): Split URLs in JavaScript.

reMatch(): Improving ColdFusion's regex support

Update: Please see this post on my new blog, which includes a demo of the REMatch function:

REMatch (ColdFusion).

Following are some UDFs I wrote recently to make using regexes in ColdFusion a bit easier. The biggest deal here is my reMatch() function.

reMatch(), in its most basic usage, is similar to JavaScript's String.match() method. Compare getting the first number in a string using reMatch() vs. built-in ColdFusion functions:

  • reMatch():
    <cfset num = reMatch("\d+", string) />
  • reReplace():
    <cfset num = reReplace(string, "\D*(\d+).*", "\1") />
  • reFind():
    <cfset matchInfo = reFind("\d+", string, 1, TRUE) />
    <cfset num = mid(string, matchInfo.pos[1], matchInfo.len[1]) />

All of the above would return the same result, unless a number wasn't found in the string, in which case the reFind()-based method would throw an error since the mid() function would be passed a start value of 0. I think it's pretty clear from the above which approach is easiest to use for a situation like this.

Still, that's just the beginning of what reMatch() can do. Change the scope argument from the default of "ONE" to "ALL" (to follow the convention used by reReplace(), etc.), and the function will return an array of all matches. Finally, set the returnLenPos argument to TRUE and the function will return either a struct or array of structs (based on the value of scope) containing the len, pos, AND value of each match. This is very different from how the returnSubExpressions argument of reFind() works. When using returnSubExpressions, you get back a struct containing arrays of the len and pos (but not value) of each backreference from the first match.
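The looping logic behind reMatch() is easiest to see without CF's bookkeeping. Here's a rough Python equivalent, offered as a sketch only:

- re_match is a hypothetical name.
- Positions are reported 1-based, like reFind().
- The 20-second timeout is left out.
- After a zero-length match, the search resumes one character past it so the loop can't stall.

```python
import re

def re_match(pattern: str, string: str, start: int = 1, scope: str = "ONE",
             return_len_pos: bool = False, case_sensitive: bool = True):
    regex = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    matches = []
    pos = start - 1  # convert CF's 1-based start to a 0-based index
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break  # no further matches
        if return_len_pos:
            # Build a fresh dict per match; pos is reported 1-based, like reFind()
            matches.append({"value": m.group(0), "len": m.end() - m.start(), "pos": m.start() + 1})
        else:
            matches.append(m.group(0))
        if scope.upper() == "ONE":
            break  # only the first match was requested
        # Resume at the end of a non-empty match, or one past a zero-length match
        pos = m.end() if m.end() > m.start() else m.start() + 1
    if scope.upper() == "ONE":
        return matches[0] if matches else ""
    return matches
```

With this sketch, re_match(r"\d+", "abc 123 def 45") returns "123". With scope="ALL" it returns ["123", "45"]. When nothing matches, it returns an empty string instead of raising an error.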

Here's the code, with four additional UDFs (reMatchNoCase(), match(), matchNoCase(), and escapeReChars()) added for good measure:

<!--- UDFs by Steven Levithan --->

<cffunction name="reMatch" output="FALSE">
    <cfargument name="regEx" type="string" required="TRUE" />
    <cfargument name="string" type="string" required="TRUE" />
    <cfargument name="start" type="numeric" required="FALSE" default="1" />
    <cfargument name="scope" type="string" required="FALSE" default="ONE" />
    <cfargument name="returnLenPos" type="boolean" required="FALSE" default="FALSE" />
    <cfargument name="caseSensitive" type="boolean" required="FALSE" default="TRUE" />
    <cfset var thisMatch = "" />
    <cfset var matchInfo = structNew() />
    <cfset var matches = arrayNew(1) />
    <!--- Set the time before entering the loop --->
    <cfset var timeout = now() />
    
    <!--- Build the matches array. Continue looping until additional instances of regEx are not found. If scope is "ONE", the loop will end after the first iteration --->
    <cfloop condition="TRUE">
        <!--- By using returnSubExpressions (the fourth reFind argument), the position and length of the first match is captured in arrays named len and pos --->
        <cfif caseSensitive>
            <cfset thisMatch = reFind(regEx, string, start, TRUE) />
        <cfelse>
            <cfset thisMatch = reFindNoCase(regEx, string, start, TRUE) />
        </cfif>
        
        <!--- If a match was not found, end the loop --->
        <cfif thisMatch.pos[1] EQ 0>
            <cfbreak />
        <!--- If a match was found, and extended info was requested, append a struct containing the value, length, and position of the match to the matches array. A new struct is created for each match, since arrayAppend() stores a reference to the struct rather than a copy --->
        <cfelseif returnLenPos>
            <cfset matchInfo = structNew() />
            <cfset matchInfo.value = mid(string, thisMatch.pos[1], thisMatch.len[1]) />
            <cfset matchInfo.len = thisMatch.len[1] />
            <cfset matchInfo.pos = thisMatch.pos[1] />
            <cfset arrayAppend(matches, matchInfo) />
        <!--- Otherwise, just append the match value to the matches array --->
        <cfelse>
            <cfset arrayAppend(matches, mid(string, thisMatch.pos[1], thisMatch.len[1])) />
        </cfif>
        
        <!--- If only the first match was requested, end the loop --->
        <cfif scope IS "ONE">
            <cfbreak />
        <!--- If the match length was greater than zero --->
        <cfelseif thisMatch.len[1] GT 0>
            <!--- Set the start position for the next iteration of the loop to the end position of the match --->
            <cfset start = thisMatch.pos[1] + thisMatch.len[1] />
        <!--- If the match was zero length --->
        <cfelse>
            <!--- Advance the start position to one past the zero-length match, to avoid infinite iteration and to avoid recording the same match twice --->
            <cfset start = thisMatch.pos[1] + 1 />
        </cfif>
        
        <!--- If the loop has run for 20 seconds, throw an error, to mitigate against overlong processing. However, note that even one pass using a poorly-written regex which triggers catastrophic backtracking could take longer than 20 seconds --->
        <cfif dateDiff("s", timeout, now()) GTE 20>
            <cfthrow message="Processing too long. Optimize regular expression for better performance" />
        </cfif>
    </cfloop>
    
    <cfif scope IS "ONE">
        <!--- Return an empty string if no match was found --->
        <cfif arrayLen(matches) EQ 0>
            <cfreturn "" />
        </cfif>
        <cfreturn matches[1] />
    <cfelse>
        <cfreturn matches />
    </cfif>
</cffunction>

<cffunction name="reMatchNoCase" output="FALSE">
    <cfargument name="regEx" type="string" required="TRUE" />
    <cfargument name="string" type="string" required="TRUE" />
    <cfargument name="start" type="numeric" required="FALSE" default="1" />
    <cfargument name="scope" type="string" required="FALSE" default="ONE" />
    <cfargument name="returnLenPos" type="boolean" required="FALSE" default="FALSE" />
    <cfreturn reMatch(regEx, string, start, scope, returnLenPos, FALSE) />
</cffunction>

<cffunction name="match" output="FALSE">
    <cfargument name="substring" type="string" required="TRUE" />
    <cfargument name="string" type="string" required="TRUE" />
    <cfargument name="start" type="numeric" required="FALSE" default="1" />
    <cfargument name="scope" type="string" required="FALSE" default="ONE" />
    <cfargument name="returnLenPos" type="boolean" required="FALSE" default="FALSE" />
    <cfreturn reMatch(escapeReChars(substring), string, start, scope, returnLenPos, TRUE) />
</cffunction>

<cffunction name="matchNoCase" output="FALSE">
    <cfargument name="substring" type="string" required="TRUE" />
    <cfargument name="string" type="string" required="TRUE" />
    <cfargument name="start" type="numeric" required="FALSE" default="1" />
    <cfargument name="scope" type="string" required="FALSE" default="ONE" />
    <cfargument name="returnLenPos" type="boolean" required="FALSE" default="FALSE" />
    <cfreturn reMatch(escapeReChars(substring), string, start, scope, returnLenPos, FALSE) />
</cffunction>

<!--- Escape special regular expression characters (.,*,+,?,^,$,{,},(,),|,[,],\) within a string by preceding them with a backslash (\). This allows safely using literal strings within regular expressions --->
<cffunction name="escapeReChars" returntype="string" output="FALSE">
    <cfargument name="string" type="string" required="TRUE" />
    <cfreturn reReplace(string, "[.*+?^${}()|[\]\\]", "\\\0", "ALL") />
</cffunction>
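Python's standard library has re.escape() for the same job, but it escapes a broader set of characters. Here's a sketch that mirrors the UDF's exact character list. The name escape_re_chars is hypothetical.

```python
import re

def escape_re_chars(s: str) -> str:
    # Same character list as the UDF: . * + ? ^ $ { } ( ) | [ ] \
    # A function replacement inserts the backslash literally, with no escape processing
    return re.sub(r"[.*+?^${}()|\[\]\\]", lambda m: "\\" + m.group(0), s)
```

This is the same trick match() and matchNoCase() rely on. For example, re.search(escape_re_chars("1+1"), "1+1=2") finds the literal text "1+1" rather than treating "+" as a quantifier.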

Now that I've got a deeply featured match function, all I need Adobe to add to ColdFusion in the way of regex support is lookbehinds, atomic groups, possessive quantifiers, conditionals, balancing groups, etc., etc. :-)

Tuesday, March 28, 2006

English > L337 Translator (ColdFusion)

Update: A demo is available on my new blog:

Leet Translator.

Apparently I had time to waste writing a L337 hax0r translator in ColdFusion (okay, so it only took about 15 minutes). I figured I might as well pass it on... the output is different every time & it's reasonably badass. I'll try to put it on a publicly accessible ColdFusion server or rewrite it in JavaScript within a few days so you can see it in action.

<h1>L337 Translator!!</h1>

<!--- If form submitted with value --->
<cfif isDefined("Form.message") AND len(Form.message)>
   <cfset Variables.alphabet = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z" />
   <cfset Variables.cipher = "4,8,[,),3,ƒ,6,##,1,_|,X,1,|v|,|\|,0,|*,()_,2,5,+,(_),\/,\/\/,×,`/,2" />
   <cfset Variables.output = "" />
  
   <!--- Loop over received text, one character at a time --->
   <cfloop index="i" from="1" to="#len(Form.message)#">
       <!--- Gives 50% odds --->
       <cfif round(rand())>
           <!--- Add leet version of character to output --->
           <cfset Variables.output = Variables.output & replaceList(lCase(mid(Form.message, i, 1)), Variables.alphabet, Variables.cipher) />
       <cfelse>
           <cfif round(rand())>
               <!--- Add uppercase version of character to output --->
               <cfset Variables.output = Variables.output & uCase(mid(Form.message, i, 1)) />
           <cfelse>
               <!--- Add unviolated character to output --->
               <cfset Variables.output = Variables.output & mid(Form.message, i, 1) />
           </cfif>
       </cfif>
   </cfloop>
  
   <cfif round(rand())>
       <cfset Variables.suffixes = "w00t!,d00d!,pwnd!,!!!11!one!,teh l337!,hax0r!,sux0rs!" />
       <!--- Append random suffix from list to output --->
       <cfset Variables.output = Variables.output & " " & listGetAt(Variables.suffixes, int(listLen(Variables.suffixes) * rand()) + 1) />
   </cfif>
  
  
   <h2>Original Text:</h2>
   <div style="background:#d2e2ff; border:2px solid #369; padding:0 10px;">
       <p><cfoutput>#paragraphFormat(htmlEditFormat(Form.message))#</cfoutput></p>
   </div>
  
   <h2>Translation:</h2>
   <div style="color:#0f0; background:#000; border:2px solid #0f0; padding:0 10px;">
       <p><cfoutput>#paragraphFormat(htmlEditFormat(Variables.output))#</cfoutput></p>
   </div>
</cfif>


<form action="<cfoutput>#CGI.SCRIPT_NAME#</cfoutput>" method="post" style="margin-top:20px;">
   <cfparam name="Form.message" default="Enter text to translate" />
   <textarea name="message" style="width:300px; height:75px;"><cfoutput>#htmlEditFormat(Form.message)#</cfoutput></textarea>
   <br/><br/>
   <input type="submit"/>
</form>
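For anyone without a CF server handy, here's a rough Python sketch of the same translation rules. It's not the CF code, and a few things differ:

- to_leet, LEET, CIPHER and SUFFIXES are hypothetical names.
- Letters are looked up in a dict rather than run through replaceList().
- The random source is passed in, so a seeded random.Random reproduces a given result.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Same cipher as the CF version, one entry per letter
CIPHER = ["4", "8", "[", ")", "3", "ƒ", "6", "#", "1", "_|", "X", "1", "|v|", "|\\|",
          "0", "|*", "()_", "2", "5", "+", "(_)", "\\/", "\\/\\/", "×", "`/", "2"]
LEET = dict(zip(ALPHABET, CIPHER))
SUFFIXES = ["w00t!", "d00d!", "pwnd!", "!!!11!one!", "teh l337!", "hax0r!", "sux0rs!"]

def to_leet(message: str, rng: random.Random) -> str:
    out = []
    for ch in message:
        if rng.random() < 0.5:
            # 50% odds: leet version of the lowercased character; non-letters pass through
            out.append(LEET.get(ch.lower(), ch.lower()))
        elif rng.random() < 0.5:
            # Otherwise 50% odds of uppercasing the character
            out.append(ch.upper())
        else:
            # Otherwise keep the character as-is
            out.append(ch)
    result = "".join(out)
    # 50% odds of appending a random suffix
    if rng.random() < 0.5:
        result += " " + rng.choice(SUFFIXES)
    return result
```

Calling to_leet("hello world", random.Random(7)) twice gives the same output both times. Dropping the seed brings back the different-every-time behavior.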