Update: Please see the latest version of this function on my new blog:
For fun, I spent the 10 minutes needed to convert my parseUri()
ColdFusion UDF into a JavaScript function.
For those who haven't already seen it, I'll repeat my explanation from the other post…
parseUri() splits any well-formed URI into its parts (all are optional). Note that all parts are split with a single regex using backreferences, and all groupings which don't contain complete URI parts are non-capturing. My favorite bit of this function is its robust support for splitting the directory path and filename (it supports directories with periods, and without a trailing slash), which I haven't seen matched in other URI parsers. Since the function returns an object, you can do, e.g., parseUri(someUri).anchor, etc.
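To make the capturing vs. non-capturing idea concrete, here is a minimal sketch that uses a much simpler pattern than the real one. The names splitSchemeAndAuthority, partNames, and pattern are made up for this illustration, and the pattern only handles the protocol and authority; it just shows how the delimiters stay out of the backreferences while the complete parts get numbered.

```javascript
// Simplified, hypothetical pattern for illustration only (not the full parseUri regex).
// The (?: ... ) groups swallow the ":" and "//" delimiters without creating backreferences,
// so only the complete parts are numbered: 1 = protocol, 2 = authority.
var partNames = ["source", "protocol", "authority"];
var pattern = /^(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?/;

function splitSchemeAndAuthority(uri) {
    // Every group is optional, so the pattern always matches (possibly an empty string)
    var match = pattern.exec(uri);
    var parts = {};
    for (var i = 0; i < partNames.length; i++) {
        // Groups that did not take part in the match are normalized to ""
        parts[partNames[i]] = match[i] ? match[i] : "";
    }
    return parts;
}

var result = splitSchemeAndAuthority("http://www.domain.com:81/dir1/index.html");
// result.protocol is "http" and result.authority is "www.domain.com:81"
```

Because backreference 0 is always the entire match, mapping it to "source" costs nothing, which is the same trick the full function uses.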
I should note that, by design, this function does not attempt to validate the URI it receives, as that would limit its flexibility. IMO, validation is an entirely unrelated process that should come before or after splitting a URI into its parts.
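As a rough illustration of keeping validation separate, here is a sketch of a check that runs after splitting. The helper name findUriProblems and the specific rules (require a protocol and domain, keep the port within 1 to 65535) are my assumptions for this example, not part of parseUri; it only expects an object with the same keys parseUri returns.

```javascript
// Hypothetical post-split check; the rules below are assumptions chosen for illustration.
// Expects an object shaped like parseUri's result (string values, "" for missing parts).
function findUriProblems(parts) {
    var problems = [];
    if (parts.protocol === "") {
        problems.push("missing protocol");
    }
    if (parts.domain === "") {
        problems.push("missing domain");
    }
    if (parts.port !== "") {
        var port = parseInt(parts.port, 10);
        if (isNaN(port) || port < 1 || port > 65535) {
            problems.push("port out of range");
        }
    }
    return problems; // an empty array means no problems were found
}

// Hand-written objects standing in for parseUri results
var ok = findUriProblems({ protocol: "http", domain: "www.domain.com", port: "81" });
var bad = findUriProblems({ protocol: "", domain: "www.domain.com", port: "99999" });
// ok is [] and bad is ["missing protocol", "port out of range"]
```

Different callers will want different rules (relative URIs, for instance, legitimately have no protocol), which is exactly why this kind of check lives outside the splitter.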
This function has no dependencies, and should work cross-browser. It has been tested in IE 5.5–7, Firefox 2, and Opera 9.
/*
parseUri JS v0.1, by Steven Levithan (http://badassery.blogspot.com)
Splits any well-formed URI into the following parts (all are optional):
----------------------
• source (since the exec() method returns backreference 0 [i.e., the entire match] as key 0, we might as well use it)
• protocol (scheme)
• authority (includes both the domain and port)
    • domain (part of the authority; can be an IP address)
    • port (part of the authority)
• path (includes both the directory path and filename)
    • directoryPath (part of the path; supports directories with periods, and without a trailing slash)
    • fileName (part of the path)
• query (does not include the leading question mark)
• anchor (fragment)
*/
function parseUri(sourceUri){
    var uriPartNames = ["source","protocol","authority","domain","port","path","directoryPath","fileName","query","anchor"];
    var uriParts = new RegExp("^(?:([^:/?#.]+):)?(?://)?(([^:/?#]*)(?::(\\d*))?)?((/(?:[^?#](?![^?#/]*\\.[^?#/.]+(?:[\\?#]|$)))*/?)?([^?#/]*))?(?:\\?([^#]*))?(?:#(.*))?").exec(sourceUri);
    var uri = {};

    for(var i = 0; i < 10; i++){
        uri[uriPartNames[i]] = (uriParts[i] ? uriParts[i] : "");
    }

    // Always end directoryPath with a trailing slash if a path was present in the source URI
    // Note that a trailing slash is NOT automatically inserted within or appended to the "path" key
    if(uri.directoryPath.length > 0){
        uri.directoryPath = uri.directoryPath.replace(/\/?$/, "/");
    }

    return uri;
}
Is there any leaner, meaner URI parser out there? :-)
To make it easier to test this function, here is some code that can be copied and pasted into a new HTML file, allowing you to easily enter URIs and see the results.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Steve's URI Parser</title>
    <script type="text/javascript">
    //<![CDATA[

    /*
    parseUri JS v0.1, by Steven Levithan (http://badassery.blogspot.com)
    Splits any well-formed URI into the following parts (all are optional):
    ----------------------
    • source (since the exec() method returns backreference 0 [i.e., the entire match] as key 0, we might as well use it)
    • protocol (scheme)
    • authority (includes both the domain and port)
        • domain (part of the authority; can be an IP address)
        • port (part of the authority)
    • path (includes both the directory path and filename)
        • directoryPath (part of the path; supports directories with periods, and without a trailing slash)
        • fileName (part of the path)
    • query (does not include the leading question mark)
    • anchor (fragment)
    */
    function parseUri(sourceUri){
        var uriPartNames = ["source","protocol","authority","domain","port","path","directoryPath","fileName","query","anchor"];
        var uriParts = new RegExp("^(?:([^:/?#.]+):)?(?://)?(([^:/?#]*)(?::(\\d*))?)?((/(?:[^?#](?![^?#/]*\\.[^?#/.]+(?:[\\?#]|$)))*/?)?([^?#/]*))?(?:\\?([^#]*))?(?:#(.*))?").exec(sourceUri);
        var uri = {};

        for(var i = 0; i < 10; i++){
            uri[uriPartNames[i]] = (uriParts[i] ? uriParts[i] : "");
        }

        // Always end directoryPath with a trailing slash if a path was present in the source URI
        // Note that a trailing slash is NOT automatically inserted within or appended to the "path" key
        if(uri.directoryPath.length > 0){
            uri.directoryPath = uri.directoryPath.replace(/\/?$/, "/");
        }

        return uri;
    }

    // Dump the test results in the page
    function dumpResults(obj){
        var output = "";
        for (var property in obj){
            output += '<tr><td class="name">' + property + '</td><td class="result">"<span class="value">' + obj[property] + '</span>"</td></tr>';
        }
        document.getElementById('output').innerHTML = "<table>" + output + "</table>";
    }

    //]]>
    </script>
    <style type="text/css" media="screen">
        h1 {font-size:1.25em;}
        table {border:solid #333; border-width:1px; background:#f5f5f5; margin:15px 0 0; border-collapse:collapse;}
        td {border:solid #333; border-width:1px 1px 0 0; padding:4px;}
        .name {font-weight:bold;}
        .result {color:#aaa;}
        .value {color:#33c;}
    </style>
</head>
<body>
    <h1>Steve's URI Parser</h1>
    <form action="#" onsubmit="dumpResults(parseUri(document.getElementById('uriInput').value)); return false;">
        <div>
            <input id="uriInput" type="text" style="width:500px" value="http://www.domain.com:81/dir1/dir.2/index.html?id=1&amp;test=2#top" />
            <input type="submit" value="Parse" />
        </div>
    </form>
    <div id="output">
    </div>
    <p><a href="http://badassery.blogspot.com">My blog</a></p>
</body>
</html>
Edit: This function doesn't currently support URIs which include a username or username/password pair (e.g., "http://user:password@domain.com/"). I didn't care about this when I originally wrote the ColdFusion UDF this is based on, since I never use such URIs. However, since I've released this I kind of feel like the support should be there. Supporting such URIs and appropriately splitting the parts would be easy. What would take much longer is setting up an appropriate, large list of all kinds of URIs (both well-formed and not) to retest the function against. However, if several people leave comments asking for the support, I'll go ahead and add it. I could also add more pre-concatenated parts (e.g., "relative" for everything starting with the path) or other stuff like "tld" (for just the top-level domain) if readers think it would be useful.
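To give an idea of what the username/password splitting might look like, here is a rough sketch that works on an authority string by itself. Everything in it is hypothetical: the name splitAuthority, the "user" and "password" keys, and the pattern are assumptions for illustration, it is not wired into parseUri (whose v0.1 regex would not hand it a correct authority for such URIs), and it has not been run against a large list of test URIs. It also assumes the host is not a bracketed IPv6 literal.

```javascript
// Hypothetical sketch: split "user:password@domain.com:81" into its pieces.
// Assumes no bracketed IPv6 host; missing pieces come back as "".
function splitAuthority(authority) {
    var names = ["source", "user", "password", "domain", "port"];
    // Optional "user[:password]@" prefix, then the domain, then an optional ":port"
    var match = /^(?:([^:@]*)(?::([^@]*))?@)?([^:]*)(?::(\d*))?$/.exec(authority);
    var parts = {};
    for (var i = 0; i < names.length; i++) {
        // match is null only if the string does not fit the pattern at all
        parts[names[i]] = (match && match[i]) ? match[i] : "";
    }
    return parts;
}

var withLogin = splitAuthority("user:password@domain.com:81");
// withLogin.user is "user", withLogin.password is "password",
// withLogin.domain is "domain.com", withLogin.port is "81"

var plain = splitAuthority("www.domain.com");
// plain.user and plain.password are "", plain.domain is "www.domain.com"
```

In a real version the same groups would be folded into the main regex so that everything is still split in a single pass.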
You might also be looking for my script which fixes the JavaScript split method cross-browser.