Comments on Flagrant Badassery: parseUri(): Split URLs in JavaScript (blog by Steve)

Anonymous (2008-12-12):
Can you prepend the license in the JavaScript file?

Anonymous (2008-10-25):
This is remarkable info you have posted, mate. Really will help me a lot. Thanks a lot Thomas.

Steve (2007-04-25):
@josh

JavaScript doesn't support named capturing groups. I'm assigning names to each part by mapping names from the uriPartNames array to the array of backreferences returned by the RegExp.exec() method. Parentheses are used to capture the backreferences, but not all of the parentheses are part of capturing groups.

As for your task, there are some cases you might not be thinking about. E.g., how would "www.google.co.uk", "64.233.287.99", or something like "localhost" be handled? (By the way, the top-level domain from your example would be "com", not "google.com".)

josh (2007-04-24):
Thanks Steve for a very useful function. There is one thing I'd like to do with this function, and I'm not sure how to do it: I need to split the hostname further, and only retain the TLD portion.
So I would match google.com in mail.google.com, google.com, and www.google.com. My pseudo-regex for this would be [optional: some characters, including dots][some characters without dots].com. I wouldn't need access to the first portion, but I would need the second. I'm not quite sure how to express this as a real regex; in particular, it's not clear how to indicate that a piece of a match should be "named". Is it parentheses? Anyway, any tips you could give on this would be great.

FYI, I need to write this function so I can set cookies via JavaScript that are set in one subdomain and read in another. According to the RFC for cookies (http://www.w3.org/Protocols/rfc2109/rfc2109.txt), you should be able to set the Domain attribute to the TLD portion, prepending a dot, and the browser will then send that cookie to subdomains.

Steve (2007-03-27):
@Scott:

No, it makes no such assumption. It simply splits the URI in the most logical way according to its rules.
See my note on how this function intentionally does not attempt to validate the URIs it receives.

Anonymous (2007-03-26):
The JS thinks this is a valid URL:

http:/example.com

Steve (2007-03-17):
Well, I'm still having problems setting up my blog the way I want it with my new host (e.g., they're still trying to resolve issues with URL rewriting), but since I don't know when everything will be resolved, here's a link to the demo page for the latest version of parseUri:

http://stevenlevithan.com/demo/parseUri/js.cfm

Kris Kowal (2007-03-16):
If you do write user:pass support, please let me know. I'm including your URI parser in my module loader library project: http://cixar.com/tracs/javascript

Anonymous (2007-03-03):
Yeah, badass piece of code and some really masterful regexery! Saved me a good hour. Keep up the good work.
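Regarding josh's cookie-domain question and Steve's caveats about "www.google.co.uk", "64.233.287.99", and "localhost": below is a minimal sketch of choosing a shared cookie domain from a hostname. It is not part of parseUri. The helper name cookieDomainFor and the tiny MULTI_LABEL_SUFFIXES set are hypothetical, and the set is deliberately incomplete; real code would need the full Public Suffix List instead.

```javascript
// Hypothetical and deliberately incomplete: real code should consult the
// Public Suffix List instead of a hard-coded set like this one.
const MULTI_LABEL_SUFFIXES = new Set(["co.uk", "org.uk", "com.au"]);

// Return a Domain attribute value (with a leading dot, as RFC 2109 expects)
// that covers the given host and its sibling subdomains, or null when the
// cookie should stay host-only.
function cookieDomainFor(hostname) {
  const host = hostname.toLowerCase().replace(/\.$/, ""); // drop a trailing dot
  // IP literals (dotted IPv4, or IPv6 containing ":") have no parent domain.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host) || host.includes(":")) return null;
  const labels = host.split(".");
  // Single-label hosts such as "localhost" cannot be widened either.
  if (labels.length < 2) return null;
  // Keep three labels under a known multi-label suffix ("google.co.uk"),
  // otherwise two ("google.com").
  const keep = MULTI_LABEL_SUFFIXES.has(labels.slice(-2).join(".")) ? 3 : 2;
  if (labels.length < keep) return null; // the host is itself a public suffix
  return "." + labels.slice(-keep).join(".");
}

const examples = ["mail.google.com", "www.google.co.uk", "64.233.287.99", "localhost"];
for (const host of examples) {
  console.log(host, "->", cookieDomainFor(host));
}
```

In a browser, a non-null result could be appended to a cookie string as "; domain=" plus the returned value; a null result means leaving the Domain attribute off entirely. Browsers that follow RFC 6265 strip a leading dot from the Domain attribute, so the dot is harmless there.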
Steve (2007-02-28):
@derek:

Thanks. License is MIT-style.

@Thomas Messier and Paul Irish:

Since it's clear that query-splitting is helpful for some users, I've gone ahead and added an implementation of this functionality (to the forthcoming version of parseUri) which uses 4 lines of code and additionally supports query keys which aren't followed by "=" as well as query values which contain "=". This, along with support for userInfo and extensive new demos, is all ready to go, but I'm hoping to release this on my own domain, and I'm currently having some trouble with my new host. I'll include an update here as soon as this is resolved (hopefully within a couple of days).

Derek (2007-02-27):
Very nice... What sort of license is this covered by, if any?

Anonymous (2007-02-23):
Thanks Thomas, I used your queryVars addition. Very nice.

(And Steve, this is a killer function. Thank you.)

Anonymous (2007-02-22):
I'm not sure this should be within the scope of this function, but I find it useful to be able to actually access the query string variables. As such, I added some code to the function to create an object (called queryVars) that serves as a hash of URL variables. That way you can do parseUri(window.location).queryVars.MyURLVar to access the value of a URL variable.
Please note I just did this in 5 minutes and I'm sure it's not foolproof, but it's an idea... The code is as follows:

for (var i = 0; i < 10; i++) {
    uri[uriPartNames[i]] = (uriParts[i] ? uriParts[i] : "");

    if (uriParts[i] && uriPartNames[i] == 'query') {
        uri['queryVars'] = {};
        var qString = uriParts[i];
        qString = qString.split('&');
        for (var j = 0; j < qString.length; j++) {
            var qVar = qString[j].split('=');
            var qKey = qVar[0];
            var qVal = qVar[1];
            uri['queryVars'][qKey] = qVal;
        }
    }
}

Steve (2007-02-13):
BTW, I've updated my local copy of the regex to include support for usernames and passwords, while also appropriately splitting URIs which start with a username/password pair (i.e., they're not preceded by a protocol and/or "//"). I'll include this in v0.2 of this function, along with a few other minor changes/tweaks. Hopefully I'll release this within a few days (after more testing).

Also, Dan G. Switzer, I've decided against supporting filename param segments (e.g., "file.gif;p=5"), since as far as I understand they're deprecated by RFC 3986 (http://tools.ietf.org/html/rfc3986), and in any case they can easily be tested for after the fact since they're picked up as part of the file name. I've also decided against returning an array of objects containing the names and values of each discrete query parameter, since this is easy to implement in a separate function when needed (queries have only two, easily distinguishable delimiters: "&" and "="), and it would add to the function's length.
I also don't want to get carried away with the idea (e.g., returning arrays containing each subdomain, directory, etc.).

Steve (2007-02-13):
Seb, that seems pretty reasonable, and after a minute of testing it with several URIs it seems to hold up well (aside from when you start a URI with a username/password pair, but I'm not sure if I'd do anything to change the behavior).

BTW, here are a couple of ways your addition to the regex can be tweaked, after a quick look over it. Like I mentioned, I haven't looked into this in depth.

• Change "(?::)?" to simply ":?" (the grouping is not necessary to make it optional).
• Replace both instances of "[^:]+" with "[^:@]+" (this will improve efficiency and performance when tested against certain types of values, by reducing the amount of backtracking required).

Whenever I find some time to do more extensive re-testing, I'll go ahead and add support for these and other additional URI parts.

Anonymous (2007-02-13):
Hi,

I'm not familiar with regular expressions, so I tried to extract the user info as an exercise...
So I added "userInfo", "userName", "password" in between "authority" and "domain" in uriPartNames, and added this part to your regexp:
"(" + "(?:(([^:]+)?(?::)?([^:]+)?)?@)?"
+ "([^:/?#]*)(?::(\\d*))?)?"

Well, it seems to work with:
http://userName:password@www.domain.com:81/dir1/dir.2/index.html?id=1&test=2#top
http://userName:@www.domain.com:81/dir1/dir.2/index.html?id=1&test=2#top
http://userName@www.domain.com:81/dir1/dir.2/index.html?id=1&test=2#top

Please tell me if I'm wrong and/or if there is a better way to do it!

Thank you.

Steve (2007-02-12):
Nice work, Dan G. Switzer, II.

BTW, one of the fundamental differences between our two UDFs (which adds some complexity to mine) is that with, e.g., the URIs "/dir/sub" and "/dir/sub?q", your UDF will treat "sub" as the file name, while mine will treat it as part of the directory path. Since many people enter directory paths without a trailing slash (and such URIs work with every HTTP server I'm familiar with), I've found this adjustment to be a necessity.

Also, one issue I noticed during a very brief test is that, e.g., with the URI "www.foo.com:80/dir/", your UDF treats the "80" as part of the directory path, returns no authority, and returns "www.foo.com" as the scheme. Although this may be technically correct according to generic URI syntax (I understand why the scheme comes out the way it does, but I'm not so sure about "80" as part of the directory path), it prevents the common scenario of users entering URIs which start with a domain name, without the leading "//" to identify it as the authority. Other examples of differences are that your UDF will treat "www.foo.com" as a file name, and "www.foo.com/dir/" as one component comprised solely of a directory path. On the other hand, in all of the above cases parseUri() will identify "www.foo.com" as the domain, and "/dir/" as the path.
I'm not noting this to claim superiority, but rather to point out additional areas where I've found that slightly diverging from the official generic URI syntax spec allows the function to become much more "real-world ready," and able to actually be tested against end-user input.

Finally, I know code brevity was probably not your goal, but page weight becomes especially important with a JavaScript implementation. The more than 90 lines of code (after stripping all comments and empty lines) in the post you linked to seem on the heavy side.

Nevertheless, it's a solid, fully-featured implementation, and it gives me more incentive to add support for the missing pieces in my function (username/password/segment [these shouldn't add any lines of code], and param splitting).

Steve (2007-02-12):
For those who didn't find this via Ajaxian, here's the link: Ajaxian: parseUri: Another JavaScript URL parser (http://ajaxian.com/archives/parseuri-another-javascript-url-parser).

Steve (2007-02-12):
@thunder down under:

Poly9's URL parser is weak. Ajaxian posted the reasons for this I sent them, though in my defense I hadn't meant for them to actually publish the list.
Rather, it was part of my pitch towards why they might want to feature another URI parser even though they'd done so recently.

IMO, rewriting Poly9's parser to depend on a massive library like Prototype is extra weak.

Thunder Down Under (2007-02-12):
Here's a link that works:
Prototype URI parser library (http://www.flog.co.nz/index.php/journal/prototype-uri-parser-class/)

Thunder Down Under (2007-02-12):
Might I also suggest looking at http://www.flog.co.nz/index.php/journal/prototype-uri-parser-class/

This is a Prototype-based class which is designed to slot in nicely.
It will parse full URIs, such as 'http://user:password@www.flog.co.nz:80/pathname?querystring&key=value#fragment'

Anonymous (2007-02-12):
A while back I wrote a CF UDF as well, which is basically the same, but passes back a little more information.

I had to make sure segments were supported and also wanted the URL parameters to be returned in a more useful state.

http://blog.pengoworks.com/blogger/index.cfm?action=blog:565

Anonymous (2007-02-12):
I could definitely use the "name@place" support.

Ben (2007-02-09):
Thanks! I'll be using your function from now on.

Steve (2007-02-08):
Boyan, thanks! As for the code you posted, well, beyond being far less powerful/flexible, the first thing that jumps out at me when looking over the regex is that it wouldn't even match or split the URI "http://www.google.com/". In other words, it's deeply flawed.
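Steve's last point (a URI splitter should at least handle "http://www.google.com/"), together with his earlier explanation that parseUri names each part by mapping the uriPartNames array onto the backreferences from RegExp.exec(), can be sketched as follows. This is not parseUri's own regex: it assumes the generic regex from RFC 3986, Appendix B, and the names URI_REGEX, PART_NAMES, splitUri, and splitQuery are hypothetical. The query splitter handles the two cases raised in the queryVars comments: keys without "=" and values that contain "=".

```javascript
// Generic URI regex from RFC 3986, Appendix B (not parseUri's own regex).
const URI_REGEX = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// One name per backreference index; null marks wrapper groups we skip.
const PART_NAMES = [
  "source", null, "scheme", null, "authority",
  "path", null, "query", null, "fragment",
];

// Split a query string into key/value pairs.
// Values are returned as-is (still percent-encoded).
function splitQuery(query) {
  const params = {};
  for (const pair of query.split("&")) {
    if (pair === "") continue; // skip empty segments such as "a=1&&b=2"
    const eq = pair.indexOf("=");
    if (eq === -1) {
      params[pair] = ""; // a key with no "=" gets an empty value
    } else {
      // Only the first "=" separates key from value, so "x=a=b" keeps "a=b".
      params[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  }
  return params;
}

// Map each named backreference onto a property of the result object.
function splitUri(str) {
  const match = URI_REGEX.exec(str); // every group is optional, so this always matches
  const uri = {};
  PART_NAMES.forEach((name, i) => {
    if (name !== null) uri[name] = match[i] || "";
  });
  uri.queryKey = splitQuery(uri.query);
  return uri;
}

const example = splitUri("http://www.google.com/dir/?id=1&flag&x=a=b#top");
// example.authority is "www.google.com", example.path is "/dir/",
// and example.queryKey is { id: "1", flag: "", x: "a=b" }.
console.log(example.authority, example.path, example.queryKey);
```

Because this regex follows generic URI syntax strictly, splitUri("www.foo.com/dir/") returns an empty authority and treats the whole string as the path, and "www.foo.com:80/dir/" yields "www.foo.com" as the scheme and "80/dir/" as the path. That is the behavior Steve describes in his reply to Dan G. Switzer, and which parseUri deliberately diverges from.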