“Design depends largely on constraints.” — Charles Eames
Friday, 30 June 2006
If you boil down the BNF in both RFC2396 and RFC3986, path segments can contain the following characters without percent-encoding them:
ALPHA DIGIT ! $ & ' ( ) * + , - . : ; = @ _ ~
Query components can contain these:
ALPHA DIGIT ! $ & ' ( ) * + , - . / : ; = ? @ _ ~
Which means that
" < > [ \ ] ^ ` { | }
should always be encoded in both (discounting non-ASCII characters, for now).
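As a rough check of the lists above, the RFC 3986 character classes can be written out as sets and compared against printable ASCII punctuation. This is only a sketch: the names (`PCHAR`, `SYNTAX`, `punctuation_to_encode`, and so on) are made up for illustration, and it assumes that "#", "%", "/" and "?" are left out of the "always encode" list because they are URI syntax rather than content.

```python
import string

# Character classes from RFC 3986, written out as Python sets.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | set(":@")  # raw characters allowed in a path segment
QUERY = PCHAR | set("/?")                    # raw characters allowed in a query

# Assumption: "#", "%", "/" and "?" are excluded from the comparison because
# they are URI syntax (fragment, escape, path and query delimiters).
SYNTAX = set("#%/?")
PUNCTUATION = set(string.punctuation)        # printable ASCII punctuation, no space


def punctuation_allowed(allowed):
    """Punctuation that may appear raw in the given component."""
    return "".join(sorted(PUNCTUATION & allowed))


def punctuation_to_encode(allowed):
    """Punctuation that is neither allowed raw nor URI syntax."""
    return "".join(sorted(PUNCTUATION - allowed - SYNTAX))


print("path segment:", punctuation_allowed(PCHAR))
print("query:       ", punctuation_allowed(QUERY))
print("encode both: ", punctuation_to_encode(PCHAR))
```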
If you're specifying the format of an HTTP URI, this is important; you want to be able to tell people what characters have special meaning, and when to encode them if they're part of content. When implementations automatically percent-encode some characters it can cause problems -- especially when the behaviour is different from implementation to implementation.
Note that I'm not (necessarily) saying that the latter characters should always be escaped; Web servers seem to support them in their raw form just fine, and some less fastidious Web developers may forget to un-escape them. I'm more interested in those characters that are unnecessarily escaped, which would cause trouble in some situations.
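To avoid unnecessary escaping in your own code, one option is to pass an explicit safe set to the encoder instead of relying on its defaults. Here is a minimal sketch using Python 3's `urllib.parse.quote`; the constant and function names are hypothetical, and the safe sets are simply the two lists given earlier in this post.

```python
from urllib.parse import quote

# Punctuation that can stay raw, taken from the lists in this post.
# "-._~" is spelled out so the result does not depend on quote()'s defaults.
PATH_SEGMENT_SAFE = "!$&'()*+,;=:@-._~"
QUERY_SAFE = PATH_SEGMENT_SAFE + "/?"


def encode_segment(text):
    """Percent-encode content destined for a single path segment."""
    return quote(text, safe=PATH_SEGMENT_SAFE)


def encode_query(text):
    """Percent-encode content destined for the query component."""
    return quote(text, safe=QUERY_SAFE)


print(encode_segment("!$&'()*+,-.:;=@_~"))  # left untouched
print(encode_segment('"<>[\\]^`{|}'))       # every character escaped
print(encode_segment("a/b"))                # "/" is a delimiter in a path
print(encode_query("a/b?c"))                # but "/" and "?" are fine in a query
```

Note that this treats its input as content: a literal "%" is escaped as "%25", so it should not be applied to a string that is already percent-encoded.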
Try using your favourite resolver to access this URL:
http://www.mnot.net/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}
and post the results in comments. I'm particularly interested in results from Java, .NET, Perl and Ruby libraries.
Here it is as a link (without the double quotes), and using XmlHttpRequest (ditto).
Here are a few preliminary results:
Safari: pasted into the location bar.
Safari will escape angle brackets ("<>") in a followed link (e.g., a/@href, using XHR), but not if you paste it directly into the location bar.
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/418.8 (KHTML, like Gecko) Safari/419.3
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}
Path
Encoded:
Unencoded: !"$&'()*+,-./:;<=>@[\]^_`bceghinoru{|}~
Query
Encoded:
Unencoded: !"$&'()*+,-./:;<=>?@[\]^_`{|}~
Firefox: pasted into the location bar.
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B%5C%5D%5E%60%7B|%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}
Path
Encoded: "<>[\]^`{}
Unencoded: !$&'()*+,-./:;=@_bceghinoru|~
Query
Encoded: "<>`
Unencoded: !$&'()*+,-./:;=?@[\]^_{|}~
However, Firefox will treat the last path segment differently (note the missing "/"):
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E[\]^%60{|}?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}
Path
Encoded: "<>`
Unencoded: !$&'()*+,-./:;=@[\]^_bceghinoru{|}~
Query
Encoded: "<>`
Unencoded: !$&'()*+,-./:;=?@[\]^_{|}~
Opera: pasted into the location bar.
Opera silently transforms backslashes ("\") to forward slashes ("/") in the path (but not the query).
User-Agent: Opera/9.00 (Macintosh; PPC Mac OS X; U; en)
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E[/]^%60{|}/?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}
Path
Encoded: "<>`
Unencoded: !$&'()*+,-./:;=@[]^_bceghinoru{|}~
Query
Encoded: "<>`
Unencoded: !$&'()*+,-./:;=?@[\]^_{|}~
> curl -g --url `cat file.url`
User-Agent: curl/7.15.4 (powerpc-apple-darwin8.6.0) libcurl/7.15.4 OpenSSL/0.9.8b zlib/1.2.3
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^{|}?!$&'()*+,-./:;=/?@_~"<>[\]^{|}
Path
Encoded:
Unencoded: !"$&'()*+,-./:;<=>@[\]^_bceghinoru{|}~
Query
Encoded:
Unencoded: !"$&'()*+,-./:;<=>?@[\]^_{|}~
> wget -i file.url --output-document=-
User-Agent: Wget/1.10.2
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E[%5C]%5E%7B%7C%7D?!$&'()*+,-./:;=/?@_~%22%3C%3E[%5C]%5E%7B%7C%7D
Path
Encoded: "<>\^{|}
Unencoded: !$&'()*+,-./:;=@[]_bceghinoru~
Query
Encoded: "<>\^{|}
Unencoded: !$&'()*+,-./:;=?@[]_~
import urllib; print urllib.urlopen(url).read()
User-Agent: Python-urllib/1.16
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^{|}?!$&'()*+,-./:;=/?@_~"<>[\]^{|}
Path
Encoded:
Unencoded: !"$&'()*+,-./:;<=>@[\]^_bceghinoru{|}~
Query
Encoded:
Unencoded: !"$&'()*+,-./:;<=>?@[\]^_{|}~
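If you want to produce the same kind of Path/Query report from a request line you captured yourself, something like the following would do. It is a guess at the reporting logic, not the actual echo-uri script; `report` and `_classify` are hypothetical names, and the sample is the Firefox request URI from above.

```python
import re
from urllib.parse import unquote

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def _classify(component):
    """Return (encoded, unencoded) character summaries for one component."""
    # Characters that arrived as %XX escapes, decoded for display.
    encoded = {unquote(match) for match in _ESCAPE.findall(component)}
    # Everything left once the escapes are removed arrived raw.
    raw = set(_ESCAPE.sub("", component))
    return "".join(sorted(encoded)), "".join(sorted(raw))


def report(request_uri):
    """Split a request URI at the first "?" and classify both halves."""
    path, _, query = request_uri.partition("?")
    return {"path": _classify(path), "query": _classify(query)}


firefox = r"/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B%5C%5D%5E%60%7B|%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}"
for name, (encoded, unencoded) in report(firefox).items():
    print(name)
    print("  Encoded:  ", encoded)
    print("  Unencoded:", unencoded)
```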
Filed under: Web
Ruby 1.8.4's Net::HTTP responds:
Request URI: http://www.mnot.net/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}
Path
Encoded:
Unencoded: !"$&'()*+,-./:;@[\]^_`bceghimnoprtuw{|}~
Query
Encoded:
Unencoded: !"$&'()*+,-./:;?@[\]^_`{|}~
Interestingly, Ruby's URI library refuses to parse it. I've had problems with it not properly escaping [ and ] in the past, but that doesn't seem to be the problem here.
Friday, June 30 2006 at 11:25 AM +10:00
LWP::Simple on Perl ActiveState/Win32 5.8.4 responds this way:
C:\Temp>perl mnot.pl
Content:
User-Agent: lwp-trivial/1.40
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}
Path
Encoded:
Unencoded: !"$&'()*+,-./:;@[\]^_`bceghinoru{|}~
Query
Encoded:
Unencoded: !"$&'()*+,-./:;?@[\]^_`{|}~
Friday, June 30 2006 at 12:10 PM +10:00
With IE 6.0 (if you paste it into the address bar), you get:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[/]^`{|}/?!$&
Path
Encoded:
Unencoded: !$&'()*+,-./:;=@[]^_`abceghilmnoprtu{|}~
Query
Encoded:
Unencoded: !$&
I tried issuing an HTTP request to this URL: http://www.mnot.net/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[/]^`{|}/?!$&'()*+,-./:;=?@_~<>[\]^`{|}
.NET 2.0 System.Net libraries give me a "too many automatic redirections attempted" error.
Friday, June 30 2006 at 1:22 PM +10:00
Odd; I just tried and got:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; yie6; SV1)
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[/]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}
Path
Encoded:
Unencoded: !"$&'()*+,-./:;@[]^_`bceghinoru{|}~
Query
Encoded:
Unencoded: !"$&'()*+,-./:;?@[\]^_`{|}~
Are you sure you pasted the right URL?
Friday, June 30 2006 at 1:31 PM +10:00
Camino:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.3) Gecko/20060427 Camino/1.0.1
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B%5C%5D%5E%60%7B|%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}
Path
Encoded: "<>[\]^`{}
Unencoded: !$&'()*+,-./:;=@_bceghinoru|~
Query
Encoded: "<>`
Unencoded: !$&'()*+,-./:;=?@[\]^_{|}~
Friday, June 30 2006 at 2:44 PM +10:00
Mark
You are right. Sorry, I must have made a mistake. Any thoughts on that redirection error? I tried issuing an HTTP request for this URL:
http://www.mnot.net/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[/]^`{|}/?!$&'()*+,-./:;=?@_~<>[\]^`{|}
Friday, June 30 2006 at 6:47 PM +10:00
Not that this is of any use to you, but I was curious what Snarfer's socket library would do.
User-Agent: Snarfer/0.4.2 (http://www.snarfware.com/)
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B%5C%5D%5E%60%7B%7C%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E%5B%5C%5D%5E%60%7B%7C%7D
Path
Encoded: "<>[\]^`{|}
Unencoded: !$&'()*+,-./:;=@_bceghinoru~
Query
Encoded: "<>[\]^`{|}
Unencoded: !$&'()*+,-./:;=?@_~
Friday, June 30 2006 at 8:09 PM +10:00
.NET v2.0, self-written test code. Dilip, I did not have any redirection problems, but you can set the maximum number of redirects to follow on the HttpWebRequest object. Maybe someone sets that for you?
User-Agent: icings .netv2.0 tester
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B/%5D%5E%60%7B%7C%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E%5B%5C%5D%5E%60%7B%7C%7D
Path
Encoded: "<>[]^`{|}
Unencoded: !$&'()*+,-./:;=@_bceghinoru~
Query
Encoded: "<>[\]^`{|}
Unencoded: !$&'()*+,-./:;=?@_~
Saturday, July 1 2006 at 4:32 AM +10:00
Active Perl 5.8.8 was different:
User-Agent: LWP::Simple/5.805
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E[%5C]%5E%60%7B%7C%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E[%5C]%5E%60%7B%7C%7D
Path
Encoded: "<>\^`{|}
Unencoded: !$&'()*+,-./:;=@[]_bceghinoru~
Query
Encoded: "<>\^`{|}
Unencoded: !$&'()*+,-./:;=?@[]_~
Sunday, July 23 2006 at 10:57 AM +10:00
User-Agent: Lynx/2.8.6dev.18 libwww-FM/2.14
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[\]^`{|}/?!$&'()*+,-./:;=?@_~[\]^`{|}
Path
Encoded:
Unencoded: !$&'()*+,-./:;@[\]^_`bceghinoru{|}~
Query
Encoded:
Unencoded: !$&'()*+,-./:;?@[\]^_`{|}~
Monday, July 24 2006 at 2:25 AM +10:00
User-Agent: w3m/0.5.1+cvs-1.968
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[\]^`{|}/?!$&'()*+,-./:;=?@_~@[\]^_`bceghinoru{|}~
Query
Encoded:
Unencoded: !$&'()*+,-./:;?@[\]^_`{|}~
Monday, July 24 2006 at 2:27 AM +10:00
It would be interesting to have the proxy header information in the output.
Proxies are another layer that could transform URLs.
Wednesday, January 31 2007 at 8:15 PM +10:00