mnot’s blog

“Design depends largely on constraints.” -- Charles Eames

Friday, 30 June 2006

Friday Fun: Percent Encoding

If you boil down the BNF in both RFC2396 and RFC3986, path segments can contain the following characters without percent-encoding them:

ALPHA DIGIT ! $ & ' ( ) * + , - . : ; = @ _ ~

Query components can contain these:

ALPHA DIGIT ! $ & ' ( ) * + , - . / : ; = ? @ _ ~

Which means that

" < > [ \ ] ^ ` { | }

should always be encoded in both (discounting non-ASCII characters, for now).
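As a rough illustration, the two lists can be written down as character sets and handed to a percent-encoder. This is a minimal Python sketch, not a reference implementation; the names PATH_PUNCT, QUERY_PUNCT and encode_component are made up for this example, and it only considers ASCII input.

```python
import string
from urllib.parse import quote

ALNUM = string.ascii_letters + string.digits

# Punctuation allowed without percent-encoding, copied from the two lists above.
PATH_PUNCT = "!$&'()*+,-.:;=@_~"
QUERY_PUNCT = "!$&'()*+,-./:;=?@_~"

PATH_SAFE = set(ALNUM + PATH_PUNCT)
QUERY_SAFE = set(ALNUM + QUERY_PUNCT)


def encode_component(text, safe_punct):
    """Percent-encode everything except ALPHA, DIGIT and safe_punct."""
    # quote() never touches letters and digits; `safe` lists the extra
    # characters that should also be left alone.
    return quote(text, safe=safe_punct)


print(encode_component('a/b"<c>', PATH_PUNCT))   # a%2Fb%22%3Cc%3E
print(encode_component('a/b"<c>', QUERY_PUNCT))  # a/b%22%3Cc%3E
```

The only difference between the two sets is that "/" and "?" may appear raw in a query but not inside a path segment.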

If you're specifying the format of an HTTP URI, this is important; you want to be able to tell people what characters have special meaning, and when to encode them if they're part of content. When implementations automatically percent-encode some characters it can cause problems -- especially when the behaviour is different from implementation to implementation.

Note that I'm not (necessarily) saying that the latter characters should always be escaped; Web servers seem to support them in their raw form just fine, and some less fastidious Web developers may forget to un-escape them. I'm more interested in those characters that are unnecessarily escaped, which would cause trouble in some situations.
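To make "unnecessarily escaped" concrete, here is a small Python sketch that lists the percent-escapes in a path segment whose decoded character could have been sent raw. The function name needless_escapes and the sample input are invented for illustration; the safe set is the path list given earlier.

```python
import re
import string

# Characters a path segment may carry without percent-encoding.
PATH_SAFE = set(string.ascii_letters + string.digits + "!$&'()*+,-.:;=@_~")

ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def needless_escapes(component, safe=PATH_SAFE):
    """Return (escape, character) pairs where the character is in `safe`."""
    found = []
    for match in ESCAPE.finditer(component):
        char = chr(int(match.group(1), 16))
        if char in safe:
            found.append((match.group(0), char))
    return found


print(needless_escapes("a%3Bb%22c%7E"))  # [('%3B', ';'), ('%7E', '~')]
```

In that sample, %22 is left out of the result because a double quote is not in the safe set, while ";" and "~" did not need escaping at all.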

The Test

Try using your favourite resolver to access this URL:

http://www.mnot.net/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}

and post the results in comments. I'm particularly interested in results from Java, .NET, Perl and Ruby libraries.

Here it is as a link (without the double quotes), and using XMLHttpRequest (ditto).
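The test URL is just the allowed punctuation followed by the "should always be encoded" characters, once in the path and once in the query. A short Python sketch of how it can be assembled, for anyone who wants to generate it rather than paste it (the constant and function names are made up for this example):

```python
# Punctuation that may appear raw, from the lists earlier in the post.
PATH_PUNCT = "!$&'()*+,-.:;=@_~"
QUERY_PUNCT = "!$&'()*+,-./:;=?@_~"

# Characters that should always be percent-encoded.
ALWAYS_ENCODE = '"<>[\\]^`{|}'

BASE = "http://www.mnot.net/cgi-bin/echo-uri/"


def build_test_url(base=BASE):
    """Return the test URL with every character left raw on purpose."""
    path = PATH_PUNCT + ALWAYS_ENCODE
    query = QUERY_PUNCT + ALWAYS_ENCODE
    return base + path + "/?" + query


print(build_test_url())
```

Nothing is encoded here deliberately; the point of the test is to see what the client does with the raw characters.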

 

Here are a few preliminary results:

Safari

Pasted into the location bar.

Safari will escape angle brackets ("<>") in a followed link (e.g., a/@href, using XHR), but not if you paste it directly into the location bar.

 User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/418.8 (KHTML, like Gecko) Safari/419.3
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}
Path
    Encoded: 
  Unencoded: !"$&'()*+,-./:;<=>@[\]^_`bceghinoru{|}~
Query
    Encoded: 
  Unencoded: !"$&'()*+,-./:;<=>?@[\]^_`{|}~

Firefox

Pasted into the location bar.

 User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B%5C%5D%5E%60%7B|%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}
Path
    Encoded: "<>[\]^`{}
  Unencoded: !$&'()*+,-./:;=@_bceghinoru|~
Query
    Encoded: "<>`
  Unencoded: !$&'()*+,-./:;=?@[\]^_{|}~

However, Firefox will treat the last path segment differently (note the missing "/"):

 User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E[\]^%60{|}?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}
Path
    Encoded: "<>`
  Unencoded: !$&'()*+,-./:;=@[\]^_bceghinoru{|}~
Query
    Encoded: "<>`
  Unencoded: !$&'()*+,-./:;=?@[\]^_{|}~
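The "Encoded" and "Unencoded" lines in these results can be reproduced from the request URI alone. The Python sketch below is a guess at the kind of summary the echo script produces, not its actual source; summarize and report are hypothetical names. It splits the URI at the first "?", collects the characters that arrived percent-encoded, and lists everything else that arrived raw (which is why letters from "/cgi-bin/echo-uri/" show up in the path's unencoded list).

```python
import re

ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def summarize(component):
    """Return (encoded, unencoded) characters, each sorted and de-duplicated."""
    encoded = {chr(int(h, 16)) for h in ESCAPE.findall(component)}
    unencoded = set(ESCAPE.sub("", component))
    return "".join(sorted(encoded)), "".join(sorted(unencoded))


def report(request_uri):
    """Print a Path/Query summary in the same layout as the results."""
    path, _, query = request_uri.partition("?")
    for name, part in (("Path", path), ("Query", query)):
        encoded, unencoded = summarize(part)
        print(name)
        print("    Encoded: " + encoded)
        print("  Unencoded: " + unencoded)


# Request URI from the first Firefox result above.
FIREFOX_URI = r'''/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B%5C%5D%5E%60%7B|%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}'''
report(FIREFOX_URI)
```

Run against the first Firefox request URI, this prints the same four character lists as that result.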

Opera

Pasted into the location bar.

Opera silently transforms backslashes ("\") to forward slashes ("/") in the path (but not the query).

 User-Agent: Opera/9.00 (Macintosh; PPC Mac OS X; U; en)
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E[/]^%60{|}/?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}
Path
    Encoded: "<>`
  Unencoded: !$&'()*+,-./:;=@[]^_bceghinoru{|}~
Query
    Encoded: "<>`
  Unencoded: !$&'()*+,-./:;=?@[\]^_{|}~

Curl

> curl -g --url `cat file.url`

 User-Agent: curl/7.15.4 (powerpc-apple-darwin8.6.0) libcurl/7.15.4 OpenSSL/0.9.8b zlib/1.2.3
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^{|}?!$&'()*+,-./:;=/?@_~"<>[\]^{|}
Path
    Encoded: 
  Unencoded: !"$&'()*+,-./:;<=>@[\]^_bceghinoru{|}~
Query
    Encoded: 
  Unencoded: !"$&'()*+,-./:;<=>?@[\]^_{|}~

WGet

> wget -i file.url --output-document=-

User-Agent: Wget/1.10.2
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E[%5C]%5E%7B%7C%7D?!$&'()*+,-./:;=/?@_~%22%3C%3E[%5C]%5E%7B%7C%7D
Path
    Encoded: "<>\^{|}
  Unencoded: !$&'()*+,-./:;=@[]_bceghinoru~
Query
    Encoded: "<>\^{|}
  Unencoded: !$&'()*+,-./:;=?@[]_~

Python

import urllib; print urllib.urlopen(url).read()

 User-Agent: Python-urllib/1.16
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^{|}?!$&'()*+,-./:;=/?@_~"<>[\]^{|}
Path
    Encoded: 
  Unencoded: !"$&'()*+,-./:;<=>@[\]^_bceghinoru{|}~
Query
    Encoded: 
  Unencoded: !"$&'()*+,-./:;<=>?@[\]^_{|}~
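To compare the clients at a glance, the query results above can be put into a small table and intersected. A Python sketch using only the "Query / Encoded" lines from this post; the names QUERY_ENCODED and common_encoded are invented here. Note that the curl, Wget and Python request URIs shown above do not contain the backtick, so those results say nothing about it either way.

```python
# Characters each client percent-encoded in the query, copied from the
# "Query / Encoded" lines above. An empty string means nothing was encoded.
QUERY_ENCODED = {
    "Safari": "",
    "Firefox": '"<>`',
    "Opera": '"<>`',
    "curl": "",
    "Wget": '"<>\\^{|}',
    "Python urllib": "",
}


def common_encoded(results):
    """Characters encoded by every client that encoded anything at all."""
    sets = [set(chars) for chars in results.values() if chars]
    if not sets:
        return ""
    return "".join(sorted(set.intersection(*sets)))


print(common_encoded(QUERY_ENCODED))  # "<>
```

Among the clients that encode anything in the query, the double quote and the angle brackets are the only characters they all agree on.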

Filed under: Web

12 Comments

Brendan Taylor said:

Ruby 1.8.4's Net::HTTP responds:

Request URI: http://www.mnot.net/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}
Path
Encoded:
Unencoded: !"$&'()*+,-./:;@[\]^_`bceghimnoprtuw{|}~
Query
Encoded:
Unencoded: !"$&'()*+,-./:;?@[\]^_`{|}~

Interestingly, Ruby's URI library refuses to parse it. I've had problems with it not properly escaping [ and ] in the past, but that doesn't seem to be the problem here.

Friday, June 30 2006 at 11:25 AM +10:00

Chris Winters said:

LWP::Simple on Perl ActiveState/Win32 5.8.4 responds this way:

C:\Temp>perl mnot.pl
Content:
User-Agent: lwp-trivial/1.40
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[\]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}
Path
Encoded:
Unencoded: !"$&'()*+,-./:;@[\]^_`bceghinoru{|}~
Query
Encoded:
Unencoded: !"$&'()*+,-./:;?@[\]^_`{|}~

Friday, June 30 2006 at 12:10 PM +10:00

Dilip said:

With IE 6.0 (pasted into the address bar), you get:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[/]^`{|}/?!$&
Path
Encoded:
Unencoded: !$&'()*+,-./:;=@[]^_`abceghilmnoprtu{|}~
Query
Encoded:
Unencoded: !$&


I tried issuing an HTTP request to this URL:

http://www.mnot.net/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[/]^`{|}/?!$&'()*+,-./:;=?@_~<>[\]^`{|}

.NET 2.0 System.Net libraries give me a "too many automatic redirections attempted" error.

Friday, June 30 2006 at 1:22 PM +10:00

Mark Nottingham said:

Odd; I just tried and got:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; yie6; SV1)
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~"<>[/]^`{|}/?!$&'()*+,-./:;=?@_~"<>[\]^`{|}
Path
Encoded:
Unencoded: !"$&'()*+,-./:;@[]^_`bceghinoru{|}~
Query
Encoded:
Unencoded: !"$&'()*+,-./:;?@[\]^_`{|}~

Are you sure you pasted the right URL?

Friday, June 30 2006 at 1:31 PM +10:00

Tim Bray said:

Camino:

User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.3) Gecko/20060427 Camino/1.0.1
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B%5C%5D%5E%60%7B|%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E[\]^%60{|}
Path
Encoded: "<>[\]^`{}
Unencoded: !$&'()*+,-./:;=@_bceghinoru|~
Query
Encoded: "<>`
Unencoded: !$&'()*+,-./:;=?@[\]^_{|}~

Friday, June 30 2006 at 2:44 PM +10:00

Dilip said:

Mark
You are right. Sorry, I must have made a mistake.

Any thoughts on that redirection error? I tried issuing an HTTP request for this URL:
http://www.mnot.net/cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[/]^`{|}/?!$&'()*+,-./:;=?@_~<>[\]^`{|}

Friday, June 30 2006 at 6:47 PM +10:00

James Holderness said:

Not that this is of any use to you, but I was curious what Snarfer's socket library would do.

User-Agent: Snarfer/0.4.2 (http://www.snarfware.com/)
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B%5C%5D%5E%60%7B%7C%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E%5B%5C%5D%5E%60%7B%7C%7D
Path
Encoded: "<>[\]^`{|}
Unencoded: !$&'()*+,-./:;=@_bceghinoru~
Query
Encoded: "<>[\]^`{|}
Unencoded: !$&'()*+,-./:;=?@_~

Friday, June 30 2006 at 8:09 PM +10:00

Stefan Eissing said:

.NET v2.0, self-written test code. Dilip, I did not have any redirection problems, but you can set the maximum number of redirects to follow on the HttpWebRequest object. Maybe something sets that for you?

User-Agent: icings .netv2.0 tester
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E%5B/%5D%5E%60%7B%7C%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E%5B%5C%5D%5E%60%7B%7C%7D
Path
Encoded: "<>[]^`{|}
Unencoded: !$&'()*+,-./:;=@_bceghinoru~
Query
Encoded: "<>[\]^`{|}
Unencoded: !$&'()*+,-./:;=?@_~

Saturday, July 1 2006 at 4:32 AM +10:00

Ken Hirsch said:

Active Perl 5.8.8 was different:

User-Agent: LWP::Simple/5.805
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~%22%3C%3E[%5C]%5E%60%7B%7C%7D/?!$&'()*+,-./:;=?@_~%22%3C%3E[%5C]%5E%60%7B%7C%7D
Path
Encoded: "<>\^`{|}
Unencoded: !$&'()*+,-./:;=@[]_bceghinoru~
Query
Encoded: "<>\^`{|}
Unencoded: !$&'()*+,-./:;=?@[]_~

Sunday, July 23 2006 at 10:57 AM +10:00

karl said:

User-Agent: Lynx/2.8.6dev.18 libwww-FM/2.14
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[\]^`{|}/?!$&'()*+,-./:;=?@_~[\]^`{|}
Path
Encoded:
Unencoded: !$&'()*+,-./:;@[\]^_`bceghinoru{|}~
Query
Encoded:
Unencoded: !$&'()*+,-./:;?@[\]^_`{|}~

Monday, July 24 2006 at 2:25 AM +10:00

karl said:

User-Agent: w3m/0.5.1+cvs-1.968
Request URI: /cgi-bin/echo-uri/!$&'()*+,-.:;=@_~<>[\]^`{|}/?!$&'()*+,-./:;=?@_~@[\]^_`bceghinoru{|}~
Query
Encoded:
Unencoded: !$&'()*+,-./:;?@[\]^_`{|}~

Monday, July 24 2006 at 2:27 AM +10:00

Olivier Mengué said:

It would be interesting to have the proxy header information in the output.

Proxies are another layer that could transform URLs.

Wednesday, January 31 2007 at 8:15 PM +10:00
