mark nottingham

text/python?

Wednesday, 15 December 2004

I’m thinking about whether it would be a good idea to have a media type for Python source files, call it “text/python.”

The main benefit that I see to doing this is the definition of a fragment identifier syntax; i.e., what the bit after the ‘#’ refers to. This would allow URIs to point to specific functions and classes in Python source files, which would be very useful when documenting code.
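As a rough sketch of what resolving such a fragment might involve, the snippet below maps a fragment to the line span of a function or class in a source file. The dotted fragment syntax ("ThingFactory.make"), the function name find_definition, and the SAMPLE source are all assumptions made for illustration; no fragment syntax has actually been defined.

```python
import ast
from urllib.parse import urldefrag

# Hypothetical source text standing in for a fetched .py file.
SAMPLE = '''\
import os

class ThingFactory:
    def make(self):
        return object()

def helper():
    pass
'''

DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def find_definition(source, fragment):
    """Return (first_line, last_line) for the def/class named by `fragment`.

    `fragment` is a dotted path such as "ThingFactory.make". This syntax is
    an assumption of the sketch, not part of any specification.
    """
    node = ast.parse(source)
    for name in fragment.split("."):
        for child in node.body:
            if isinstance(child, DEFINITIONS) and child.name == name:
                node = child
                break
        else:
            raise LookupError(f"no definition named {name!r} in {fragment!r}")
    return node.lineno, node.end_lineno


# Split a URI into the document part and the fragment, then resolve it.
url, fragment = urldefrag("https://www.mnot.net/sw/sparta/sparta.py#ThingFactory")
print(url, fragment, find_definition(SAMPLE, fragment))
```

A documentation tool could use the returned line span to highlight or extract just the referenced definition.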

It would also allow some cool import tricks (if I read PEP 302 correctly, this is possible now, and will soon be pretty easy to enable), such as:

import "https://www.mnot.net/sw/sparta/sparta.py" as sparta

or even

import "https://www.mnot.net/sw/sparta/sparta.py#ThingFactory" as ThingFactory

Some client-side persistent caching could make this a really nice way to distribute software, if properly thought out. It’s also one more step towards Webizing Python.
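The import statements above are not valid Python syntax, but the idea can be approximated with an ordinary function. What follows is a minimal sketch, not a PEP 302 import hook: import_from_url, fetch_source, and the fetch parameter are names invented here, and the code does no caching, sandboxing, or signature checking, so it runs whatever the server returns.

```python
import types
import urllib.request
from urllib.parse import urldefrag


def fetch_source(url):
    """Fetch the raw bytes of a source file over HTTP(S)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def import_from_url(url, name, fetch=fetch_source):
    """Build a module from the source at `url`.

    If the URL carries a fragment, return the attribute of that name
    instead of the module (mirroring the "#ThingFactory" example).
    `fetch` is injectable so the sketch can be tried without a network.
    """
    base, fragment = urldefrag(url)
    module = types.ModuleType(name)
    module.__file__ = base
    # compile() accepts bytes and honours any PEP 263 coding declaration.
    code = compile(fetch(base), base, "exec")
    exec(code, module.__dict__)  # WARNING: executes untrusted remote code
    return getattr(module, fragment) if fragment else module


# Usage with a stand-in fetcher, so nothing is downloaded here.
def fake_fetch(url):
    return b"class ThingFactory:\n    kind = 'thing'\n"


sparta = import_from_url("https://example.invalid/sparta.py", "sparta", fetch=fake_fetch)
ThingFactory = import_from_url(
    "https://example.invalid/sparta.py#ThingFactory", "sparta", fetch=fake_fetch
)
print(sparta.ThingFactory.kind, ThingFactory.__name__)
```

A real version would presumably hook into the import machinery and add the persistent cache mentioned above; both are left out to keep the sketch short.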

There are also some smaller, but nice, benefits, like being able to use the media type, rather than content sniffing, for syntax colouring, dispatching to Python editors straight off the Web, and being able to specify the encoding of the source in a way that’s well-aligned with the method that Python already defines.
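To make the encoding point concrete, here is a hedged sketch of how a consumer might combine a charset parameter on the media type with the in-source declaration Python already understands. The function names and the precedence rule (the Content-Type charset wins, the source declaration is the fallback) are assumptions of this sketch, not settled behaviour. Note also that tokenize.detect_encoding falls back to UTF-8 in current Python when nothing is declared.

```python
import io
import tokenize
from email.message import Message


def charset_from_content_type(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_source_encoding(source_bytes):
    """Return the encoding Python itself would infer from a BOM or coding comment."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def decode_source(source_bytes, content_type):
    """Decode source bytes, preferring the media type's charset (an assumption)."""
    charset = charset_from_content_type(content_type) or declared_source_encoding(source_bytes)
    return source_bytes.decode(charset)


# A Latin-1 file with a coding comment, served without a charset parameter.
latin1_source = "# -*- coding: latin-1 -*-\nname = 'caf\u00e9'\n".encode("latin-1")
print(decode_source(latin1_source, "text/python"))
```

If the two sources of information disagree, a stricter consumer might refuse to guess rather than silently pick one.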

Thoughts? I’m considering writing an Internet-Draft and/or a PEP, but wanted to get some informal feedback first.


9 Comments

Ian Bicking said:

I’ve used text/x-python-source when I feel a need to make up a MIME type. I doubt off-the-internet importing will be that interesting until restricted execution is available; there are just too many security problems. Maybe with signed source – message/multipart, with a source enclosure and accompanying signature?

Tuesday, December 14 2004 at 11:21 AM

Manfred Stienstra said:

Shouldn’t that be ‘application/python’? Please note that PEP 263 might be implemented in the future and the encoding of the Python source might change from undefined to utf-8 (for instance). This could cause a problem if a proxy decided to change the encoding of text/* files.

Wednesday, December 15 2004 at 2:06 AM

Ian Bicking said:

FWIW, it’s text/javascript, not application/javascript.

I think the problem with constant-possible-upgrades is the stability of the system. Currently updates happen at explicit times, presumably with some control and a readiness for problems. Maybe if you can fit versioning in there somehow, you can make the expectations explicit in the URL.

If you really want to do it, though, http://codespeak.net/py has a path object, including subversion paths, and it can load Python code from them. The idea is to run something directly out of a (potentially) remote subversion repository. It would be easy enough to add an http version of the filesystem as well, at which point you could load code off the web. Not as nice as an import statement, but it kind of works. Actually, it maybe kind of works – there are some subtleties that aren’t worked out.

Wednesday, December 15 2004 at 3:57 AM

Paul Hoffman said:

If python consumers follow the rules given for text/*, it should be text/, not application/.

I like the idea of text/python-source and text/perl-source and… It’s worth a shot to see if the MIME weenies who are wiser and more experienced than us agree.

Wednesday, December 15 2004 at 6:26 AM

Damian Cugley said:

It makes sense to use text/python if transcoding and display of source files is possible without knowing Python-specific quirks. That is, if my web server serves code as ‘text/python; charset=utf-8’, then it can be displayed by any system that understands UTF-8 and MIME, without needing to know Python conventions.

This is OK so long as the default for text/python with NO charset specified is US-ASCII (this is required by MIME), and if Python files do not refer to their own encoding (e.g., do NOT start with ‘<?python version=”2.5” encoding=”windows-1252”?>’) (this is required to allow transcoding and display of text/* content).

If we instead want to have Python processors second-guess the server’s charset attributes and guess the encoding based on leading bytes in the way XML does, then, like XML, we must use ‘application/python’.

Monday, December 20 2004 at 7:41 AM

Manfred Stienstra said:

OK, so what I said before (assuming you knew about the issues Damian mentions) is that PEP 263 proposes an in-document encoding declaration.

[quote] To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file:

#!/usr/bin/python
# -*- coding: <encoding name> -*-

[/quote]

Tuesday, December 21 2004 at 2:44 AM

Paul Sowden said:

“FWIW, it’s text/javascript, not application/javascript.”

text/javascript is not a registered MIME type. Apache serves files with a js extension as application/x-javascript by default.

From RFC 2046 the overview of the text top-level media type says:

“[..] subtypes are to be used for enriched text in forms where application software may enhance the appearance of the text, but such software must not be required in order to get the general idea of the content. Possible subtypes of “text” thus include any word processor format that can be read without resorting to software that understands the format.”

This does not really imply source code but rather natural language.

As far as I’m aware, the W3C’s choice of application/xhtml+xml is a nod to the fact that, with some HTML documents, it is quite a stretch to read the content without a renderer; as such, it is sometimes questioned whether text/html was the right choice. As Bert Bos put it:

“If you know the HTML format and you have looked at the source of some pages on the Web, you have probably wondered occasionally what monster was able to mangle the HTML code behind a simple Web page or an e-mail message in such a way that it is hard to believe that HTML was ever called a ‘structured language.’”

As another source of “prior art,” if you will, the GNOME desktop currently identifies Python files with the MIME type application/x-python, although no authority is implied here.

Thursday, January 6 2005 at 5:32 AM