mark nottingham

Syntax for Distributed Computing

Sunday, 24 April 2005

XML

XML is arguably one of the bigger things to come onto industry’s radar for a while, and as a result programming languages (e.g., ECMAScript, Comega, Java) are changing to accommodate it. This isn’t just happening in libraries; the syntax of the languages is changing.

This could be just because of the importance of XML, but I also think that it’s because XML is foreign to most programming models; it doesn’t fit well into data structures, objects and functions, and requires new syntax to work adequately.

I’m wondering out loud if distributed computing will end up in the same bucket. Programmers have difficulty in remembering that remote processes don’t act like local ones [pdf]; if we introduce something fundamentally different, maybe it should look different in the syntax.

For example, a while back I ran across E, which has some dedicated (and very interesting) syntax for distributed computing:

def carVow := makeCar <- ("Mercedes")
carVow <- moveTo(2,3)
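For readers who don't know E, here is a rough Python analogy of what those two lines do, as I understand them: `<-` queues a message and immediately hands back a promise, and you can send further messages to the promise before it resolves. The `Car` class, the `send` helper and the single-worker executor standing in for the remote side are all my own assumptions for illustration; none of this is E's actual machinery.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class Car:
    """Hypothetical object living on the 'remote' side."""

    def __init__(self, name):
        self.name = name
        self.position = (0, 0)

    def move_to(self, x, y):
        self.position = (x, y)
        return self.position


# One worker thread stands in for the remote side's event loop (an assumption).
# Because it has a single worker, messages are delivered in the order sent.
remote = ThreadPoolExecutor(max_workers=1)


def send(target, method_name, *args):
    """Eventual send: queue the call and return a Future right away.

    `target` may be a plain object or a Future that has not resolved yet.
    """

    def deliver():
        obj = target.result() if isinstance(target, Future) else target
        return getattr(obj, method_name)(*args)

    return remote.submit(deliver)


car_vow = remote.submit(Car, "Mercedes")   # roughly: makeCar <- ("Mercedes")
moved = send(car_vow, "move_to", 2, 3)     # roughly: carVow <- moveTo(2,3)

print(moved.result())  # prints (2, 3)
remote.shutdown(wait=True)
```

The point of the comparison is that in Python nothing about `send(...)` looks different from any other function call, whereas E makes the asynchronous, possibly remote nature of the call visible in the operator itself.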

I’m not sure whether this is common knowledge. Some people I’ve mentioned it to have said “oh, yeah, E is cool” in a matter-of-fact way; others give me a blank look, which is probably just a sign that the computing universe is still expanding and everyone’s time is limited. Am I just way behind the times, or is this the future?


3 Comments

Paul Downey said:

‘E’ is a new one on me and looks interesting. As I’m sure you know, putting parallel processing support directly into the syntax of a programming language isn’t new, and I guess it reached the height of fashion back in the 80s when the debate was lightweight (co-routines, Simula and Algol’s tasks, Occam’s ‘par’) versus Ada’s overweight monitors, rendezvous, etc. I was reminded of all this recently, when looking at Ericsson’s Erlang: http://www.erlang.org/

It would be neat to see a mainstream programming language where you have to go out of your way to make a function not reentrant, e.g. by making ‘synchronized’ the default state for a Java method.

Monday, April 25 2005 at 2:37 AM

Patrick Logan said:

Making “synchronized” the default… a further step might be to make variables thread-specific by default. In order to share state between threads, one would have to explicitly declare a variable as shared.

Maybe too far from current Java, but perhaps useful, would be to prohibit sharing data except through objects specifically designed for sharing, i.e., there would be no need to “synchronize” new kinds of data structures. There could be some small number of shared-memory data structures, perhaps just one: a shared queue.
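A minimal sketch of that discipline, in Python rather than Java, and by convention only since the language does not enforce it: each thread keeps its working state in thread-local storage, and the single shared object is a queue. The names `shared`, `local` and `worker` are made up for the example.

```python
import queue
import threading

shared = queue.Queue()       # the one structure designed for sharing
local = threading.local()    # everything else is per-thread


def worker(worker_id, items):
    # local.total is visible only to the thread that set it.
    local.total = 0
    for item in items:
        local.total += item
    # The only way a result leaves the thread is through the queue.
    shared.put((worker_id, local.total))


threads = [
    threading.Thread(target=worker, args=(i, range(i * 10, i * 10 + 10)))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Collect one (worker_id, total) pair per thread.
results = dict(shared.get() for _ in threads)
```

No locks appear in the worker code because nothing mutable is shared except the queue, which handles its own synchronization.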

Aside from a shared “map”-like structure which is discouraged, Erlang provides no sharing whatsoever. A Java-like mechanism that allowed only one shared data structure could also allow that data structure to be used in a distributed mode as well.

On another note, something like E but also something like REST is the Waterken web services model…

http://www.waterken.com/http/www.waterken.com/

And a side-note: apparently an in-progress version of Squeak Smalltalk is being developed to include capabilities based on those in E.

-Patrick

Monday, April 25 2005 at 10:08 AM

Donovan Preston said:

E has the concept of Vats, which are like isolated memory spaces with their own event loops. To communicate with an object in another vat, you have to have a capability to that object, and talking to an object in another vat is no different than talking to an object across the network (including the need for the different syntax). In a more traditional (non-JVM) environment, Vats might merely be processes.

In any case, sharing of data between vats is all explicit. Personally I don’t think there should be any shared data structures; everything should be accomplished using message-passing.
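To make the vat idea a little more concrete, here is a loose Python sketch: private state, one event loop, and a mailbox as the only way in. It is an analogy under my own assumptions, not how E implements vats. The `Vat` class, its `send` method and the `deposit` handler are hypothetical, and the sketch leaves out capabilities entirely (anyone holding the `Vat` can send it anything).

```python
import queue
import threading
from concurrent.futures import Future


class Vat:
    """Hypothetical stand-in for a vat: private objects plus one event loop."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._objects = {}  # touched only by the vat's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # The event loop: handle one message at a time, in arrival order.
        while True:
            message = self._inbox.get()
            if message is None:  # sentinel used by stop()
                break
            reply, handler, args = message
            try:
                reply.set_result(handler(self._objects, *args))
            except Exception as exc:
                reply.set_exception(exc)

    def send(self, handler, *args):
        """Queue a message for the vat and return a Future for the reply."""
        reply = Future()
        self._inbox.put((reply, handler, args))
        return reply

    def stop(self):
        self._inbox.put(None)
        self._thread.join()


def deposit(objects, account, amount):
    # Runs inside the vat, so it needs no locking.
    objects[account] = objects.get(account, 0) + amount
    return objects[account]


vat = Vat()
vat.send(deposit, "alice", 5)
balance = vat.send(deposit, "alice", 7).result()
vat.stop()
```

Callers never touch `_objects` directly; they only ever see the Future that comes back, which is the same shape of interaction whether the vat is a thread, another process, or a machine across the network.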

I absolutely agree that talking to remote objects should appear syntactically to be a different thing than talking to something locally. As processors become dual-core and multiprocessor systems become more common, programmers are going to have to become more aware of the data they are sharing, and where. Making it obvious when data should cross a process boundary also makes it obvious how to coordinate it across multiple processors. Explicit is good.

Tuesday, May 3 2005 at 10:24 AM