mark nottingham

document(Web)

Tuesday, 22 February 2005

XML

I love the XSLT document() function. With it, you can access the whole Web from a stylesheet, which gives you a lot of flexibility in the right situation.

For example, my local library’s online system is based upon iPac (now sold as the Horizon Information Portal, I think), a common packaged library management system. One of its nifty features is letting you keep a list of books (“My List”) that you’d like to eventually check out of the library. In conjunction with Jon Udell’s LibraryLookup bookmarklet, you can shift from keeping books in your Amazon shopping cart and buying them to keeping them in your library’s list and borrowing them. Cool.

That said, the data isn’t exactly in the format I want; to figure out what’s available in my local branch, I have to click through each book and scan a list of branches. What I’d really like is a list of books that are on my list, at my local branch, and currently available, all on one page. While we’re at it, I’d like to have it available on my phone, so I know where to go in the stacks when I’m looking for something to read.

A bit of Googling turns up a nifty feature in iPac: if you append ‘GetXML=true’ to the URL’s query arguments, you get back an XML representation of the page’s underlying data. Unfortunately, iPac doesn’t use HTTP authentication; it has a form login and then gives you a session ID. Luckily, the session ID is in the URL, not in a cookie.
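To make the URL trick concrete, here is a minimal sketch in Python (not XSLT, and not part of the stylesheet) of adding the GetXML flag to a page URL. The host, path, and the `session` and `menu` parameter names in the usage note are hypothetical placeholders; only the ‘GetXML=true’ argument comes from the description above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_get_xml(url: str) -> str:
    """Return `url` with GetXML=true added to its query arguments."""
    parts = urlsplit(url)
    # Keep every existing argument (including blank ones), but drop any
    # earlier GetXML so the flag is not repeated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "GetXML"
    ]
    query.append(("GetXML", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_get_xml("http://library.example.org/ipac.jsp?session=ABC123&menu=mylist")` keeps the (assumed) session argument in place and appends `GetXML=true` after it. Note that `urlencode` re-encodes the existing values, so unusual escapes may come back in a different but equivalent form.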

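Enter document(). It just fetches whatever URI you hand it, with no standard way to send a cookie along, so a cookie-based session would have been a dead end. Because everything’s in the URL, it’s possible to use XSLT to log in, get a session ID, get My List, find availability information for each individual book, and then log out. All in one stylesheet.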
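The sequence of requests can be sketched outside XSLT as well. The Python below is only an illustration of the flow described above (log in, read the session ID, fetch My List, check each book, log out); it is not the stylesheet. Every parameter name (`user`, `password`, `session`, `menu`, `item`, `logout`) and every XML element or attribute (`session`, `book`, `title`, `copy`, `branch`, `status`) is a made-up stand-in, since the real iPac responses aren't shown here. The `fetch` argument is any function that GETs a URL and returns the response body as text.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List
from urllib.parse import urlencode


def build_url(base: str, **params: str) -> str:
    """Build a GET URL; all state travels in the query string."""
    return f"{base}?{urlencode(params)}"


def available_titles(
    fetch: Callable[[str], str],
    base: str,
    user: str,
    password: str,
    branch: str,
) -> List[str]:
    """Return titles on My List that are available at `branch`."""
    # 1. Log in; the (hypothetical) response carries the session ID.
    login = ET.fromstring(
        fetch(build_url(base, user=user, password=password, GetXML="true"))
    )
    session = login.findtext("session")
    if not session:
        raise ValueError("login failed: no session ID in response")

    titles: List[str] = []
    try:
        # 2. Fetch My List using the session ID from step 1.
        my_list = ET.fromstring(
            fetch(build_url(base, session=session, menu="mylist", GetXML="true"))
        )
        for book in my_list.iter("book"):
            # 3. One GET per book to look up its copies.
            detail = ET.fromstring(
                fetch(
                    build_url(
                        base, session=session, item=book.get("id"), GetXML="true"
                    )
                )
            )
            if any(
                copy.get("branch") == branch and copy.get("status") == "available"
                for copy in detail.iter("copy")
            ):
                titles.append(book.findtext("title"))
    finally:
        # 4. Log out even if one of the requests above failed.
        fetch(build_url(base, session=session, logout="true"))
    return titles
```

Passing `fetch` in as a parameter keeps the sketch testable without touching a real library server; in practice it could be a thin wrapper around `urllib.request.urlopen`. The per-book request in step 3 means a list of N books costs N + 3 GETs per run.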

Here’s the stylesheet and an example of a document to run it against; you’ll need to supply a username and password, as well as a search URI and the name of the branch that you’re interested in. Needless to say, this is incredibly iPac-specific, and it may need tweaking even for other iPac systems.

You can also see a snapshot of the books which are both currently interesting to me, and available at the Burlingame library.

One caution: this is pretty resource-intensive on the library’s servers, because it has to check each book’s availability with a separate GET. I’ve got mine running from a cron job only once every few days, so it won’t stress them; for the same reason, I wouldn’t suggest running it as a client-side stylesheet.


2 Comments

Bob DuCharme said:

I’ve written an article on using XSLT as a web service client at http://www.xml.com/pub/a/2004/12/01/tr.html if anyone’s interested in further background on doing this.

Wednesday, February 23 2005 at 6:07 AM

RC said:

I am using XSLT to transform XML into HTML. While stepping through the XSLT, I would like to go to the URL below (I get the last ten digits from the XML) and scrape the company name from the page and put it into my HTML output. Any help would be greatly appreciated.

http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001140586

Below is how the page listed above displays the company name:

Company Information: NEW FRONTIER ENERGY INC

Thursday, May 26 2005 at 1:44 AM