Friday 28 August 2009

AtomPub with jQuery implementation done

WIN and FAIL: the AtomPub protocol handler has been developed and tested, but has taken more than twice the effort originally estimated. The failure here is, I think, more a failure of estimation than an indication of a technical problem. For details, see http://shuffl-announce.blogspot.com/2009/08/implementing-atompub.html.

Implementing AtomPub

WIN and FAIL: the protocol handler has been developed and tested, but has taken more than twice the effort originally estimated. The failure here is, I think, more a failure of estimation than an indication of a technical problem.
Factors involved in the development of the AtomPub handler include: learning the relevant details of the AtomPub protocol; learning how to use the jQuery Ajax API; dealing with idiosyncrasies of the eXist implementation of AtomPub; developing a framework for composing functions that return results via asynchronous callbacks.
The failure of estimation comes from not appreciating the range of supporting activities needed to implement the AtomPub handlers. Had these activities been identified when performing the estimates, I feel the estimates would have been closer to the effort actually used. But there is a tension here: too fine a breakdown when estimating tasks leads to a plan that bears little resemblance to the actual activities for which progress is reported. Maybe finer-grained task breakdowns could be used for the purpose of estimation, but disregarded for the purposes of progress reporting. Or maybe the real problem here is simply doing task estimation without the benefit of group discussion with other team members? An outline of the test cases used to drive the test-led development process might also be helpful.
Some specific technical issues faced down include:
  • checking the formal specification for relevant details of the AtomPub protocol (0.5 day)
  • learning how to use the jQuery Ajax API (0.5 day)
  • dealing with idiosyncrasies of the eXist implementation of AtomPub (about a day's worth of trial-and-error activity added to the main development task)
  • developing and testing a framework for composing functions that return results via asynchronous callbacks (about 1 day, not recorded in the progress summary as it was performed in odd fragments of time during a vacation period)
  • coding the test cases and handler (about 2 days, which was pretty much the original estimate)
  • tracking down an obscure bug caused by erroneous multiple invocations of an asynchronous callback method (about 0.5 day)
So, all-in-all, the task took about 5 days, rather than the two estimated. A large part of this overrun can be attributed to learning details of technologies not used previously.
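The two callback-related items in the list above (composing functions that return results via asynchronous callbacks, and the bug caused by a callback being invoked more than once) can be illustrated with a minimal sketch. This is not the actual Shuffl code: the helper names once and sequence are hypothetical, and the sketch assumes that every step has the form step(value, callback) and is meant to call its callback exactly once.

```javascript
// Hypothetical helper: wrap a callback so that a repeated invocation is
// reported as an error rather than silently re-running the rest of the chain.
function once(callback) {
    var called = false;
    return function (result) {
        if (called) {
            throw new Error("callback invoked more than once");
        }
        called = true;
        return callback(result);
    };
}

// Hypothetical helper: compose steps of the form step(value, callback)
// into a single function of the same form. Each step's result becomes
// the input value of the next step.
function sequence(steps) {
    return function (initial, done) {
        var runFrom = function (index, value) {
            if (index >= steps.length) {
                done(value);
                return;
            }
            steps[index](value, once(function (result) {
                runFrom(index + 1, result);
            }));
        };
        runFrom(0, initial);
    };
}

// Usage: two steps that deliver their results via callbacks (setTimeout
// stands in for an Ajax request that completes later).
var addOne = function (n, callback) {
    setTimeout(function () { callback(n + 1); }, 0);
};
var double = function (n, callback) {
    setTimeout(function () { callback(n * 2); }, 0);
};
sequence([addOne, double])(3, function (result) {
    console.log(result);   // (3 + 1) * 2 = 8
});
```

Wrapping every continuation in a guard like once is one way to make a double invocation fail loudly at the point where it happens, instead of surfacing later as duplicated work further down the chain.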
The AtomPub specification is a bit vague in some areas about the URIs used for accessing and updating feeds and feed items. In particular, it gives no clear indication (by design, I think) of how feed URIs are constructed. I further feel that the eXist implementation of AtomPub may diverge in some respects from the spirit, if not the letter, of the AtomPub specification (and hence from other implementations), by virtue of the way it uses a number of different base URIs for accessing different AtomPub-related functions. Also, the eXist implementation I use does not seem to implement service documents that allow discovery of Atom feeds - this being at odds with eXist's own documentation as well as the intent of the AtomPub spec; this will make it harder to implement a pure browsing interface for loading and saving Shuffl workspaces. The AtomPub test cases have been developed to reflect the way that eXist works, and may need to be adjusted when refining the AtomPub handler to work with different server implementations.
Another tricky issue was that when creating an AtomPub "Media Resource" a resource title cannot be specified, so it must be applied by updating the Atom Entry returned by the initial create operation, which adds some complexity to the asynchronous completion logic in the handler.
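The two-step sequence described here might look roughly like the following sketch. It is illustrative only, not the Shuffl handler: createTitledMedia is a hypothetical name, and the post and put functions are assumed transport wrappers (for example, thin layers over jQuery.ajax) that call back with a plain object describing the response. The editUri field stands in for the edit link of the returned Atom Entry, and error handling is omitted.

```javascript
// Hypothetical sketch: create a media resource, then set its title by
// updating the Atom entry that the server created for it.
//
// Assumed transport functions (not part of any real library):
//   post(uri, body, callback): callback receives {editUri: ..., title: ...}
//   put(uri, entry, callback): callback receives the updated entry
function createTitledMedia(transport, feedUri, mediaData, title, done) {
    // Step 1: POST the media data; no title can be supplied here.
    transport.post(feedUri, mediaData, function (entry) {
        // Step 2: build an updated copy of the returned entry with the
        // wanted title, and PUT it back to the entry's edit URI.
        var updated = { editUri: entry.editUri, title: title };
        transport.put(entry.editUri, updated, function (result) {
            // Only now is the overall operation complete.
            done(result);
        });
    });
}

// Usage with an in-memory stand-in for the server, so that the sketch
// runs without a network connection.
var fakeServer = {
    entries: {},
    post: function (uri, body, callback) {
        var editUri = uri + "entry-1";          // made-up URI scheme
        this.entries[editUri] = { editUri: editUri, title: "(server default)" };
        callback(this.entries[editUri]);
    },
    put: function (uri, entry, callback) {
        this.entries[uri] = entry;
        callback(entry);
    }
};

createTitledMedia(fakeServer, "/feed/", "card data", "My card",
    function (entry) {
        console.log(entry.editUri + ": " + entry.title);
    });
```

The caller's completion callback fires only after the second request has finished, which is the extra complexity in the completion logic mentioned above.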

Wednesday 19 August 2009

eXist and Jetty configuration

My initial experiments to get eXist installed and running as an AtomPub server went very smoothly.
But I've just spent a day thrashing around with the server configuration, trying to get it to serve static files from the Shuffl project test directory so that I can run tests more easily, without falling foul of the JavaScript "same origin" restriction. This has been complicated by a number of factors:
  • Jetty is natively configured by Java object dependency injection, for which XML configuration files are a shim interface
  • the Jetty documentation isn't very approachable, especially as the XML configuration sections don't actually tell you how to configure the server - for that, you have to dig into the various servlet classes to understand the values that need to be injected
  • some configuration options can alternatively be applied through the servlet container configuration (web.xml), which is a completely different format
  • the eXist installation runs a specially tailored configuration of Jetty that doesn't immediately make it easy to find the configuration options
  • default security settings in Jetty do not permit following symbolic links when serving static files
Most of these are not necessarily bad things, but in combination they create a system whose configuration is about as user-friendly as a cornered rat. For me, the breakthrough came when I found a line in the eXist documentation "The Jetty configuration can be found in tools/jetty/etc/jetty.xml" (http://exist-db.org/deployment.html, section 3). This is relative to the eXist installation directory.
Editing this file, I can change (almost the last line in the file):
<Set class="org.mortbay.util.FileResource" name="checkAliases" type="boolean">true</Set>
to
<Set class="org.mortbay.util.FileResource" name="checkAliases" type="boolean">false</Set>
Now I can create a symbolic link shuffl in ${EXIST_HOME}/webapp, linking to my shuffl project directory, and by pointing my browser at http://localhost:8080/exist/shuffl/ I can browse my project directories and run the test files via eXist. Phew!
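To show why serving the test files through eXist helps with the "same origin" restriction, here is a small illustrative check. It is a simplification rather than the browser's actual rule set: it treats two URLs as sharing an origin when scheme, host and port all match, and it does not model how browsers handle file: URLs. It uses the standard URL class found in current browsers and Node.js. The Atom path and the local file path in the usage lines are made up for the example.

```javascript
// Illustrative check: two URLs share an origin when their scheme, host
// and port all match. Default ports are normalized by the URL class.
function sameOrigin(a, b) {
    var ua = new URL(a);
    var ub = new URL(b);
    return ua.protocol === ub.protocol &&
           ua.hostname === ub.hostname &&
           ua.port === ub.port;
}

// Test page served via the symbolic link under the eXist webapp directory.
var testPage = "http://localhost:8080/exist/shuffl/";
// Hypothetical AtomPub URI on the same server.
var atomUri = "http://localhost:8080/exist/atom/";
// Hypothetical test page opened directly from the file system.
var localFile = "file:///home/me/shuffl/test.html";

console.log(sameOrigin(testPage, atomUri));    // true
console.log(sameOrigin(localFile, atomUri));   // false
```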

Sunday 16 August 2009

Blogs, research data and preservation

Skimming through an aggregation of JISCRI posts circulated by David Flanders, I noticed this mention of ArchivePress, which seems to be relevant to some of the goals of the research group in which Shuffl is being developed. Our interests are re-use and preservation of research data, most of which does not make its way into archival journals and is lost when the original researcher "moves on". ArchivePress is also about preservation of useful knowledge that doesn't make it into archival journals, and I'm thinking the ideas may also be applicable to data. Shuffl is part of an activity that attempts to make it easier to capture and share highly heterogeneous data from small research teams, but does not of itself address preservation. Can the acquisition of research data benefit from the journal pattern that underpins the operation of blogs? And as such, can data preservation build upon projects like ArchivePress? Factors in favour:
  • Shuffl is already being designed to use Atom (via AtomPub), a format with its roots in representing blogs
  • Research data is typically captured over a period of time
  • The card metaphor used by Shuffl operates at a level of granularity that is arguably comparable with a blog post
Factors (maybe) against:
  • Shuffl is intended to allow progressive refinement of structure in data, both within and between data held on different cards - it is not clear how these refinements would be captured and navigated in a journal-like framework
  • ArchivePress seems to be WordPress-specific - I don't know if this is a problem
I think I need to be more sensitive to developments in the area of "data blogs" - I just tried to Google for that, and didn't immediately see anything very enlightening. Maybe the closest thing I've come across personally is http://timetric.com/, which was discussed at a recent Oxford Geek Nights session. Maybe myExperiment and related work has something to offer, though it appears to be very workflow-oriented? I'm sure there's more.

Sunday 2 August 2009

JISCRI lightweight reporting? ... and the three bears!

It just struck me that the "lightweight" reporting structure for JISCRI projects - that is, by blog posts such as this - is actually achieving far more detailed progress reports than might be obtained through a more formal top-down reporting structure. And they might even contain more useful information! Go figure.

jQuery rocks!

I've been most impressed by the way jQuery has simplified coding for the Shuffl user interface.
I've been trying to analyze why this is, and I have two (partial) answers:
  1. jQuery implements a kind of publish/subscribe architecture: a function or plugin is a kind of published service, and a jQuery selector is a kind of subscription to that service. The advantage of a publish/subscribe architecture is that implementations of the service or function provided are very highly decoupled from implementations of the consumer of that service, which makes for highly modular and loosely coupled code.
  2. jQuery makes the overall code very modular. It is remarkably easy to add a jQuery plugin to an application, and just use it at the point it is needed. I think this is in part due to the publish/subscribe pattern noted above, but there's more to it than that, and I can't quite put my finger on what that is. Maybe just inspired design!
I highly recommend jQuery for browser-based rich web applications - I've never previously known JavaScript programming to be so easy.
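As a rough illustration of the "published service" and "subscription" idea, here is a minimal plugin sketch. It is not taken from the Shuffl code: the plugin name shufflLabel, the CSS class and the selector in the usage comment are all made up. It relies only on the standard jQuery plugin convention of adding a function to $.fn and returning the selection so that calls can be chained.

```javascript
// Hypothetical plugin installer: adds a "shufflLabel" method to every
// jQuery selection (the "published service").
function installShufflLabel($) {
    $.fn.shufflLabel = function (text) {
        // "this" is the current selection; returning the result of
        // each() keeps the plugin chainable, as is usual for jQuery.
        return this.each(function () {
            // Inside each(), "this" is one selected DOM element.
            $(this).addClass("shuffl-label").text(text);
        });
    };
}

// In a page that has loaded jQuery, usage would look like this; the
// selector picks the elements that "subscribe" to the service:
//
//   installShufflLabel(jQuery);
//   jQuery(".shuffl-card .title").shufflLabel("Untitled card");
```

The plugin knows nothing about which elements it will be applied to, and the calling code knows nothing about how the label is rendered, which is the decoupling described in the first point above.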

Sprint 1 complete: WIN (mostly)

The first sprint for the Shuffl project is complete, and most of the aims have been achieved. An initial user interface has been built using jQuery, and is working very well, as far as it goes. Work has even started on code for persistence ahead of schedule, as this was originally bartered out of scope for this first sprint. This all despite losing a couple of days to another project. Work not done that should have been done: as yet, there are no automated tests, but the exploratory nature of the early work has made it hard to be rigorous about this; initial meetings with users have not yet been set up. A full summary of the sprint plan and progress can be seen here (http://code.google.com/p/shuffl/wiki/SprintPlan_1).