Tuesday 1 December 2009

Sustaining Shuffl: the view ahead

Sustainability and related matters

This posting sets out my reflections, at the end of the Shuffl JISCRI project, on open source sustainability and near-term development plans.

Sustainability plan updated

The sustainability plan at http://code.google.com/p/shuffl/wiki/OpenSourceSustainabilityPlan has been updated to reflect my views on the sustainability issues now that the project has reached its end. This is not a fully-fledged, long-term sustainability plan, but I think it does indicate a direction of travel, and I feel it covers as much detail as is meaningful at this stage.

Thanks to funding for the ADMIRAL project (http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL), Shuffl development is expected to continue at some level until 2011, during which time we intend to flesh out some of the functionality and focus more on the original goals of data curation.

CLA adopted

An Individual Contributor Licence Agreement, based on one approved by Oxford University for the Simal project, has been adopted for Shuffl: http://code.google.com/p/shuffl/wiki/Individual_Contributor_License_Agreement.

At some stage, we may wish to review this if responsibility for actually publishing the software is transferred to an organization or person other than the University of Oxford.

Changing requirements

During the course of this Rapid Innovation project, the requirements focus has shifted from data acquisition and curation to data visualization, in response to comments from a potential research user. This has seemed the right thing to do for a number of reasons, mainly to do with maximizing the potential for early user engagement, but it has meant that some of the original headline goals have been sidelined.

I have also not yet made direct contact with other research users to gather their (possibly different) views of requirements. Jun Zhao and I have arranged to visit Helen White-Cooper in the very near future to resume our discussions with her about data handling requirements and tools, at which time I plan to demonstrate Shuffl and discuss with her the possibilities for an image annotation version, and other possible applications.

In its current state, Shuffl does appear to be a solid base for development of new functionality, with many (though not all) of the fundamental architectural features having been tested and stabilized.

Active deployment and ADMIRAL environment

As noted above, it is our intention that development of the data curation aspects of Shuffl will be intensified for the ADMIRAL project, as a number of the work packages have been based on developments of Shuffl.

To this end, we will continue to work with Chris Holland, as well as other researchers, to create a deployed system including Shuffl that they will use on a day-to-day basis.

For ADMIRAL, we have the advantage that we will be creating an infrastructure service that can be enhanced with features we can use to overcome some of the usability limitations of the current browser-only version of Shuffl. For example, we will be able to provide an HTTP request proxy service so that data can be read from and written to servers other than the one from which the Shuffl application has been loaded. We can also tailor the storage services to better support file browsing for locating data files and Shuffl workspaces. We can also look to host useful data visualization and handling services.

Not YAAS (Yet Another Application Server)?

It has been noted on the Shuffl discussion list that some of the data visualization work, and the porting of a FlyWeb widget, might be seen as creating just another application server running in the browser. Such a direction could leave behind some of the more distinctive aspects of the original Shuffl concept, and end up reinventing facilities that are provided by other developments, such as W3C widgets served by a Wookie widget server. Scott Wilson implemented an early proof-of-concept of a Wookie widget running in a Shuffl card, an idea that has so far not been followed through. One reason for this was a reluctance to get involved in implementing specialized server-side support as part of the original Shuffl project.

With the ADMIRAL project, we can review the option of using a Wookie server as part of the data sharing infrastructure to serve up standard data visualization widgets, rather than re-implementing them as Shuffl cards. We would need to understand enough about the Wookie API to hook it into the model API used by Shuffl to link information across cards. In this way, we may be able to reap the advantages of data visualization for user engagement without having to put effort into implementing the visualizations ourselves.

Monday 30 November 2009

Shuffl sprint 8 - progress review

The updated sprint plan is at http://code.google.com/p/shuffl/wiki/SprintPlan_8.

This was the final sprint, so effort was directed mainly towards leaving the project in a clean state for further work.

The demonstration application running from Google Code SVN has been enhanced to represent the current state of development, and the underlying framework has been significantly improved and stabilized. While the delivered functionality is less than had been intended, the project is in fairly good shape for ongoing development as part of the Shuffl project, and the basic ideas have been somewhat vindicated.

I did not make time to install a copy of the system for Chris Holland, but still hope to do this soon. We will be continuing to work with Chris in the ADMIRAL project.

See also the final progress report at http://code.google.com/p/shuffl/wiki/20091130_JISCRI_ProjectFinalProgressReport.

Shuffl final progress report

The final progress report for the Shuffl JISCRI project has been published in the project wiki, at 20091130_JISCRI_ProjectFinalProgressReport.

Monday 16 November 2009

Shuffl sprint 8 plan

The plan for sprint 8 has been posted at http://code.google.com/p/shuffl/wiki/SprintPlan_8.

This will be the final sprint to be conducted under the initial Shuffl JISCRI project umbrella, but work on Shuffl will continue as part of our ADMIRAL project.

The main goals for this sprint are:

  • to complete the back-end storage interface work started in the previous sprint
  • to improve the online prototype demonstrator for project evaluation (mainly by creating a new Shuffl workspace that displays instructions for doing the various things Shuffl can do), and to write notes for installing eXist and Shuffl on a new system
  • to install a copy of Shuffl and eXist on a machine in our research user's workgroup
  • to complete final reporting and project wrap-up, including tidying up some matters to do with project sustainability

Shuffl sprint 7 - progress review

The updated sprint plan is at http://code.google.com/p/shuffl/wiki/SprintPlan_7.

The high point of this sprint was the enthusiastic reception of the visualization interface, even though this has been completed at the expense of some of the other more curation-oriented features. If this truly helps us achieve better engagement for the ADMIRAL project, I judge this will have been a good trade-off, but some cautionary notes raised on the discussion group about maintaining the right project focus need to be borne in mind. The mechanisms for working directly with spreadsheet data went some way to reducing the pressure for some of the features not yet implemented. I have agreed with Chris Holland to install a copy of Shuffl on a system where he can use it directly with his own data, which will hopefully provide a powerful point of engagement for continuing work on ADMIRAL.

It may be worth noting that I don't feel the focus on visualization has been entirely at the expense of the original goals of Shuffl; i.e., to provide a lightweight tool for capturing and sharing annotations and data. Many of the fundamental capabilities have been demonstrated, but in different combinations: user-editable semi-structured data, card linking, and a flexible, pluggable framework for introducing new card structures. On the down side, some of the intended work on containers (e.g. stacks of cards) has not been addressed, and the card serialization format currently deployed is JSON and not RDF.

The testing framework has been extremely valuable. The full test suite now performs in excess of 2000 individual tests (though many of these are repetitious). Areas which have proved more challenging to debug have been exactly those parts of the user interaction code that are not covered by unit tests. I have resisted taking time to implement a UI test framework (e.g. based on Selenium), but rather have tried to move logic out of the user interaction code into unit-testable functions. This is a debatable strategy, but in the limited time available I didn't feel the benefits of deploying a full UI test suite would get me further forward. When I get time, I'd like to evaluate the Windmill framework (http://www.getwindmill.com/), as my past experience with Selenium has been somewhat mixed.

With work on Shuffl planned to continue as part of the ADMIRAL project, I feel my top priority is to implement as much as possible of the features desired by the actively engaged researcher - which is to improve the interface for saving and loading workspaces. Other than that, I need to continue the steps taken to promote sustainability of the outputs, including creation of a more approachable demonstration prototype. These two strands of effort would ideally come together in a back-end storage plugin that works with the Google Data API - if the opportunity presents, I'd rather like to do a mini-hackathon with someone who is familiar with the details of the Google Data API.

Wednesday 11 November 2009

"I can really see myself using this"

Big win !!!

As the Shuffl project draws to a close, I have been having some doubts about the amount of progress made, and have been asking myself whether it was the right decision to spend effort on data visualization within Shuffl (which has been time consuming). But I've just had a real boost.

During a brief demonstration to Chris Holland, I showed him his own spreadsheet data loaded and plotted as graphs in Shuffl, eliciting the response:

"I can really see myself using this"

Vindication indeed!

Chris described the ability to quickly draw up plots and add annotations, without having to use four different programs, as a real winner for him. Working with Chris' own data, I have seen the need for, and implemented, (a) selecting data blocks from within a worksheet, and (b) supporting a mixture of linear and logarithmic data plots, both of which I believe contributed to Chris' response.

We have agreed that:

  • Label editing (requested previously) is not an immediate priority
  • Plot colour selection would still be nice
  • I shall focus my remaining efforts on improving the user interface for workspace saving and loading (which will require some reworking of the storage interface, but will in any case prepare the ground for continuing Shuffl work in the ADMIRAL project), and
  • I shall arrange to install a copy of eXist on a computer accessible to Chris so he can try using Shuffl in his own environment.

The main new feature that Chris requested during the demonstration was the ability to print a Shuffl workspace, or save it as an image for incorporation into a paper or document. I think this is a reasonable and do-able goal, but it won't be implemented within the current project. Maybe as part of ADMIRAL? For now, there is screen capture and printing.

There is also an interaction here with discussions I've had with Scott Wilson (Wookie Widgets) and Ross Gardler (OSS Watch), in which I have been wisely cautioned against trying to make Shuffl into a generic application server. If data visualization is to prove a draw for engaging with researchers then, looking forward to the ADMIRAL project, I think I should look seriously into the possibility of incorporating a Wookie server into the planned ADMIRAL Data Store server (LSDS). I suspect that a real win here would be if Wookie can serve a widget for displaying raw spreadsheet content (by "raw", I mean here without export to CSV format). I look forward to more interesting discussion and exploration.

Tuesday 3 November 2009

Shuffl sprint 7 plan

The plan for Sprint 7 has been posted at http://code.google.com/p/shuffl/wiki/SprintPlan_7.

The main targets for this sprint will be to continue work not completed in the previous sprint:

  • Visualization of data: after getting feedback from Chris Holland and obtaining some representative sample data, I have a number of user interface elements to complete.
  • Improve error handling when loading/saving workspaces.
  • Improve usability of the interface for importing data and loading/saving workspaces.

Sunday 1 November 2009

Shuffl sprint 6 progress review

Progress during this sprint has been fairly poor, largely due to distraction by personal matters and other non-project affairs. The total amount of effort spent was 3.5 days against a planned effort of 8 days. A further factor impacting progress was that reorganizing the code to enable graph label-row selection in a data table took about a day longer than expected.

On the positive side, I did hold a second review meeting with Chris Holland, and the current development is being conducted very much in response to his feedback, to make the data graphing display more useful to him.

Also on the positive side, the mechanisms for linking data between cards seem to be working nicely (e.g. when I reload new data into a table card, or change the label row, an associated graph card updates immediately; but I do need to capture this relationship when I save and restore a workspace).

Some thoughts about sprint planning: with fewer than 4 days of actual effort expended in this sprint, the sprint planning and review process seems to lack sufficient data to be meaningful. In setting sprint duration, it would seem reasonable to take account of the total amount of effort being applied rather than simply the number of elapsed days. I don't plan to change the duration of the remaining two sprints for this project, but for future projects, planning sprints with fewer than 10-15 total days of effort may be something to avoid.

Monday 19 October 2009

Shuffl sprint 6 plan

The plan for Sprint 6 has been posted at http://code.google.com/p/shuffl/wiki/SprintPlan_6.

The main targets for this sprint will be to work towards improved usability in two areas:

  • Visualization of data: after demonstrating the data graphing interface to Chris Holland, I hope to find out what he really wants, and to get some representative sample data.
  • Improve usability of the interface for importing data and loading/saving workspaces.

Sunday 18 October 2009

Shuffl sprint 5 progress review

The plan for sprint 5 has been updated with progress and review notes:

This was a sprint of highs and lows - there were significant wins, but also more distractions, and the direction of progress seems to be diverging more from the original plan. A user demonstration session was postponed, to take place early in the next sprint. I have also spent a fair amount of time cleaning up existing code in various ways.

On the positive side, a demonstrable drag-and-drop data graphing interface has been completed (as planned), and I worked with Jun Zhao to successfully implement a card containing a FlyUI gene finder widget (not planned).

I have started to look at making the data access modules more robust - currently, an I/O error (such as accessing a non-existent web resource) can leave the application broken.

Encouragingly, plans are now being laid for projects that will continue aspects of Shuffl beyond the current Rapid Innovation project.

Task completion velocity is down significantly from the previous sprint (tasks completed/effort spent: 4.5/7.4 = 0.61), which reflects the various distractions from the intended development plan.

FlyUI GeneFinder widget in Shuffl card

I recently spent a day working with my colleague Jun Zhao to successfully implement a Shuffl card containing a FlyUI (http://code.google.com/p/flyui/) gene finder widget.

This achievement is an important confidence builder, as it further demonstrates the power and flexibility of the Shuffl card plugin model: we were able to embed quite complex FlyUI widget logic unmodified into a Shuffl card. The FlyUI widgets are built using the Yahoo YUI libraries, so this success also demonstrates that Shuffl (based on jQuery) can coexist with YUI code.

The remaining challenge is to find easy ways to deploy such applications. To overcome the browser's "same origin" restriction on Ajax calls, the Shuffl+GeneFinder application was deployed via an Apache server configured to redirect and proxy requests to the SPARQL endpoint for FlyBase data hosted at OpenFlydata.org. In this, Shuffl is playing out quite nicely as a flexible user interface toolkit for application developers, but it is not yet clear how to deploy Shuffl as a general purpose, user-configurable data viewing and annotation tool. Maybe we should think about deploying a Shuffl application server? Could Google Apps, or similar, be recruited for this purpose?

Saturday 10 October 2009

Shuffl drag-and-drop data visualization

An initial cut of data visualization in Shuffl is working. A "data table" card can be dropped on a "data graph" card to display graphs of the data in the table. Thanks to jQuery, the drag-and-drop interface was a breeze to code, and the model pattern discussed previously makes it very easy to link the UI functions to underlying data logic. I'm using a jQuery plugin called 'flot' (http://code.google.com/p/flot/) to do the actual graphing.
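To give a flavour of what's involved, here is roughly the kind of flot call the graph card ends up making. This is an illustrative sketch rather than the actual Shuffl code: the placeholder id, series labels and data values are made up for the example.

// Minimal sketch of plotting two series with the flot jQuery plugin.
// flot draws into a <div> with an explicit size, e.g.
//   <div id="graph" style="width:400px; height:250px;"></div>
jQuery(document).ready(function () {
    var series1 = { label: "sample A", data: [[0, 0.1], [1, 0.4], [2, 0.9]] };
    var series2 = { label: "sample B", data: [[0, 0.2], [1, 0.5], [2, 1.3]] };
    jQuery.plot(jQuery("#graph"), [series1, series2]);
});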

Instructions for running the demonstration are at http://code.google.com/p/shuffl/wiki/Shuffl_Demonstration.

I intend to demonstrate this to one of my research collaborators to help determine how this should be carried forward.

I've also updated the project front-page (http://code.google.com/p/shuffl/) to reflect the current state of the project, and added a screenshot and updated links to the demonstration application and Shuffl-based brief introduction.

Thursday 8 October 2009

MVC pattern and mock widgets

I've just discovered that the MVC approach I've adopted (http://shuffl-announce.blogspot.com/2009/09/framework-for-testing-shuffl-card.html) makes it really easy to create mock shuffl cards (in the sense of mock objects: http://en.wikipedia.org/wiki/Mock_object) for testing.

Here's one I used today to test the drag-and-drop drop-target logic in a data visualization card:

// Instantiate mock table card
var tc = jQuery("<div/>");
tc.model('shuffl:labels',carddatagraph_labels1);
tc.model('shuffl:series',carddatagraph_series1);

The jQuery.model plugin I mentioned previously provides all the additional logic needed to ensure the mock card object responds as required to requests for information.
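To illustrate how the mock is then exercised (a sketch rather than the exact test code; it assumes the equals() assertion from the jQuery testing framework), the drop-target logic under test simply reads model values from whatever card was dropped, so the mock stands in for a real data table card:

// Read model values back from the mock card, as the drop handler would
var labels = tc.model('shuffl:labels');    // expect carddatagraph_labels1
var series = tc.model('shuffl:series');    // expect carddatagraph_series1
equals(labels, carddatagraph_labels1, "labels read back from mock card");
equals(series, carddatagraph_series1, "series read back from mock card");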

Monday 5 October 2009

JISCRI reporting tags

I've decided to vary slightly the tags I use for reporting-via-blog. Updated details are noted at http://code.google.com/p/shuffl/wiki/ProjectPlanOutline_200906_200911.

In summary:

  • some progress reports are neither WINs nor FAILs, just reports.
  • and there was no tag set for ongoing planning, which I think is required for an agile project structure - goals change and new plans are created for each sprint/iteration - none of the existing proposed tag sets really fit this.

Shuffl sprint 5 plan

The plan for Sprint 5 has been posted at http://code.google.com/p/shuffl/wiki/SprintPlan_5.

Responding to research user desiderata, project activities are now diverging quite significantly from the original outline plan (other than user engagement still being a key activity at this stage).

The primary focus for this sprint will be to complete the initial data graphing display for the visualization use case, to demonstrate this to the researcher who requested this, and use that session to gather more detailed requirements for this aspect of Shuffl.

Secondary foci will be: improving aspects of the user interface, especially for workspace loading/saving, and creating some user documentation.

I also have an ongoing discussion with Ross Gardler about long-term sustainability and CLAs (contributor licence agreements) - see http://code.google.com/p/shuffl/issues/detail?id=9 for details.

Saturday 3 October 2009

Shuffl sprint 4 progress review

The plan for sprint 4 has been updated with progress and review notes:
In outline, the sprint activity was:
  • week 1: complete workspace persistence (initial cut)
  • week 2: UI for data access; expand testing framework; card refactoring to MVC pattern
  • week 3: tabular data loading and display; started on data graphing
The sprint has been characterized by good progress and technical compromises. Many of the completed tasks are not implemented as completely as I would like or had originally intended, so I feel I may be accumulating some "technical debt" that will need to be addressed before the project can be truly usable by unsupported researchers and others. The user interface for saving/restoring workspaces is a case in point. See http://code.google.com/p/shuffl/wiki/TechnicalDebt for more details.
There has been less community engagement than planned, though some new use-cases have been noted (FlyKit, e-learning).
For the first time, I have calculated a "velocity" figure (planned effort for completed tasks/total sprint effort): 11.0/11.4 = 0.96. This figure was a surprise to me: I was expecting it to be lower on account of the unplanned time spent on admin, etc. Total sprint effort here does not include the small amount of unplanned additional time spent on other projects. Prediction of progress remains challenging with so many competing demands on limited time.

Thursday 1 October 2009

Shuffl data visualization use-case updated

I now have permission to publicly post my full notes from a meeting with Chris Holland, a researcher for whom I am targeting an application of Shuffl (and one of the supporters of our bid). Notes of our meeting are here:

Monday 28 September 2009

Framework for testing Shuffl card plugins

Card plugins are looking to be a powerful extension point for Shuffl, allowing as they do for arbitrary additional logic to be associated with a card. This extra logic may encompass data handling, user interactions and external system interactions.
This presents a challenge for testing, especially as it has proven tricky to simulate user interaction with other jQuery plugins, such as editable text values, without resorting to external testing tools like Selenium. (I have used Selenium in the past, and it has been very useful, but it does involve working with a whole new framework, and in my experience it can be unreliable when testing event-intensive browser code. I prefer to stick with the simple jQuery testing framework for unit testing, though I do expect to eventually adopt a UI testing framework as well.)
The approach I'm taking is based on what we did with the FlyWeb project: there, we adopted a pattern for widgets based on the Model-View-Controller (MVC) pattern. Essentially, all updates to the browser display and user interactions were decoupled from the main system logic through a generic "Model" facility with which code interacts using a publish/subscribe (or "Observer") pattern.
I've looked at replicating this for Shuffl, and was able to create a simple jQuery plug-in that provides a model and model-listener capability for any jQuery object, all in little more than 20 lines of code! It turns out that jQuery already has most of the logic required, between its data() method and its event bind, unbind and trigger capabilities. I've simplified the pattern from FlyWeb by using the model interface also to serve as a controller interface for a Shuffl card - that is, external interactions with a card are by setting, getting and listening for changes to model values.
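For flavour, here is a minimal sketch of how such a plugin can be put together from jQuery's data() method and its event trigger/bind facilities. This is illustrative rather than the actual Shuffl code: the event name and the modelBind helper are assumptions made for the example.

(function (jQuery) {
    // Get or set a named model value on the selected element(s)
    jQuery.fn.model = function (name, value) {
        if (value === undefined) {
            return this.data(name);                        // getter
        }
        var oldval = this.data(name);
        this.data(name, value);                            // setter
        // Notify any listeners that this model value has changed
        this.trigger("shuffl:modelchange",
                     { name: name, oldval: oldval, newval: value });
        return this;
    };
    // Register a listener for changes to a named model value
    jQuery.fn.modelBind = function (name, callback) {
        return this.bind("shuffl:modelchange", function (event, data) {
            if (data.name === name) { callback(event, data); }
        });
    };
})(jQuery);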
With this in place, I have refactored the existing Shuffl card plugin code to use browser DOM accessing methods only for rendering the display, and otherwise to use the model as the primary interface for a card. The biggest changes have been that user input is recorded by updating the model, with listeners reflecting those changes to the browser DOM, and card serialization is now performed entirely from the model, where previously it involved reading values from the DOM.
This allows test code to be written that updates the model and looks for appropriate changes in other model values and/or in the browser DOM. Code that is not covered by this testing regime is reduced to those areas that respond to user inputs and generate appropriate callbacks, and generation of graphical displays; so far, such untested logic is almost entirely within off-the-shelf jQuery plugins.
Making these changes, and in particular adding new test cases for card interactions, has allowed me to discover and correct a number of bugs in the existing code. I count this a win, even though I spent 2.3 days on an activity budgeted for just half a day. The overrun is mostly caused by retrofitting the new MVC-derived pattern to existing widgets, and creating additional test cases for these.

Tuesday 22 September 2009

Accessing files from browser-based Javascript

Designing a user-friendly mechanism for accessing files from Javascript has proven somewhat elusive.

My original plan was to create a temporary form in a card using an <input type="file"> element to upload the file to the AtomPub server. Unfortunately, I could find no way to control the way the data is uploaded, and I would be forced to bypass the AtomPub module, leading to problems and fragility when I implement support for different back-ends.

Next, I thought to use an <input type="file"> element to browse the file system and select a file, then do some behind-the-scenes mapping from the selected file path to the corresponding AtomPub URI, so that the Javascript code could read the file via an Ajax GET request. This approach was foiled by the browser's refusal to let me access the full path name of the file: only the file name is returned from the form (in Firefox, at least). A little Googling suggests this is a browser security "feature" (though I have to say that it smells like security by obscurity).

So my final solution is simply to ask the user to enter the URI of the file. I separate the base URI of the AtomPub-served file area from the rest of the filename so that a reasonable default can be provided. It's not as user-friendly as I'd like, but I'm not prepared to invest a lot of time in this at the moment, so I'm leaving it at that, hoping that I may come across a better solution in due course. Fortunately, my experience with research users is that they're prepared to deal with a little clunkiness if it doesn't require too much additional thought and it helps them get a result they want.

I'm writing this up in the hope that someone else may be able to offer a solution. So far, the most likely option seems to be to upload to a temporary file area in the AtomPub server domain and return the URI thus allocated, but that would force me to make more assumptions about the back-end service than I'm comfortable with right now.

Sunday 20 September 2009

Shuffl: basic workspace persistence using AtomPub works

I've now got Shuffl persistence working against eXist. The UI is a bit clunky (well, a lot clunky really), and the error handling is poor, but it does seem to work. I must properly document how to set it up.

Roughly:

  1. install eXist
  2. configure eXist/Jetty to serve static files from some location. (See: http://shuffl-announce.blogspot.com/2009/08/exist-and-jetty-configuration.html)
  3. copy the shuffl files to that static location (I just link to my Eclipse workspace, so short-circuit this bit)
  4. start eXist
  5. Run the demo application from file shuffl-demo.xhtml (e.g. browse to http://localhost:8080/exist/shuffl/static/demo/shuffl-demo.xhtml).
  6. Use the shuffl menu (click on the logo), choose "Save as new workspace...", and change the URI in the dialog to something like "http://localhost:8080/exist/atom/edit/shuffl-test/", then click OK. Note that the location shown at the bottom of the window changes to something like "http://localhost:8080/exist/atom/edit/shuffl-test/shuffl-test.json" - any filename in the entered URI is ignored.
  7. Make changes to the UI and use the menu option "Save workspace"

Later, to restore the saved data, start shuffl then use menu option "Open workspace...", and change the Atom feed path to the value used previously.

Not yet implemented:

  • saving card size
  • delete cards
  • just about anything else you want to do

It's a small WIN, but I do count this as a WIN. It was a bit of a struggle getting to this point.

Wednesday 16 September 2009

Sprint 4 plan

The plan for sprint 4 has been prepared.
This will be a 3 week / 12 day sprint.
Focus for this sprint will be (a) to complete the workspace persistence via AtomPub, and (b) to start looking at card collections and structures to support the visualization use case raised in discussion with Chris Holland. I'll probably focus on that to the exclusion of planned basic user authentication during this sprint, since the user emphasis here is on visualization rather than sharing/publication.
I have also configured this announcement blog so that postings are sent automatically to the project discussion group. (It came up in discussion with Ross Gardler that there is a tension here between JISC reporting requirements, which call for a syndication feed of tagged project announcements, and common open source practice of using a mailing list for project announcements and discussions. I think this approach, if it works as hoped, nicely meets these requirements, and provides a place to view announcements that is uncluttered by chit-chat.)

Initial SWOT analysis posted

An initial SWOT analysis for the Shuffl project has been posted here.
It is far from complete, and I intend to use this as a place to collect additional notes as I go along.

Tuesday 15 September 2009

Sprint 3 review

Sprint 3 is finished, and progress notes can be viewed here.

The main points to note are that I've had to re-assess my plan for implementing workspace persistence, and have backed up to implement a new layer of test cases. Progress has been made, but workspace saving/persistence is still not yet complete.

The other main feature of the sprint has been lots of meetings: the JISCRI meeting, the London linked web data meeting, and a VoCamp in Bristol. All of these have contributed, in various ways, to my thoughts for taking Shuffl to users, and have provided opportunities to discuss ideas and use-cases.

I've tagged this progress note as a FAIL as the project is slipping compared with plan: AtomPub has proved rather more of a handful to master than expected. Progress is being made, but more slowly than hoped. I'm hoping to be able to get back to user-visible functionality before too much longer.

Wednesday 9 September 2009

DELETE on AtomPub media resource

I read somewhere that deleting an AtomPub media resource associated with a feed item (using HTTP DELETE) should also delete the feed item. It turns out that the eXist AtomPub implementation does not support this, claiming (erroneously) that the resource does not exist.
Examining the AtomPub spec provides little support (though on first reading I actually thought this did support what I want to do):
To delete a Member Resource, a client sends a DELETE request to its Member URI, as specified in [RFC2616]. The deletion of a Media Link Entry SHOULD result in the deletion of the corresponding Media Resource.
-- RFC 5023, section 9.4
This is unfortunate for Shuffl, as I had been hoping to not get too bogged down in feed items vs resources. Now I might need to include a feed item URI along with card data if I'm to be able to delete the card data. Or maybe I'll just delete the feed and recreate it without the missing data?
More generally, I am beginning to wonder if AtomPub was the best choice for persisting Shuffl data. I estimate that the complexities of dealing with AtomPub have cost me a week or so in slippage, and suspect that I might have been better off with WebDAV. I'll stick with it for now, but I'm thinking about a plug-in framework for Shuffl back-end support so I can support alternative protocols.
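To indicate the sort of thing I have in mind (purely a sketch; the names and the shape of the interface here are assumptions, not a settled design), each back-end would register a handler exposing the same small set of asynchronous operations, so the rest of Shuffl need not care which protocol is in use:

var shuffl = shuffl || {};
shuffl.storageHandlers = {};

// Register a storage handler for a named protocol (e.g. "atompub", "webdav")
shuffl.registerStorage = function (protocol, handler) {
    shuffl.storageHandlers[protocol] = handler;
};

// Example registration: an AtomPub-ish handler built on jQuery.ajax.
// 'data' is assumed to be already-serialized card or workspace content.
shuffl.registerStorage("atompub", {
    get: function (uri, callback) {
        jQuery.ajax({ type: "GET", url: uri, dataType: "text",
                      success: function (data) { callback(data); },
                      error:   function (xhr)  { callback(new Error("GET failed: " + xhr.status)); }
                    });
    },
    put: function (uri, data, callback) {
        jQuery.ajax({ type: "PUT", url: uri, contentType: "application/json", data: data,
                      success: function ()     { callback(uri); },
                      error:   function (xhr)  { callback(new Error("PUT failed: " + xhr.status)); }
                    });
    }
});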

Wednesday 2 September 2009

Relearning the agile lessons

My coding has been getting a bit bogged down recently, and I've finally stepped back a little to try and figure out why.
One of the lessons of agile development is that things go more smoothly if one proceeds by small steps. I realize I've been trying to implement too much at once, and as a result the code has been getting away from me.
So how did this come about? In-browser unit testing can take quite a bit of organization to put in place, and I've been so flushed with the ease of throwing stuff together with jQuery that I've neglected to create sufficient unit tests, relying instead upon manual testing of the user interface. For the highly-visible user interface functions, and proceeding in small steps, I was getting away with this. But my current task is to save a shuffl workspace using AtomPub. The AtomPub handlers I've developed do have fairly good unit test coverage, but the additional logic needed to assemble the workspace to be saved is also rather complex, and it is here that I'm getting bogged down.
So I'm suspending that line of development, and reorganizing the existing code into modules around which I can put in place some decent unit tests. Strangely, I now feel that things are once again moving forward at a respectable pace! This is all work that needs to be done sooner or later, and sooner seems best.

jQuery with Firefox gotcha: use .html, not .xhtml

I've been tearing my hair out over this error message from Firefox when trying to run my test cases from the local file system.
uncaught exception: [Exception... "Component returned failure code: 0x80004003 (NS_ERROR_INVALID_POINTER) [nsIDOMNSHTMLElement.innerHTML]" nsresult: "0x80004003 (NS_ERROR_INVALID_POINTER)" location: "JS frame :: file:///Users/graham/workspace/googlecode_shuffl/jQuery/js/jquery-1.3.2.js :: anonymous :: line 911" data: no]
Line 0
According to a random page I found on the web, it turns out to be caused by a subtle bug in Firefox's XHTML parsing. The file's DTD declaration calls for transitional, loose XHTML. The file was being served with a .xhtml extension. Changing the file extension to just .html solves the problem.

Friday 28 August 2009

AtomPub with jQuery implementation done

WIN and FAIL: the AtomPub protocol handler has been developed and tested, but has taken more than twice the effort originally estimated. The failure here is, I think, more a failure of estimation than indicative of a technical problem. For details, see http://shuffl-announce.blogspot.com/2009/08/implementing-atompub.html.

Implementing AtomPub

WIN and FAIL: the protocol handler has been developed and tested, but has taken more than twice the effort originally estimated. The failure here is, I think, more a failure of estimation than indicative of a technical problem.
Factors involved in the development of the AtomPub handler include: learning the relevant details of the AtomPub protocol; learning how to use the jQuery Ajax API; dealing with idiosyncrasies of the eXist implementation of AtomPub; and developing a framework for composing functions that return results via asynchronous callbacks.
The failure of estimation comes from not appreciating the range of supporting activities needed to implement the AtomPub handlers. Had those activities been identified when performing the estimates, I feel the estimates would have been closer to the final effort actually used. But there is a tension here: too fine a breakdown when estimating tasks leads to a plan that bears little resemblance to the actual activities for which progress is reported. Maybe finer-grained task breakdowns could be used for the purpose of estimation, but disregarded for the purposes of progress reporting. Or maybe the real problem here is simply doing task estimation without the benefit of group discussion with other team members? An outline of the test cases used to drive the test-led development process might also have helped.
Some specific technical issues faced down include:
  • checking the formal specification for relevant details of the AtomPub protocol (0.5 day)
  • learning how to use the jQuery Ajax API (0.5 day)
  • dealing with idiosyncrasies of the eXist implementation of AtomPub (about a day's worth of trial-and-error activity added to the main development task)
  • developing and testing a framework for composing functions that return results via asynchronous callbacks (about 1 day, not recorded in the progress summary as it was performed in odd fragments of time during a vacation period); a sketch of the idea appears after this list
  • coding the test cases and handler (about 2 days, which was pretty much the original estimate)
  • tracking down an obscure bug caused by erroneous multiple invocations of an asynchronous callback method (about 0.5 day)
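To show the callback-composition idea mentioned above (a simplified sketch, not the actual framework; the function and step names are made up), each step receives the previous step's result plus a callback to invoke when its own asynchronous work completes:

// Run a list of asynchronous steps in order, passing each result along
function sequence(steps, done) {
    function next(i, result) {
        if (i >= steps.length) { done(result); return; }
        steps[i](result, function (nextResult) { next(i + 1, nextResult); });
    }
    next(0, null);
}

// Hypothetical usage:
//   sequence([createMediaResource, setEntryTitle, readBackEntry], reportOutcome);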
So, all-in-all, the task took about 5 days, rather than the two estimated. A large part of this overrun can be attributed to learning details of technologies not used previously.
The AtomPub specification is a bit vague in some areas about the URIs used for accessing and updating feeds and feed items. In particular, it gives no clear indication (by design, I think) of how feed URIs are constructed. I further feel that the eXist implementation of AtomPub may diverge in some respects from the spirit, if not the letter, of the AtomPub specification (and hence from other implementations), by virtue of the way it uses a number of different base URIs for accessing different AtomPub-related functions. Also, the eXist implementation I use does not seem to implement service documents that allow discovery of Atom feeds - this being at odds with eXist's own documentation as well as the intent of the AtomPub spec - which will make it harder to implement a pure browsing interface for loading and saving shuffl workspaces. The AtomPub test cases have been developed to reflect the way that eXist works, and may need to be adjusted when refining the AtomPub handler to work with different server implementations.
Another tricky issue was that, when creating an AtomPub "Media Resource", a resource title cannot be specified, so the title must be applied by updating the Atom Entry returned by the initial create operation; this adds some complexity to the asynchronous completion logic in the handler.

Wednesday 19 August 2009

eXist and Jetty configuration

My initial experiments to get eXist installed and running as an AtomPub server went very smoothly.
But I've just spent a day thrashing around with the server configuration, trying to get it to serve static files from the Shuffl project test directory so that I can run tests more easily, without falling foul of the Javascript "same origin" restriction. This has been complicated by a number of factors:
  • Jetty is natively configured by Java object dependency injection, for which XML configuration files are a shim interface
  • the Jetty documentation isn't very approachable, especially as the XML configuration sections don't actually tell you how to configure the server - for that, you have to dig into the various servlet classes to understand the values that need to be injected
  • some configuration options can alternatively be applied through the servlet container configuration (web.xml), which is a completely different format
  • the eXist installation runs a specially tailored configuration of Jetty that doesn't immediately make it easy to find the configuration options
  • default security settings in Jetty do not permit following symbolic links when serving static files
Most of these are not necessarily bad things, but in combination they create a system whose configuration is about as user-friendly as a cornered rat. For me, the breakthrough came when I found a line in the eXist documentation "The Jetty configuration can be found in tools/jetty/etc/jetty.xml" (http://exist-db.org/deployment.html, section 3). This is relative to the eXist installation directory.
Editing this file, I can change (almost the last line in the file):
<Set class="org.mortbay.util.FileResource" name="checkAliases" type="boolean">true</Set>
to
<Set class="org.mortbay.util.FileResource" name="checkAliases" type="boolean">false</Set>
Now I can create a symbolic link shuffl in ${EXIST_HOME}/webapp, linking to my shuffl project directory, and by pointing my browser at http://localhost:8080/exist/shuffl/ I can browse my project directories and run the test files via eXist. Phew!

Sunday 16 August 2009

Blogs, research data and preservation

Skimming through an aggregation of JISCRI posts circulated by David Flanders, I noticed this mention of ArchivePress, which seems to be relevant to some of the goals of the research group in which Shuffl is being developed. Our interests are re-use and preservation of research data, most of which does not make its way into archival journals and is lost when the original researcher "moves on". ArchivePress is also about preservation of useful knowledge that doesn't make it into archival journals, and I'm thinking the ideas may also be applicable to data. Shuffl is part of an activity that attempts to make it easier to capture and share highly heterogeneous data from small research teams, but does not of itself address preservation. Can the acquisition of research data benefit from the journal pattern that underpins the operation of blogs? And as such, can data preservation build upon projects like ArchivePress? Factors in favour:
  • Shuffl is already being designed to use Atom (via AtomPub), a format with its roots in representing blogs
  • Research data is typically captured over a period of time
  • The card metaphor used by Shuffl operates at a level of granularity that is arguably comparable with a blog post
Factors (maybe) against:
  • Shuffl is intended to allow progressive refinement of structure in data, both within and between data held on different cards - it is not clear how these refinements would be captured and navigated in a journal-like framework
  • ArchivePress seems to be WordPress-specific - I don't know if this is a problem
I think I need to be more sensitive to developments in the area of "data blogs" - I just tried to Google for that, and didn't immediately see anything very enlightening. Maybe the closest thing I've come across personally is http://timetric.com/, which was discussed at a recent Oxford Geek Nights session. Maybe myExperiment and related work has something to offer, though it appears to be very workflow-oriented? I'm sure there's more.

Sunday 2 August 2009

JISCRI lightweight reporting? ... and the three bears!

It just struck me that the "lightweight" reporting structure for JISCRI projects - that is, by blog posts such as this - is actually achieving far more detailed progress reports than might be obtained through a more formal top-down reporting structure. And they might even contain more useful information! Go figure.

jQuery rocks!

I've been most impressed by the way jQuery has simplified coding for the Shuffl user interface.
I've been trying to analyze why this is, and I have two (partial) answers:
  1. jQuery implements a kind of publish/subscribe architecture: a function or plugin is a kind of published service, and a jQuery selector is a kind of subscription to that service (a tiny sketch of this idea appears at the end of this post). The advantage of a publish/subscribe architecture is that implementations of the service or function provided are highly decoupled from implementations of the consumer of that service, which makes for highly modular and loosely coupled code.
  2. jQuery makes the overall code very modular. It is remarkably easy to add a jQuery plugin to an application, and just use it at the point it is needed. I think this is in part due to the publish/subscribe pattern noted above, but there's more to it than that, and I can't quite put my finger on what it is. Maybe just inspired design!
I highly recommend jQuery for browser based rich web applications - I've never previously known Javascript programming to be so easy.
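Here is the tiny sketch of the "selector as subscription" idea promised in point 1 above (the plugin and class names are invented for the example): the plugin is the published behaviour, and any selector that matches effectively subscribes to it, without the two needing to know about each other.

// Published behaviour: a trivial jQuery plugin
jQuery.fn.highlight = function () {
    return this.css("background-color", "yellow");
};
// Subscription: any elements matching the selector get the behaviour applied
jQuery(".shuffl-card .shuffl-title").highlight();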

Sprint 1 complete: WIN (mostly)

The first sprint for the Shuffl project is complete, and most of the aims have been achieved. An initial user interface has been built using jQuery, and is working very well, as far as it goes. Work has even started on code for persistence ahead of schedule, as this was originally bartered out of scope for this first sprint. This all despite losing a couple of days to another project. Work not done that should have been done: as yet, there are no automated tests, but the exploratory nature of the early work has made it hard to be rigorous about this; initial meetings with users have not yet been set up. A full summary of the sprint plan and progress can be seen here (http://code.google.com/p/shuffl/wiki/SprintPlan_1).

Monday 27 July 2009

Can Pen and Paper Help Make Electronic Medical Records Better?

I spotted this item in ACM TechNews. I thought it interesting because it resonates quite strongly with some of the ideas that led to the formulation of Shuffl.


Can Pen and Paper Help Make Electronic Medical Records Better?

IUPUI News Center (07/20/09), Aisen, Cindy Fox

Using pen and paper occasionally can make electronic medical records even more useful to healthcare providers and patients, concludes a new study published in the International Journal of Medical Informatics. The study, "Exploring the Persistence of Paper with the Electronic Health Record," was led by Jason Saleem, a professor in the Purdue School of Engineering and Technology at Indiana University-Purdue University Indianapolis. "Not all uses of paper are bad and some may give us ideas on how to improve the interface between the healthcare provider and the electronic record," Saleem says. In the study of 20 healthcare workers, the researchers found 125 instances of paper use, which were divided into 11 categories. The most common reasons for using paper workarounds were efficiency and ease of use, followed by paper's capabilities as a memory aid and its ability to alert others to new or important information. For example, a good use of paper was the issuing of pink index cards to newly arrived patients at a clinic who had high blood pressure. The information was entered into patients' electronic medical records, but the pink cards allowed physicians to quickly identify a patient's blood pressure status. Noting that electronic systems can alert clinicians reliably and consistently, the study recommends that designers of these systems consider reducing the overall number of alerts so healthcare workers do not ignore them due to information overload.

Sunday 26 July 2009

Agile modelling and development

It has often bothered me a little that the early stages of a project seem to fly in the face of the "normal" agile mantra that all effort should be directed to satisfying a user story. But in early project stages, I find myself spending time on high-level design activities, and creating technical infrastructure, that of themselves do not directly contribute to a user story. I find myself wondering if that's because I'm too technically motivated to be a really effective agile developer.

Recently, a colleague pointed me to an essay, Agile Model Driven Development (AMDD) [1], from which I take some considerable comfort. What is recognized here is that even for agile development, it can be appropriate to take some time (but not too much!) to perform "requirements envisioning" and "initial architecture envisioning", to "get a good gut feel what the project is all about" and to "identify an architecture that has a good chance of working". As always with agile development, the goal is "to get something that is just barely good enough so that your team can get going".

The essay also says "For your architecture a whiteboard sketch overviewing how the system will be built end-to-end is good enough" - it seems to me that the important thing here, to be emphasized, is end-to-end (or: front-to-back). That is, to sketch a system that connects all the way from a front-end user to a back-end storage or service.

Without really planning it, this approach is broadly what I've been doing in the early stages of Shuffl. In the light of this article, the time I have spent, e.g., evaluating back-ends, even though the first iteration does not call for any persistence, does seem to be appropriate. As I near the end of the first iteration, I think I can say that having a view of what the back end may look like does help me to make decisions about how to structure aspects of the user interface code.

An area where I've found my own practice of agile development has fallen down is poor estimation of effort and tasks. Getting the balance of granularity right for estimating is, I think, critical: too coarse a granularity and key elements are overlooked; too fine and the plans made don't reflect the actual development process when code has to be cut. Part of the problem here may be working alone, rather than as part of a team, so that valuable discussion and review elements of the process are absent.

Recommended reading!

[1] Agile Model Driven Development (AMDD), by Scott Ambler. http://www.agilemodeling.com/essays/amdd.htm

Thursday 9 July 2009

Shuffl back-end selected

The initial back-end software for Shuffl has been selected and checked out. I'm planning to use AtomPub for the back-end protocol, and eXist as the back-end database. Rationale for the choice can be seen at http://code.google.com/p/shuffl/wiki/BackendSystemsSurvey. I've installed eXist (piece of cake!) and run a series of tests against the out-of-box installation to demonstrate the identified capabilities are all present. See http://code.google.com/p/shuffl/source/browse/#svn/trunk/spike.

Monday 6 July 2009

Shuffl project lift off!

Shuffl takes off...
The project proposal, project plan, effort schedule, first sprint plan, and an initial draft of an open source sustainability plan are now online.