aggregating project events in the wild
My silence on this blog lately is due to all the toiling I’ve been doing on my research software. I’d like to say it’s because I spent the last month doing research on the beach, but alas, Toronto has had the wettest summer in 70 years. Laptops and Speedos never look great together anyway.
Regular readers may remember that I’m designing my software to display project events to software developers in a user interface that emphasizes awareness of peer activities and changes to shared artifacts. The display integrates events of different types and unifies them in a single interface: wiki updates, mailing list messages, bug report updates, and source revisions, for example. I’ve recruited some development groups who are interested in trying out such a tool and I’m going to be deploying it “into the wild” with them as part of my study.
The hard part of deploying a tool like this into the wild is that you need data from the wild. In my case, I need constantly updated data from multiple different systems. I’m currently working with two software development groups that altogether use six different systems on their projects, all of which I have to integrate with for my study. Eek!
At first I wasn’t sure if I could get enough data from these systems to build what I wanted. I do like getting (my hands) dirty though and I spent the last few weeks cranking out a lot of code. I’ve come up with a system that acts as a kind of project event aggregator and data store that the GUI can receive data from. Here’s a dorky architecture diagram that I slapped together:
In principle, it’s a pretty simple system. The Event Aggregator polls external data sources for new events on a regular basis, transforms them to an internal representation, and then stores them in a database. The Event Publisher renders events in XML format, and GUI clients poll the publisher continuously for new events. I’ll go over each piece in more detail below, so read on if you’re so inclined.
Data Sources
A DataSource is defined as follows:
- type: A data source is associated with a unique event type. For example, a Subversion repository has an event type SVN_REVISION.
- url: The URL of the data source.
- username: Username required to access the URL (if any).
- password: Password required to access the URL (if any).
- pollFrequency: How often to poll the data source for updates.
- checkpoint: An object that represents the state of the data source relative to the Event Store. The object type varies depending on the data source type. For example, a checkpoint for a Subversion repository is the last revision number that is in the Event Store. The Event Aggregator uses the checkpoint to determine what events are “new”.
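As a rough sketch (not the actual code behind the tool), the fields above could be modelled like this in Python. The snake_case names, the choice of seconds for the poll frequency, the example URL, and the `new_revisions` helper are all assumptions made for illustration. The helper shows the Subversion case, where the checkpoint is the last revision number already in the Event Store.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class DataSource:
    """One external system polled for events (fields follow the list above)."""
    type: str                       # unique event type, e.g. "SVN_REVISION"
    url: str                        # where the data source lives
    poll_frequency: int             # assumed here to be seconds between polls
    username: Optional[str] = None  # only if the URL requires credentials
    password: Optional[str] = None
    checkpoint: Any = None          # type varies by source; an int for Subversion


def new_revisions(source: DataSource, available: List[int]) -> List[int]:
    """Hypothetical helper: keep only revisions newer than the checkpoint."""
    last_seen = source.checkpoint or 0
    return [rev for rev in available if rev > last_seen]


svn = DataSource(
    type="SVN_REVISION",
    url="http://svn.example.org/repo",  # placeholder URL
    poll_frequency=300,
    checkpoint=120,
)
fresh = new_revisions(svn, [119, 120, 121, 122])
```

After a successful poll, the checkpoint would presumably be advanced to the highest revision stored, so the next poll starts from there.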
All of the data sources I’m integrating with so far are accessible over HTTP. In some cases, I’m using RSS feeds from the data source to receive updates. In other cases, I’m using remote APIs provided by the data source.
Event Aggregator/Transformer
The Event Aggregator is a daemon that continuously polls the data sources for new events. An Event is defined as follows:
- dataSource: The DataSource that generated this event. Indirectly, this defines the type of the event.
- title: The title for the event. For example, the title for an email from the project mailing list is the email’s subject line.
- author: The author of the event. For example, the author of an event of type SVN_REVISION is the developer who checked in the revision.
- link: Whenever possible, this field contains a link to the artifact that was modified as a result of this event. For example, for a wiki page update event, this link is a URL pointing to the wiki page.
- dateTime: A timestamp of when the event occurred.
- content: A string containing additional event content that varies depending on the type of the event. Generally, I’m trying to include as much content as possible and leave it up to the GUI to filter what isn’t needed. For example, wiki page updates contain a diff of what changed, SVN revisions contain a list of files modified, and email messages include the body of the message. Obviously, clients need to be aware of the format of this field in order to make decisions about how to render it on a display.
Each DataSource is associated with a specific transformer inside the Event Aggregator that converts each update into an Event.
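To make the per-source transformer idea concrete, here is a minimal Python sketch under some loud assumptions: the registry, the decorator, the shape of the raw `update` dictionary, and the way the title is built are all invented for illustration. The event also carries a plain `source_type` string in place of a full DataSource object to keep the example short.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass
class Event:
    source_type: str     # stands in for the dataSource field (simplification)
    title: str
    author: str
    link: str
    date_time: datetime
    content: str


# Hypothetical registry: one transformer function per data source type.
TRANSFORMERS: Dict[str, Callable[[dict], Event]] = {}


def transformer(source_type: str):
    """Decorator that registers a function as the transformer for a type."""
    def register(func):
        TRANSFORMERS[source_type] = func
        return func
    return register


@transformer("SVN_REVISION")
def svn_to_event(update: dict) -> Event:
    # The keys of `update` are made up; a real client library would differ.
    return Event(
        source_type="SVN_REVISION",
        title=f"r{update['revision']}: {update['message']}",
        author=update["author"],
        link=update["url"],
        date_time=update["date"],
        content="\n".join(update["paths"]),  # list of files modified
    )


def transform(source_type: str, update: dict) -> Event:
    """Look up the transformer for this source type and apply it."""
    return TRANSFORMERS[source_type](update)


raw = {
    "revision": 42,
    "message": "Fix build",
    "author": "jhandcock",
    "url": "http://svn.example.org/repo/42",  # placeholder URL
    "date": datetime(2008, 8, 1, tzinfo=timezone.utc),
    "paths": ["a.py", "b.py"],
}
event = transform("SVN_REVISION", raw)
```

Supporting another system in this sketch means writing one more decorated function; nothing else in the aggregator needs to know about it.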
Event Store
The Event Store is a database containing all known project events. Think of it as an aggregated cache of events across all the data sources.
Event Publisher/Renderer
The Event Publisher is a web service that sits on top of the Event Store. When it receives a request from a client, it pulls up events from the Event Store, renders them in XML format, and then sends them to the client.
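The rendering step might look something like the following Python sketch, which uses the standard library’s `xml.etree.ElementTree`. The element names, the `type` attribute, the `WIKI_UPDATE` type name, and the dictionary-shaped events are assumptions; the actual XML format isn’t described here.

```python
import xml.etree.ElementTree as ET


def render_events(events):
    """Render a list of event dicts as one XML string (assumed layout)."""
    root = ET.Element("events")
    for ev in events:
        node = ET.SubElement(root, "event", type=ev["type"])
        for field in ("title", "author", "link", "dateTime", "content"):
            # ElementTree escapes characters like < and & when serializing.
            ET.SubElement(node, field).text = ev[field]
    return ET.tostring(root, encoding="unicode")


xml_text = render_events([
    {
        "type": "WIKI_UPDATE",                  # hypothetical type name
        "title": "Home edited",
        "author": "jeremy",
        "link": "http://wiki.example.org/Home",  # placeholder URL
        "dateTime": "2008-08-01T12:00:00Z",
        "content": "<diff> a & b",
    }
])
```

Because the content field can hold diffs and email bodies, letting the XML library handle escaping is safer than building the string by hand.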
Note that the publisher also normalizes event authors across different data sources. Just to make my life difficult, each data source generally has a different identifier for each author. For example, I might be identified by jhandcock in a Subversion repository and by jeremy@foo.bar in a bug tracking system. The publisher maintains a list of mappings from authors for each data source to a single username, which allows it to publish events generated by the same author with a single user identifier.
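A minimal sketch of that mapping in Python, assuming it is keyed by data source type plus the source-specific identifier. The `BUG_UPDATE` type name and the fall-back to the raw identifier for unmapped authors are assumptions, not something stated above.

```python
# Hypothetical mapping: (data source type, source-specific id) -> username.
AUTHOR_MAP = {
    ("SVN_REVISION", "jhandcock"): "jhandcock",
    ("BUG_UPDATE", "jeremy@foo.bar"): "jhandcock",
}


def normalize_author(source_type, author):
    """Return the unified username, or the raw identifier if unmapped."""
    return AUTHOR_MAP.get((source_type, author), author)


unified = normalize_author("BUG_UPDATE", "jeremy@foo.bar")
```

Keying on the source type as well as the identifier keeps two different people who happen to share an identifier on different systems from being merged.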
GUI Clients
This is the user interface that I’m building. See my previous post for some UI mockups. I still have quite a bit of work to do on the UI side, but I’m feeling good about things now that I know I’ll have the necessary data.