Making Government statistics more accessible

There are often conversations about what Government organisations could do to make their data more available. This data is already made available in multiple forms; most websites have a version covering whatever was the latest fad when they were being designed (“mobile” access has been a constant for the last few years, although the actual methods have varied; think WAP).

What would happen if a site was designed to be reused, as well as used?

It would probably involve outputting XML or CSV or similar, rather than HTML, but the content may be substantially similar. Other projects go from HTML to XML…
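To make that concrete, here is a minimal sketch of one set of records rendered twice: once as HTML for people, once as CSV for programs. The field names and figures are made up for illustration and are not real Neighbourhood Statistics data.

```python
import csv
import io
from html import escape

# Made-up figures for illustration only.
ROWS = [
    {"area": "Area A", "population": 1200},
    {"area": "Area B", "population": 950},
]
FIELDS = ["area", "population"]


def as_html(rows):
    """Render the rows as an HTML table for people to read."""
    head = "".join(f"<th>{escape(f)}</th>" for f in FIELDS)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(r[f]))}</td>" for f in FIELDS) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def as_csv(rows):
    """Render the same rows as CSV for programs to read."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Both functions read the same rows; only the last step differs, which is why the content of the two outputs can stay substantially similar.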

One project I’ve recently been working on involves reusing data from Neighbourhood Statistics.

Neighbourhood Statistics has many constituent parts, but its main aim is for you to give it an area, and for it to tell you statistics about that area. It can do that in multiple different ways: as data, as descriptions and with pretty maps, but the core is the data. All of it is easily accessible in nice friendly ways. In fact, NeSS is a good example of what most sites should do about making data accessible to people.

For my purposes, though, I didn’t want to type 140,000 postcodes into NeSS. I’m lazy and that type of work is what we should get the computer to do. The data is well formed; we can just scrape it.

That worked as a one-off (results will be available soon), but what about something for next time?

NeSSY is designed to make it easier to make repeated requests for specific information (or find out what is available for an area) from Neighbourhood Statistics. It operates the same way as the NeSS web interface does, but you can tell it which links you’d like it to follow to get the data you want.
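The idea of telling a tool which links to follow can be sketched roughly as below. This is not the NeSSY code; the `LinkCollector` class, the `follow` function, the page contents and the example.invalid URLs are all hypothetical, and `fetch` is whatever function you supply to turn a URL into HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def follow(fetch, start_url, link_texts):
    """Follow, in order, the links labelled by each entry in link_texts.

    Returns the HTML of the page reached at the end of the chain.
    """
    url = start_url
    page = fetch(url)
    for wanted in link_texts:
        collector = LinkCollector()
        collector.feed(page)
        matches = [href for text, href in collector.links if text == wanted]
        if not matches:
            raise LookupError(f"no link labelled {wanted!r} on {url}")
        url = urljoin(url, matches[0])
        page = fetch(url)
    return page


# A fake three-page site standing in for a real one (made-up content).
PAGES = {
    "http://example.invalid/area": '<a href="topics">Topics</a>',
    "http://example.invalid/topics": '<a href="/data/population">Population</a>',
    "http://example.invalid/data/population": "<table><tr><td>1200</td></tr></table>",
}

final_page = follow(PAGES.__getitem__, "http://example.invalid/area",
                    ["Topics", "Population"])
```

Passing `fetch` in as a parameter means the same function works against a dictionary of saved pages for testing and against a real HTTP client when you run it for real.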

Retrofitting this has significant downsides: a lot of context is lost in the conversion from the database into HTML, and putting it back requires both vast amounts of care (getting it wrong is bad) and more knowledge and time than I’ve given it so far (about 8 hours of work). There may be areas that it doesn’t cope with at all.

What it does show is that there are relatively simple and fast ways of redisplaying information that Government could add to their systems which make reuse far easier.

Multiple choice games could be “fun” to see how much people know about their local areas… The possibilities of data reuse are endless. If anyone is interested in working on further development, the code is here, and please drop me an email to let me know what you’re working on.

posted: 06 May 2007

Extensions to CommentOnThis.com

Since I launched www.commentonthis.com in December, a number of people have asked about future expansions. Here are some initial thoughts on the directions it won’t go, but which others may wish to take similar sites which I’d .

CoT does what it does – lets you comment on specific paragraphs of selected PDFs which we’ve put up. There are no immediate plans to expand it to commentsonthis.com to arbitrary documents. However, the process by which they are converted is pretty much document independent.

You take a PDF document, and run it through a text converter. Then it gets edited to clean up page footers/headers and other bits, and add in markup for section headers. Then it gets processed again which puts it into the database for comment. While the process is a pain in the neck for most documents, it’s not particularly complicated.
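The last step, going from cleaned-up text to numbered paragraphs in a database, might look something like the sketch below. This is not the CoT code: the `== Title ==` header markup, the table layout and the function names are assumptions made for illustration, and it assumes paragraphs are separated by blank lines with each header on its own.

```python
import re
import sqlite3

# Hypothetical markup for section headers added during the editing step.
HEADER = re.compile(r"^==\s*(.+?)\s*==$")


def split_paragraphs(text):
    """Yield (section title, paragraph number, paragraph text) tuples."""
    section = ""
    number = 0
    for block in re.split(r"\n\s*\n", text.strip()):
        # Re-join lines that the text converter wrapped.
        block = " ".join(line.strip() for line in block.splitlines()).strip()
        if not block:
            continue
        match = HEADER.match(block)
        if match:
            section = match.group(1)
            continue
        number += 1
        yield section, number, block


def load(conn, doc_id, text):
    """Store each paragraph so comments and links can point at (doc, number)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paragraphs ("
        "doc TEXT, section TEXT, number INTEGER, body TEXT, "
        "PRIMARY KEY (doc, number))"
    )
    conn.executemany(
        "INSERT INTO paragraphs VALUES (?, ?, ?, ?)",
        [(doc_id, s, n, b) for s, n, b in split_paragraphs(text)],
    )
    conn.commit()
```

Numbering paragraphs across the whole document, rather than restarting in each section, gives every paragraph a single stable number to comment on or link to.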

While the whole thing is currently done through scripts, and editing is done via a text editor, there’s no reason it can’t be done through scripts and a wiki, and hence in the browser.

Which means that we can easily get to a point where people can publish arbitrary documents, and have them be commentable and linkable at a relatively fine granularity. There would need to be various procedures put in to avoid spam, along with the addition of some security and hooks for anonymity (https to hide what you’re looking at, and serverside hooks to avoid some traffic analysis attacks, plus whatever you need to do to be Tor, or anything else, friendly).

For various reasons, you often do not want the original documents to be fully wikied, but want them to be commentable and linkable down to the paragraph level.

However, if you then automate the process, you can put up a set of documents and start cross-linking them and running tasks over them.

As another possible application, do we know of any projects which will start getting in documents which people deem to have value, and which may usefully be commented on and linked to? Yes, we do.

Does anyone want to run with this idea? I’ve not got the time to build it, but I’m happy to help.

posted: 15 Apr 2007