Comments on threats to DirectionlessGov

William Heath recently talked about what would happen if Google killed Directionlessgov (here).
I doubt they would: we don’t make money, just a mockery. That said, there are threats to Directionless which are far more deadly than Google could be.

The part of Google we use is their search engine, and when I talk about Google here, that’s the only bit I mean – www.google.com/search. Search is big business – Google (GOOG) alone is worth about $112 billion. Even so, if Google suddenly stopped being available to us for any reason, we would just use a different search engine. It may suck more, but we’d be OK. The reason might be that they don’t want us to use their site, that they make changes which are incompatible with what we do, or that the ISS crashed into them. Search is a commodity. As with many of the sites on the internet, if you can’t use them for your purpose, whether they exist is irrelevant.

Much of what Directionless and other sites do (such as http://www.TheyWorkForYou.com, http://www.PublicWhip.org, and overseas equivalents such as http://www.GovTrack.US) is scraping. We read the HTML page that Parliament (or wherever) serves out for browsers, then interpret it and pull out the content using automated programs. Those programs are, and have to be, quite conservative in what they understand, as accurate representation is critical. When those page structures change, our sites break until we catch up. While change is annoying, progress is a good thing (“progress” backwards, however, is just annoying).
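To make "conservative" concrete, here is a minimal sketch in Python. It is not the code any of these sites actually run: the `speaker` class name, the `ScrapeError` exception and the `extract_speakers` function are all made up for illustration. The point is only that the scraper refuses to guess when the markup stops looking the way it expects.

```python
from html.parser import HTMLParser


class ScrapeError(Exception):
    """Raised when the page no longer matches the structure we expect."""


class SpeakerParser(HTMLParser):
    """Collects the text of every <span class="speaker"> element.

    The "speaker" class name is a hypothetical example, not real markup.
    """

    def __init__(self):
        super().__init__()
        self.in_speaker = False
        self.speakers = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "speaker") in attrs:
            self.in_speaker = True
            self.speakers.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_speaker = False

    def handle_data(self, data):
        if self.in_speaker:
            self.speakers[-1] += data


def extract_speakers(html):
    """Return the speaker names on a page, or fail loudly."""
    parser = SpeakerParser()
    parser.feed(html)
    parser.close()
    names = [s.strip() for s in parser.speakers]
    # Be conservative: if nothing (or something empty) was found, assume the
    # page structure changed and stop, rather than publish a wrong answer.
    if not names or any(not n for n in names):
        raise ScrapeError("page structure changed; scraper needs updating")
    return names
```

Given `<span class="speaker">Mr Smith</span>` this returns the name; given a page where the names have moved into some other tag, it raises `ScrapeError` and the site stays broken until someone catches up.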

There is no free, reliable source of the postcode to Local Authority (or Constituency) lookup table – it costs a very large amount of money. With a budget of 3 tea bags, some milk and a chocolate biscuit, Local Directionless needs to know which Authority covers your postcode so that we can search the relevant website. We scrape this information from another site. However, when the site we scrape changes, stops working or moves to block us, we stop working until we fix it. We can adapt, but we are on the defensive in an arms race, and will only win because of more motivation and supplies of tea.

The biggest threat to Directionless, and all similar civic sites, is not that one commodity part stops working until we switch (Google to Yahoo, for example); it’s that one vital part prevents us from working. For some projects, it’s mapping data (unavailable, rather than restricted); for some, it’s the license of Hansard and potential problems caused by legal threats; in others, it’s what happens when a partially open site takes offline the part of the site we use, and where we need up to date lookups.

What took most of the time of the local search was not the implementation, but creating the lookup table between Local Authorities/Councils (and how they were titled: “Cambridge” is not the same as “Cambridge City” or “Cambridge County”). That’s the real core of Directionless Local. And there’s no easy way for us to get that list from any other public site.

That lookup isn’t commoditised. Yet. We could protect it and commoditise it by creating a list of postcodes, one postcode per authority and constituency (and possibly even ward?), and making it publicly available. There are many websites which allow you to do a lookup, and definitive lookup tables could be created, by anyone, from any of them. But only if there is a master list which we know is comprehensive. Then Directionless, and any other site, could swap in any site we wanted to get the information from. Some of these jobs aren’t sexy, but need doing.
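As a rough sketch of how such a master list could be used, in Python: everything here is illustrative. The postcodes are deliberately fake placeholders (there is no "ZZ" postcode area), `site_a` and `site_b` stand in for whichever lookup sites you might scrape, and the function names are made up. The idea is just that one sample postcode per authority is enough to test whether a lookup source is complete, and so whether it can be swapped in.

```python
# Hypothetical master list: one sample postcode per authority.
# The postcodes are placeholders, not real ones.
MASTER_LIST = {
    "Cambridge City": "ZZ1 1AA",
    "Cambridge County": "ZZ2 2BB",
}

# Stand-ins for lookup sites; a real one would scrape a website instead.
site_a = {"ZZ11AA": "Cambridge City"}  # has taken one authority down
site_b = {"ZZ11AA": "Cambridge City", "ZZ22BB": "Cambridge County"}
SOURCES = {"site_a": site_a.get, "site_b": site_b.get}


def normalise(postcode):
    """Upper-case a postcode and strip spaces so 'zz1 1aa' matches 'ZZ11AA'."""
    return postcode.replace(" ", "").upper()


def check_source(lookup, master_list):
    """Return the authorities a source gets wrong or does not know about.

    `lookup` is any callable mapping a postcode to an authority name (or None).
    """
    failures = []
    for authority, sample_postcode in master_list.items():
        if lookup(normalise(sample_postcode)) != authority:
            failures.append(authority)
    return failures


def first_complete_source(sources, master_list):
    """Pick the first source that answers correctly for every authority."""
    for name, lookup in sources.items():
        if not check_source(lookup, master_list):
            return name
    return None
```

Here `check_source(site_a.get, MASTER_LIST)` reports that `site_a` is missing "Cambridge County", so `first_complete_source(SOURCES, MASTER_LIST)` falls through to `site_b`. Without the master list there is no way to know that `site_a` is incomplete until a user hits the gap.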

Why does the local search currently not work in places? Because the site we scrape (which we were able to get a definitive list of authority names from) has taken down some authorities while they correct the data they show. Unfortunately, there’s not a thing we can do about it: we stay broken in those areas until they fix it.

That’s the biggest threat to www.Directionlessgov.com – well-meaning, justifiable, simple, and deadly.

posted: 22 May 2006

Direct Gov – but directly to where?

[Note: most of this was originally written in summer 2005, with some minor tweaks and published in February 2006]

Work done by Government is generally one of three things: doing
work people don’t see, doing work people see, or moving
something from one category to the other. Search engines and
portal tools are often used to help citizens find what they
otherwise can’t. Vast quantities of money, goodwill and time are
wasted because of poor re-implementations of a moderately
hard problem.

In a direct comparison early in 2005, 75% of users, when
shown results of search queries side by side, selected a
result from Google (limited to .gov.uk domains) over the
specialised £4.4m portal direct.gov.uk[2].

Users submitted a term to a webpage, and were shown, side by
side, the results of their search in the direct.gov.uk
engine and in Google.

When the term was submitted, a request was sent to the
direct.gov.uk search engine, and the resulting HTML parsed
and the results extracted and displayed. Google’s knowledge
of .gov.uk sites was then searched[1] and similarly displayed.
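For illustration only, one way to limit a Google search to .gov.uk domains is the `site:` operator. The sketch below (in Python, with a made-up function name) builds such a query against www.google.com/search; it is an assumption about how the restriction could be expressed, not a record of the API call the comparison actually made.

```python
from urllib.parse import urlencode


def gov_uk_query_url(term):
    """Build a Google search URL restricted to .gov.uk sites.

    Assumes the restriction is expressed with the `site:` operator.
    """
    query = f"{term} site:gov.uk"
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `gov_uk_query_url("council tax")` gives a URL whose `q` parameter is `council tax site:gov.uk`, URL-encoded.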

The results were shown in the order that the search engine returned them,
and the default number of results for each engine shown side
by side (20 results for direct.gov.uk and 10 for Google). When a
user clicked on a page they wanted to look at, they
were first taken to a script which recorded their search term,
preferred url and which search engine returned that link, before bouncing
them to where they wanted to go. Users who opened multiple links for the
same search term (for whatever reason) generated multiple records.
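The recording step can be sketched in a few lines of Python. This is not the original script: the in-memory `click_log`, the function names and the engine labels are all assumptions made for the example. It shows the shape of the thing, which is to log the term, the chosen url and the engine, hand back the url to bounce to, and later work out each engine's share of clicks.

```python
from collections import Counter

# In-memory stand-in for whatever storage the real script used.
click_log = []


def record_click(term, url, engine):
    """Record one click, then return the URL to bounce the user to."""
    click_log.append({"term": term, "url": url, "engine": engine})
    return url


def engine_shares(log):
    """Return each engine's share of recorded clicks, as a percentage."""
    counts = Counter(entry["engine"] for entry in log)
    total = sum(counts.values())
    return {engine: 100 * n / total for engine, n in counts.items()}
```

Because every click is logged separately, a user who opens several links for the same term contributes several records, just as described above.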

Initially, we displayed Google results on the left, and
direct.gov.uk results on the right. After a month of usage,
these were switched to look into whether the earlier display
of Google (on the left) was disadvantaging direct.gov.uk.
When the direct.gov.uk results were shown first, 28% of
people selected a direct.gov.uk result: a small (3%) decrease
in the percentage who clicked a Google result when it was
shown second rather than first.

People generally searched using “keywords”, rather than a
sentence or phrase. Where longer search terms were submitted
(i.e. four words or more), these were generally for very
specific searches (e.g. “national spatial address
infrastructure”, or “freedom of information request
exemption”), rather than English phrasing around what was
wanted (e.g. “how much council tax is band e in selsey”).

Direct.gov.uk “improved” searching

In August 2005, partially as a result of criticism,
direct.gov.uk loudly announced a new, improved search engine.
Repeating the above comparison using only results from
September 2005 onwards, the newer engine made
no difference – direct.gov.uk was still selected 28% of
the time (we still showed direct.gov.uk results first).

Footnotes

1. Used the Google API, which returns identical results to www.google.com but in a more computer readable format.

2. These stats were generated in a 3 month period in 2005.

posted: 12 Apr 2006