Sunday, November 16, 2008

A new contact sport?

One problem many projects face is preserving data of various types into the future, be it online video, documents or anything else.

Depending on what you're doing, you might be dealing with documents released under Freedom of Information laws, or other material (that link is well worth reading for some of the more unusual arguments on various issues).

Or it might be video or other publications that are subject to takedown requests backed by legal enforcement, or even just trivial requests.

Organisations will honour those requests, but that only matters if no other copies exist. A takedown request also helpfully highlights the really interesting material: someone cares enough to want it taken down. Such requests usually have legal standing, and sometimes there is a good reason behind them (whether you agree with that reason is a slightly different matter).

With an XML feed of every document published (it could be RSS, but it has to cover all documents, not just the latest), it would be a simple matter to spot documents that disappear or change, and then take appropriate action with the copies. The same principle applies to video on YouTube, or to anything anywhere else. Who'd write a Greasemonkey script or Firefox plugin that downloads to your PC a copy of every YouTube video you watch? Or one that keeps a copy of any YouTube video that comes through particular feeds in your RSS reader? You don't need to worry initially about bandwidth or storage (1TB hard drives aren't that expensive, and you're not uploading anything yet).
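The feed-comparison idea can be sketched in a few lines of Python. This is a minimal sketch, not an existing tool: it assumes an RSS 2.0 feed in which every document appears as an `<item>` with a stable `<guid>`, and it treats a document as changed when the hash of its title and description differs between two fetches. The function names `snapshot` and `compare`, and the sample feeds, are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def snapshot(feed_xml: str) -> dict:
    """Map each item's <guid> to a SHA-256 digest of its title and description.

    Assumes an RSS 2.0-style feed listing *all* documents, each with a stable guid.
    """
    root = ET.fromstring(feed_xml)
    digests = {}
    for item in root.iter("item"):
        guid = item.findtext("guid", default="").strip()
        body = item.findtext("title", default="") + "\n" + item.findtext("description", default="")
        digests[guid] = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return digests


def compare(old: dict, new: dict):
    """Return sorted lists of (removed, changed, added) guids between two snapshots."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(g for g in set(old) & set(new) if old[g] != new[g])
    return removed, changed, added


# Hypothetical sample feeds: doc-2 is edited, doc-3 vanishes, doc-4 appears.
OLD_FEED = """<rss version="2.0"><channel>
<item><guid>doc-1</guid><title>Budget</title><description>Spending plan</description></item>
<item><guid>doc-2</guid><title>Minutes</title><description>Meeting notes</description></item>
<item><guid>doc-3</guid><title>Report</title><description>Draft findings</description></item>
</channel></rss>"""

NEW_FEED = """<rss version="2.0"><channel>
<item><guid>doc-1</guid><title>Budget</title><description>Spending plan</description></item>
<item><guid>doc-2</guid><title>Minutes</title><description>Meeting notes (revised)</description></item>
<item><guid>doc-4</guid><title>Notice</title><description>New announcement</description></item>
</channel></rss>"""

if __name__ == "__main__":
    removed, changed, added = compare(snapshot(OLD_FEED), snapshot(NEW_FEED))
    print("removed:", removed)  # disappeared: check these against your copies
    print("changed:", changed)  # edited since the last fetch
    print("added:", added)      # new items worth copying now
```

In practice each snapshot would be saved between runs (for example as JSON); anything reported as removed or changed is the cue to look at the copies you already hold.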

IndyMedia are finding that people will use the best tools available, irrespective of ideology; but even with those tools, people who don't want things to disappear can easily prevent that from happening. If something watches for interesting new material as it is published and makes copies, that is enough to ensure copies remain available, as long as someone is interested in keeping them. More importantly, this can be done in a completely automated and fully decentralised fashion, where it is impossible to know who holds a copy of the data.

On TheGovernmentSays.com we cache everything we ever see since, some day, it might be useful. While the bulk of the material is unlikely to be interesting, any item that someone doesn't want seen becomes a nice juicy takedown target, accompanied by a legally binding assertion that it's true.
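A cache that keeps everything it has ever seen can be as simple as a content-addressed store. The sketch below is only an illustration, not how TheGovernmentSays.com actually works: it assumes a local directory, names each stored copy by the SHA-256 of its bytes so identical content is written once, and never overwrites or deletes anything, so earlier versions survive later edits or takedowns. The helpers `cache_copy` and `versions`, the `blobs/` layout and the `index.jsonl` log are all hypothetical.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def cache_copy(store: Path, url: str, content: bytes) -> str:
    """Store content under its SHA-256 digest and log which URL it came from.

    Existing blobs are never overwritten, so every version ever seen stays
    available even if the original is later changed or taken down.
    """
    digest = hashlib.sha256(content).hexdigest()
    blobs = store / "blobs"
    blobs.mkdir(parents=True, exist_ok=True)
    blob = blobs / digest
    if not blob.exists():
        blob.write_bytes(content)
    # Append-only index: one JSON record per observation of a URL.
    with open(store / "index.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps({"url": url, "sha256": digest, "seen": time.time()}) + "\n")
    return digest


def versions(store: Path, url: str) -> list:
    """Return the distinct digests recorded for a URL, in the order first seen."""
    seen = []
    index = store / "index.jsonl"
    if not index.exists():
        return seen
    with open(index, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            if record["url"] == url and record["sha256"] not in seen:
                seen.append(record["sha256"])
    return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        url = "http://example.org/doc"  # placeholder URL
        cache_copy(store, url, b"first version")
        cache_copy(store, url, b"second version")
        cache_copy(store, url, b"first version")  # seen again: no new blob written
        print(len(versions(store, url)))  # 2
```

Hashing the content rather than the URL means a page that changes and later changes back costs no extra storage, while the append-only index still records every time it was seen.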

If crowdsourcing is learning from the wisdom of crowds, is lawyersourcing taking cues from the actions of lawyers?

2 Comments:

Anonymous David Pollard said...

It would be really handy to keep a record of all the sites one has visited, especially if indexed notes and tags could be added. This would save a great deal of effort when doing research and avoid those tedious searches, e.g. to rediscover 'that site from two months ago about xyz that I didn't bother to bookmark'.

Are there any open source firewall projects? If the programmers could be persuaded to add capture logs and tagging, this would be great software. A local search engine could easily dig out those lost nuggets, and it shouldn't be too difficult to arrange browsing of the history.

30/12/08 03:45  
Blogger Sam Smith said...

Safari on the Mac has (default) options to do this, and integrated with Spotlight they're really useful. However, you also need a "private browsing mode" (aka porn mode) to make this actually useful rather than just potentially useful.

Putting the caching into a firewall would require deep packet inspection, which is half of what allowed Phorm to do what it did and send people bananas. But it's probably possible in the open source firewalls if you know what you're looking for and how to do it.

If you run an RSS aggregator locally, it's already easy to tell it to cache everything forever. The problem is material that only lives in the cloud, which can be dealt with using something like Greasemonkey.

30/12/08 14:03  
