There are two consultations on data currently running, one on Open Data, and one on Public Data.
Open Data, according to the Cabinet Office, is data which fully meets the Open Data Definition.
Public Data, according to the Department of Business, Innovation and Skills, is data collected as part of a Public Task, or, more generally, data collected by a Public Body.
Those two are not necessarily in conflict; but they are not the same thing. Mostly, here, I ignore the Open Data bits — we all know what that looks like. Free and unrestricted, OGL via data.gov.uk, like the PM has said.
The two consultations, the split between BIS and Cabinet Office, and many of the tensions, come down fundamentally or partially to the irreconcilable tension between Transparency and Privacy.
The Open Data advocates have completely won the argument for what Open Data means.
For the Land Use Registry, should the names of who owns what land be fully Open Data? With no restrictions on any use at all? There is a strong case that can easily be made that it may not be the best idea from a privacy perspective. I’m not taking a position on it, but all can agree that there are multiple perspectives which deserve a hearing rather than a position based solely on ideology.
This is one example of the ongoing tensions between access, use, privacy and cost. They pull in their own directions, in different ways depending on different details.
One size will not fit all, and those tensions must be managed.
(If you can’t be bothered reading the rest, that’s basically the entire post in one line)
The mechanism for managing that tension must cope with the mundanity of day-to-day processes, the spontaneity of Rewired State events, and the idiosyncrasies of an Iraq War Logs style dataset; and cover everything in between, over time.
It must be easy to understand, clear to handle, and the system must work.
Putting all of these into one organisation is a way to hide the problems, until they explode on to a Minister’s desk. The current data relationship between the Cabinet Office and BIS is opaque at best. It’s somewhat like a border dispute where no one is quite sure what the borders are, and with a large no-man’s-land in the middle: full of mud, land mines, and the odd body. No data should slip through those fences into the grey area. Independence and oversight are vital here to avoid those tensions stopping everything or causing a rather big mess.
Avoiding that explosion
The simple way to avoid hidden tension is to have visible tension.
The governance model for the BBC (ie the bit that does most of the work we think of as the BBC) is the BBC Trust (which does oversight, and can intervene in a wide variety of ways). Already in the worlds of data, there is the Office for National Statistics and the statistics branches of other departments which do the work, all overseen by the Statistics Authority, which has the ability to engage and adjudicate. There are many other examples of similar structures for similar issues. The election manifestos of both coalition parties had a number of examples where they wanted this sort of structure in the public arena.
Whether it is a Trust, an Authority, a Committee, or something else, is fundamentally irrelevant. The function of Independent Oversight is what matters.
The BBC Trust does policy oversight and similar; it has advisory boards etc, and it can report to Parliament. In summary, it has clear Independence, and Strength.
This is needed for the many decisions coming soon as the distinctions between Open Data and Public Data become more chasmic.
The Corporation can have whatever structure (see next blog post), but there should be some model for licensing, access, legal functions and some shared access operations.
There will be many fights and turf wars. This sort of model will not be easy, or doable in a single step; reality may change things. In Civil Service staff terms, it’s somewhat analogous to being the Government’s independent Senior Responsible Officer for all data that leaves whatever boundary is decided (say: the civil service).
But when reality does change, and this is a rapidly evolving area, the plans should be capable of changing too. Independence, oversight, and focusing of tensions into constructive and creative routes, rather than into negativity, should be considered.
The oversight authority should take a positive view of the world, focused not on threats to open data, but on the benefits of public data. Many in the community would also benefit from this viewpoint. Parliamentary oversight has a habit of keeping the focus as Parliament wishes rather than as incumbents would like.
There is a great deal of friction that comes from the incumbency. Organisations resist change, and, as the data landscape evolves over the next decade, to remain in a competitive position, drive growth, etc, it must have flexibility and a responsiveness to external pressure that has rarely been evident in the last few years by some Trading Funds.
As the Public Data Corporation (PDC) data reuse proposals become wider, Dr Foster and other groups will start to be included. While there are limitations on this, the Government has the ability to change the laws.
Simon Bristow has done some interesting work to make data more accessible from Government Surveys. There is clearly more work that he, and others like him, could do. However, there are also privacy and data collection restrictions on that data. The Making Open Data Real consultation points out that ONS do not distribute the source text (the Blaise markup files) to surveys, and this restricts potential reuse of anything. Reusing data has a prerequisite: knowing it exists to be able to look for it.
This expansion of new ideas is not possible under the current PDC proposed framework. The framework needs reconsideration.
The Ordnance Survey have come a long way, not always voluntarily. Other trading funds have done nothing. The current PDC design exacerbates the incumbency tax the current trading funds will require the taxpayer to cover. This can be corrected, unless this government wants to continue to protect these state monopolies.
Research and Scientific Enquiry
Whereas monopolies seek to continue the status quo, the Research Community looks to move things forward; to change things; to make the world better. In some ways, to remove various bits of the status quo.
Any changes, whatever the structure, must provide free access to data for scientific research purposes. A cure for cancer would destroy the business models of a number of companies who care for cancer sufferers; it would still be a very good thing and a net positive to the UK.
Parts of the Data Use movement have had a great deal of success; one example is the creation of Dr Foster. The benefit to the NHS from the research in that, and the business that flowed from it, has been significant. There is a clear benefit to the overall UK.
However, the Dr Foster process has some “less than ideal” past methods; and those should not be repeated. There is a known problem of back-channels, side-agreements, and a fundamental lack of transparency and accountability, in places, around Government data releases made under arbitrary restrictions to arbitrary people. This is not open data, and arguably can never be: detailed research requires detailed data, where protection is not just on the data, but on who can get a copy.
Access to scientific or non-open data should not be primarily based on who you know; but something more process orientated. This will require oversight, agreement, and potential cajoling of departments to release more in some cases, and more targeted data in some others where practice needs looking at. There are existing processes in the public and related sectors that could be reused. Guidance of RC-UK bodies and the UK Statistics Authority spring to mind; there are others.
There are a number of issues here, many of which have been effectively solved in other areas, where that model can be reused. With big IT suppliers, Government now considers itself a single entity, which covers lots of contracts with lots of departments. This has led to potential for large savings, by ensuring that if suppliers want to quibble unreasonably on one project, they know it will be known by all departments with whom they are currently bidding for business. This is a good thing.
The Government should institute a similar policy for all data transfers. If you receive data from multiple departments, it’s under one contract that has arbitrary numbers of schedules for each type of data. If data is misused, lost, attacked, or subjected to other undesirable actions, then the sanction applies to all of it. This is a highly effective sanction against larger organisations and data abuse. Similarly, it should also place limits on transferring more data than is necessary.
Digital by Default
It is overly simple to say that Data is growing in importance; indeed, it is a premise upon which both consultations are based.
Digital By Default is usually thought of in the context of the future of services being provided to citizens. I would argue that it should also be the default for output from Government. That is the core of the data agenda, and that is why some aspects of the data agenda being under the remit of CEO Digital is useful. Digital by Default as an output has Digital by Default on the input side as a prerequisite. Open has a great deal to offer here as well.
Aside: To the geeks reading this online, someone has to go inside Government and help them with this – as a partnership, not a takeover. If you look at where that partnership has happened, the Transparency Board, AlphaGov, and some of the vast improvements at the Technology Strategy Board over the last less-than-a-year; it’s clear that this works very well.
Electoral Roll as a partial model
When you sign the form for a position on the Electoral Roll, there is a tick box which lets you opt out of the version that’s sold. Digital By Default allows that opt out model to be extended to a great number of other areas. Not only does it make this possible, it makes it easy. And in a data sharing world, which we are already in, those privacy choices should be made available to everyone. In a Digital By Default process, this is easy; it’s impossible without it.
Data is an issue for Whitehall and beyond; and, much of the time, the sharing of it is mostly unexamined, with bits of Government increasingly sharing data. Part of this forms the genesis story of Sir Bonar Neville-Kingdom. That it is a problem across departments is part of what makes Sir Bonar funny. But it is possible for Sir Bonar to retire to Ascot with Euphorbia, as a better model replaces it. Data on data, if you want a Bonar-ism.
The Government is clearly considering whether a new legal framework should be used for parts of this. There are already models for licensing, and enforcement, and the ability to use Criminal Law in cases of abuse. The Statistics and Registration Act may again be a model here. With the tension discussed above clarified in law, with the boundaries between the different groups codified, that can be managed and enforced into the future without significant overhead or problem.
Not only can it be enforced, there can be clear oversight of enforcement. Separating a Trust out from the day to day operations means that there are different pressures on both, and different pressures can be brought to bear on both, but rarely the same way on the same issue at the same time.
Statistical Disclosure Control is hard. It requires significant knowledge of the data, consideration of threats, matches, and a number of other issues. Expertise must be available, listened to, considered, and where there is an impasse, a creative solution can often be found via active consideration of the desires and issues.
Not all data can be open data. If you think of business data, you may think that you can anonymise relatively easily. But if you have certain size classes of firms, at regional level, then you are disclosing details of large telecoms companies with their HQ in the South West of England; coincidentally, Orange’s main HQ is in Bristol. That mistake cannot be made even once if confidence is to be retained.
Licensing helps here, because, while the data may not necessarily be charged for, data can be restricted in other ways. With enforcement, trust and integrity, and an understanding of what the data will be used for and that it won’t be shared (all simple components of a license), more specific data can be released for various purposes. Single Customer thinking also helps here. While some readers may think Experian slightly evil, their business model collapses if the data flow on which they depend gets turned off. They will behave better than you expect; and such processes offer increased transparency.
A private sector example: there is a centrally maintained database that contains the details of 99% of all UK mortgages. That database is almost vital for financial companies, and it is highly sensitive information. As a result, the protections on the data are high, because of the implications if it can no longer exist. Short term benefit, in that case, does not outweigh long term goals.
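As a rough sketch of that size-class problem, the Python below counts firms per (region, sector, size class) cell and blanks any cell with fewer firms than a threshold. Everything in it is an assumption for illustration: the records are invented, the function names are mine, and the threshold of 3 is arbitrary rather than any official rule.

```python
from collections import Counter

# Hypothetical records: (region, sector, size_class). All values are invented.
FIRMS = [
    ("South West", "telecoms", "large"),
    ("South West", "telecoms", "small"),
    ("South West", "telecoms", "small"),
    ("South West", "telecoms", "small"),
    ("London", "telecoms", "large"),
    ("London", "telecoms", "large"),
    ("London", "telecoms", "large"),
    ("London", "telecoms", "small"),
]

MIN_CELL = 3  # assumed threshold; real rules would be set per dataset


def risky_cells(firms, min_cell=MIN_CELL):
    """Return the cells whose firm count is below min_cell, with their counts."""
    counts = Counter(firms)
    return {cell: n for cell, n in counts.items() if n < min_cell}


def publishable_table(firms, min_cell=MIN_CELL):
    """Return counts per cell, with small cells replaced by None (suppressed)."""
    counts = Counter(firms)
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


if __name__ == "__main__":
    for cell, n in risky_cells(FIRMS).items():
        print("would identify a firm:", cell, "count =", n)
```

Blanking small cells is only the first step. If row or column totals are published alongside, a suppressed value can often be worked out by subtraction, which is part of why this needs people who know the data.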
Review of Decisions
In 1961, Newton Minow, then the new Chairman of the US Federal Communications Commission, gave a still memorable speech, famous for criticising the then broadcasting networks for broadcasting a “vast wasteland” of content. The line he wanted to become famous was his call for television in the “Public Interest”. If you’ve not heard (or read) it, you may wish to do so, plus this 50 years on event.
There is a significant risk of the current Open and/or Public data issues becoming similar. Locking up what is useful, releasing what doesn’t threaten, and progress being stymied.
His argument, his call, was for there to be more options; and this has undoubtedly brought an improvement in the highest quality (if not, a raising of the lowest common denominator).
While parts of his talk do not apply, half a century on, across the water, other parts clearly do. The call for Public Interest rather than what’s in the public’s interest. Additionally, his discussion about licenses.
Those who get to use the Public’s Data should not expect automatic renewal of a monopoly on a pro forma basis. If they make promises about limiting others from access – in Minow’s terms, the license for one of few TV channels, or in our terms, privileged access to a specific dataset – then those promises should be examined after a fixed period of time. If the interest test generated the most benefit for the public – in current terms, growth or civic value – then they may be considered for renewal. If it was a contested decision between multiple positions, and one was chosen via some transparent process, and on reflection, the promises made were not reflected in reality, then after a period of 2-3 years, a change can be made.
Finance and profit are difficult to predict in innovative areas with incomplete information. By their nature, the areas suggested by the PDC are full of those.
A comparison may be with the 2000 3G auctions, not in terms of volume, but in terms of a complete divergence of expectations and reality. I have little confidence in the ability of others to do better.
No one would have expected that the Canadian Tax agency releasing their list of top 10 charities for Gift Aid (or their equivalent) would generate CDN $3.2 billion via the identification of a number of huge tax frauds. On a not unrelated note, has the UK Treasury released their equivalent list?
Creative solutions will be needed. It may be, if a company forecasts huge profits, that they are simply taxed an additional 10% over normal on their profits. It may be that they pay an up front fee for the work required to produce the data, or something in between. My point is not to recommend any suggestions, or that any of these are sensible, but that there are a large number of possibilities, some of which will be more sensible and solve various issues and barriers, and some may be more daft than these. There will be issues with all approaches, and one size does not fit all. Perverse incentives should also be avoided.
The tension of oversight here would also prevent repeated and too egregious raiding of the Public Interest, subject to whatever Parliament wishes the priorities to be.
Context of consultation
In the context of the consultation, there are a number of issues. Whatever is created will be required to make sense of an evolving world, and to also evolve in that world. The current proposals do not make sense in this current world, let alone the world of the next Parliament.
The tension cannot be resolved. It is perpetual. The PDC, or any other plan, must account for this with structures that can evolve into the future, or we will continue to have this discussion until it is accounted for. And it can either be harnessed to drive progress in a number of areas around data, or drag them backwards.
Much more detail needs fleshing out. This is part 1 of the draft set of thoughts, some bits of which will be lifted straight into consultation responses (and adapted subject to feedback). I welcome comments, and appreciate that this is long (and “somewhat” rambling).