Should we worry about data held in private, and if so when? I think we should, sometimes, while recognising that it’s not always clear-cut. We should also recognise that sometimes it is absolutely clear-cut – and we know for example that people die because of the ways in which some short-term, private interests are allowed to dominate longer-term, human ones.
A quick summary of a long post:
In some cases, there is a disproportionate social cost to data being held in private. Two particularly egregious examples are
(1) the corporate tax dodging that opacity facilitates and encourages, and
(2) the damage to human health that stems from a failure to regulate for transparency in the pharmaceutical industry.
(3) When private firms are specifically engaged in gathering data on human development, we may need to create an ethical charter along the lines of the Declaration of Helsinki in medical research.
Most of the posts so far on this blog have focused on things which are completely uncounted. By ‘completely uncounted’, I mean that the data in question simply do not exist: whether that is through, for example, innocent omission, a lack of funds or the misuse of political power.
Quite a different category is that of data which are privately held and not publicly shared, which I’ll label ‘privately counted’ until somebody makes me a better offer.
I’m not going to argue that all private data should be public. We can think of a spectrum of private data, running between two poles:
- Data for which remaining private has high private benefits and low public costs (let’s call this the private end of the spectrum, which should remain so), and
- Data for which the private benefits are uncertain and the public costs clear (the privately counted end, which should not remain so).
We could reasonably disagree about the exact point on the spectrum at which the argument for publication becomes overwhelming, and that might be a really productive discussion. It is also a central democracy issue, not just something that can be put in the ‘development-only’ box (increasingly I wonder if that is in fact an empty box).
Much personal data will be at the private end of the spectrum. I tend to agree with David Eaves that it’s important to distinguish Open Data from personal data:
Obviously “open” and “personal identifiable” data can overlap, but they are not the same. A great deal of open data has nothing to do with individuals. However, if we allow the two to become synonymous… well… expect a backlash against open data. No one ever gave anyone a blank check to make any and everything open. I don’t expect my personal healthcare or student record to be downloadable by anyone – I suspect you don’t either.
This post is focused on corporate data, however, and in areas where the overlap with personal data is limited, so the arguments become a little less clear.
What I want to do here is to highlight three important examples; well, to be honest, two quite different but (I think) completely unacceptable and systemic cases from the privately counted end of the spectrum, and one less clear question. The unacceptable examples relate to corporate tax and to knowledge-denying behaviours that seem rife in the pharmaceutical industry. Finally, the question relates to the role of the private sector in generating development data.
(1) Privately counted corporate data
The non-publication of data on corporate performance, including tax paid, is not a new issue to this blog. This non-publication of data on a country-by-country basis can prevent companies being held to account, either by regulators or citizens, for their behaviour and adherence to regulation in multiple areas. A report from the Task Force on Financial Integrity and Economic Development (pp.14-18) lists the main areas, including anti-corruption efforts and the removal of major distortions to international trade, and the main stakeholders who would benefit, including shareholders, customers and tax authorities.
National regulation may ultimately prove to be inappropriate for today’s multinational enterprises; but at present it has no chance, without knowing which companies are part of the same group, and how the group’s economic performance is distributed between jurisdictions. The public benefits of disclosure are likely to be large and widely shared. Arguments on the private costs revolve around claims that the implied accounting costs are too high, and claims that the disclosures would reveal commercially sensitive information to competitors. I’m not aware of any substantive evidence for either of these; and they do not seem to have prevented a handful of companies in the extractive sector from publishing some country-by-country data at least. Last year saw the Danish authorities begin to publish data on individual companies’ tax, which doesn’t seem to have caused great trouble yet, and rather raises the question of why it isn’t done elsewhere.
On balance, it seems surprising that this data is not a requirement from enterprises which seek to benefit from the opportunity to structure themselves across borders; and likely that this will change soon – driven by the growing interest and concern from investors and from business rivals in major OECD countries, as well as the public and political condemnation around tax issues in particular.
(2) Privately counted medical trial data
An even clearer case at the privately counted end of the spectrum is that of the pharmaceutical industry and the data generated by trials. [Full disclosure: Save the Children partners with some pharmaceutical companies.] It is perhaps inevitable that some companies might, for profit motives, seek to distort the flow of information from their trials into the public domain. It is surely unacceptable, however, that regulators of all sorts might condone this (and sometimes participate themselves) rather than meet themselves, and demand of others, a higher standard.
Ben Goldacre’s Bad Pharma makes a (slightly disorganised but) enormously powerful case that the damage done is unacceptable. Perhaps the easiest way to see the central point is to use his example of the Cochrane Collaboration’s logo – I take the following text, with permission, from their website. The logo tells the story of how a failure to combine existing knowledge led to completely unnecessary suffering on a large scale.
The Cochrane Collaboration logo illustrates both our global objectives and our key scientific processes. The circle formed by the ‘C’ of Cochrane and the mirror image ‘C’ of Collaboration reflects the international collaboration that makes our work relevant globally. The inner part of the logo illustrates a systematic review of data from seven randomized controlled trials (RCTs), comparing one health care treatment with a placebo. Each horizontal line represents the results of one trial (the shorter the line, the more certain the result); and the diamond represents their combined results. The vertical line indicates the position around which the horizontal lines would cluster if the two treatments compared in the trials had similar effects; if a horizontal line touches the vertical line, it means that that particular trial found no clear difference between the treatments. The position of the diamond to the left of the vertical line indicates that the treatment studied is beneficial. Horizontal lines or a diamond to the right of the line would show that the treatment did more harm than good.
This diagram shows the results of a systematic review of RCTs of a short, inexpensive course of a corticosteroid given to women about to give birth too early. The first of these RCTs was reported in 1972. The diagram summarises the evidence that would have been revealed had the available RCTs been reviewed systematically a decade later: it indicates strongly that corticosteroids reduce the risk of babies dying from the complications of immaturity. By 1991, seven more trials had been reported, and the picture had become still stronger. This treatment reduces the odds of the babies of these women dying from the complications of immaturity by 30 to 50 per cent.
Because no systematic review of these trials had been published until 1989, most obstetricians had not realised that the treatment was so effective. As a result, tens of thousands of premature babies have probably suffered and died unnecessarily (and needed more expensive treatment than was necessary). This is just one of many examples of the human costs resulting from failure to perform systematic, up-to-date reviews of RCTs of health care.
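For readers who want to see the mechanics behind the diamond, here is a minimal sketch of how separate trial results can be pooled. It is my own illustration, not Cochrane’s method as published: it assumes a simple fixed-effect, inverse-variance combination of odds ratios, and the trial counts are invented for the purpose (they are not the real corticosteroid trials). The function names are hypothetical too.

```python
import math

# Invented counts, purely for illustration (NOT the real corticosteroid trials):
# (deaths_treated, n_treated, deaths_control, n_control)
TRIALS = [
    (4, 50, 8, 50),
    (10, 120, 15, 118),
    (3, 40, 6, 42),
    (12, 150, 20, 148),
]

def log_odds_ratio(deaths_t, n_t, deaths_c, n_c):
    """Log odds ratio and its approximate variance for one two-arm trial."""
    surv_t, surv_c = n_t - deaths_t, n_c - deaths_c
    estimate = math.log((deaths_t * surv_c) / (surv_t * deaths_c))
    variance = 1 / deaths_t + 1 / surv_t + 1 / deaths_c + 1 / surv_c
    return estimate, variance

def as_interval(estimate, variance):
    """Odds ratio with a 95% confidence interval: (ratio, low, high)."""
    half_width = 1.96 * math.sqrt(variance)
    return tuple(math.exp(x) for x in (estimate, estimate - half_width, estimate + half_width))

def pooled(trials):
    """Fixed-effect pooled log odds ratio and variance (inverse-variance weights)."""
    results = [log_odds_ratio(*t) for t in trials]
    weights = [1 / variance for _, variance in results]
    estimate = sum(w * est for w, (est, _) in zip(weights, results)) / sum(weights)
    return estimate, 1 / sum(weights)

for trial in TRIALS:
    print("trial ", as_interval(*log_odds_ratio(*trial)))  # one horizontal line each
print("pooled", as_interval(*pooled(TRIALS)))  # the diamond
```

An odds ratio below 1 means fewer deaths in the treated group, which corresponds to the left of the vertical line in the logo. With these made-up numbers, every individual trial has an interval that includes 1 (its line touches the vertical), while the pooled interval sits entirely below 1. That is the logo’s point in miniature: trials that look inconclusive one by one can be persuasive once combined.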
The Cochrane Collaboration are focused on the process of assessing existing data; Ben Goldacre’s book highlights a different part of the challenge, detailing the whole range of ways in which the pharmaceutical industry distorts, with predictable and damaging effects, the data that is created; the data that is released; the way the released data is presented; and the way the findings are communicated to medical specialists and to the public.
We have allowed a system to develop in which a great deal of human knowledge is only privately counted.
The result is that in far too many cases, and at far too great a human cost, we are failing to use information which would save, and improve, lives; a blobbogram like the Cochrane Collaboration logo would confirm as much. It’s not unthinkable that the potential direct beneficiaries number in the billions rather than the hundreds of millions of people.
At the grand level, humanity as a whole is losing out. Can you imagine what the hypothetical alien visitor would think of the system for health-vital knowledge that we, as a race, have constructed?
The repeated failure of companies and regulators to live up to promises made in relation to transparency surely points to the need for political action to mandate – and substantially to backdate – public access to this information.
(3) The private sector as (development) data-collector
Finally, a much more open case. FAO and Gallup are working together (see point 3 of this note) to see if the latter’s World Poll can deliver annual updates on self-perceptions around food insecurity, to complement less frequent FAO measures of hunger. [Full disclosure: I have benefited from discussions of this, and related issues, with Gallup research staff.]
Aside from standard concerns with Gallup’s data (see the seven points listed in section 3 of this IFPRI study, for example), I worry about the potential conflicts in tying this new public-access series to an existing private dataset (the full World Poll).
Only those who have paid for the full dataset will be in a position to access all the relevant knowledge. So while I, or a Malawian citizen, will be able to see the Malawian responses to food insecurity self-reporting questions, and compare these over time or to other countries, we wouldn’t be able without payment to assess the full distribution. We wouldn’t know, for example, if there happened to be important correlations with other questions that would explain certain patterns, or point to the need for particular interventions to assist marginalised groups (assuming that the marginalisation is only clear from the full dataset). And so on – it seems there will be a private withholding of what is potentially important additional data for what will be a public-access series, quite possibly used in the post-2015 development framework.
Now of course Gallup has invested heavily in creating the World Poll, and so there is a clear commercial logic and moral rationale to their withholding the data for payment, as well, perhaps, as an economic argument about not disincentivising a similar private sector effort in the future.
And yet – does this outweigh the public value of the data, once the decision has been taken to work with FAO to produce the global series of annual data in relation to food insecurity?
Assuming nothing changes, I will remain concerned about this; but I would be concerned also if Gallup were somehow forced unfairly to give up their asset. One answer may be for a donor to buy the rights to the World Poll for public access, and that may be useful; but it won’t solve the underlying problem.
It feels like we may need to think more about the rights of people who provide data, in relation to the companies who aggregate it. Where the data in question go directly to the human rights and well-being of people, their ability to share in the benefits seems that much more important.
In medical trials, the Declaration of Helsinki is explicit on this point:
Medical research involving a disadvantaged or vulnerable population or community is only justified if the research is responsive to the health needs and priorities of this population or community and if there is a reasonable likelihood that this population or community stands to benefit from the results of the research.
Is it time for a similarly explicit ethical charter in relation to the collection of human development data? At the very least, greater clarity on the expectations of public access would be valuable.
While individual cases may not always be clear, I do think there is enough evidence of a widespread issue of important human knowledge being ‘uncounted in private’. Can we redress the balance without treating companies unfairly?


