Initial Insights Report
Key findings
- Data-driven approaches to the Covid-19 pandemic exist on a spectrum between entirely automated, AI-powered processing and the more ‘mundane’ uses of digital information and statistics with a ‘human in the loop’;
- A number of wide-ranging benefits have been observed, including supporting the remote delivery of services during the pandemic and providing the ability to ‘find simple pictures out of mind-bending data’, thereby yielding insights which may not otherwise have been possible;
- At the same time, there are significant risks which lie not only in the use of novel, automated – and in many cases, untested – technologies, but in the underlying data itself, in terms of its quality, robustness and embedded assumptions. Greater transparency is needed regarding these shortcomings in order to ensure the maintenance of public trust.
Our work so far
We have made rapid progress with our initial landscape mapping assessment, having spoken to 33 stakeholders from a wide range of backgrounds and disciplines. These span data organisations, government, regulators, law enforcement, the medical profession, the legal profession, charities and the third sector, the private sector and academics from a range of disciplines. In this blog, we share some of the key initial insights gained from these interviews, which will be used to inform the next stages of the omddac project.
The meaning of ‘data-driven’: a spectrum
To begin with, one of our key initial findings was that – perhaps unsurprisingly – the meaning of the term ‘data-driven’ is not set in stone. In fact, we found a wide range of views as regards what constitutes a ‘data-driven’ approach. As one stakeholder noted:
We’re in a sort of Russian doll moment, where everything is data-driven. Everything from the tiers that people are put into, through to the vaccine and who gets [it]. There are layers and layers and layers and I would define a data-driven response as on a spectrum.
omddac Stakeholder
This spectrum, it appears, ranges from the more ‘mundane’ uses of digital information and statistics for activities where a human is very much ‘in the loop’, all the way to processes which are entirely automated and rely heavily on more sophisticated techniques, such as AI and machine learning.
What was evident from our discussions was that the concerns of our stakeholders were not confined to the implications of using more novel, automated technologies such as automated contact tracing (for instance the NHS COVID-19 app for England and Wales), digital immunity passports and predictive pandemic modelling (although these concerns were, of course, frequently raised).
The issues highlighted by our experts, in fact, spanned the whole of this ‘data-driven’ spectrum, covering, for example, the use of data by the government in the public coronavirus briefings from 10 Downing Street. Indeed, in many cases, our participants’ primary concerns related to the fundamental challenge of interpreting large volumes of data and effectively communicating key statistics to the general public, rather than issues relating to the introduction of new technologies.
Potential benefits
Our stakeholders pointed to a number of benefits arising from the observed data-driven approaches. These were similarly wide-ranging and extended beyond the potential advantages for public health and the pandemic response.
A number of participants found that data-driven approaches had supported the continuation of services remotely during the pandemic (such as remote court hearings, medical assessments, police interaction with the public and – of course – home working). Various programmes involving the flow of data from a central location to local public sector entities, for instance the shielding programme, were also considered to have been particularly successful because they combined centralised data with local knowledge to reach affected individuals effectively. It was also observed that the pandemic had stimulated promising innovations in privacy-enhancing technology, such as the OpenSAFELY platform.
More generally, stakeholders felt that ‘data science can find simple pictures out of mind-bending data […] enabling people to find meaning in complexity’. This capability has enabled insights which perhaps would not have been possible from analysing the data solely at a more granular level, and has been employed by the police, for example, to monitor the force-wide consistency of enforcement of the Covid-19 regulations and identify disproportionate administration of sanctions. One interviewee did point out that, in general, this opportunity was perhaps not being fully taken advantage of and should be utilised more effectively in the communication of relevant public health statistics, to guarantee accessibility for a public audience; something which this stakeholder felt was the government’s duty to ensure.
Key risks
Our stakeholders were, however, keen to draw attention to a number of significant risks which must also be addressed in the use of ‘data-driven’ approaches. As discussed above, we found a broad understanding of ‘data-driven’ responses to the pandemic, and this breadth was reflected in the nature of the main risks raised by our stakeholders.
The majority of participants (75%) held serious concerns around the quality, robustness and shortcomings of the underlying data itself, and the associated lack of transparency around these issues. Our stakeholders were concerned that the underlying datasets on which key decisions are being based often contain errors, outdated information and omissions, which can have serious implications for the results of any analysis. Indeed, the omissions themselves often ‘tell a story’: one participant was particularly concerned about ‘hidden crimes’ with regard to, for instance, domestic abuse and child abuse given the lack of reporting during the pandemic (in line with the closure of schools and the lack of access by social services).
In addition, by their nature, quantitative administrative datasets can lack context and qualitative insight, which many of our stakeholders consider to be a necessary element of any ‘data-driven’ process:
The data can be distracting – national mental health minimal data sets are known to be flawed. The data is often incomplete; it’s difficult to categorise and learn lessons from such data. Also, the data ignores the fact that some things we are observing during the pandemic cannot be mathematically reduced into facts and requires some form of interpretation, coupled with local knowledge.
omddac Stakeholder
As well as a lack of transparency around the quality of the data itself, 38% of stakeholders were concerned about the ways in which data and associated analyses are being presented as factual and objective, without being clear about any underlying uncertainties, bias, values or assumptions involved:
What I’m really concerned about is the idea that data is ‘true’ and that it’s always correct, and that a data-driven decision is somehow neutral. I can imagine that there will be a whole load of things that become automated or that then happen in a less democratic or transparent way because they are perceived to be based on data.
omddac Stakeholder
As one of our participants pointed out, ‘all models have assumptions baked into them’. What is important is that there is transparency around the decision-making process – disclosing, for example, any limitations within datasets and the assumptions which are used to fill gaps.
These issues impact, directly and indirectly, upon a further risk which was highlighted by almost half (48%) of our contributors, concerning ethical and social justice issues, including discrimination and matters of basic unfairness. Many participants, for instance, highlighted the disproportionate impact of Covid-19 on minority communities, whose data is often missing from relevant datasets.
Relatedly, one stakeholder specifically highlighted the ethical issues involved in uniformly applying the findings from analysis of aggregated data to the medical treatment of individuals, once again showing the significance of qualitative context and expertise, as well as the importance of transparency with regard to any assumptions made:
I am concerned about the use of group statistics being applied uniformly to individuals. For example, group statistics which might show older people over 75 with 3 co-morbidities should not be treated – should this be applied to all over 75 year olds with 3 co-morbidities, even though individually they may respond well to treatment? The concerns we should have about ageist criteria being applied to data arising in care homes […] is very big worry for ethicists.
omddac Stakeholder
As we have found, the concerns raised by our stakeholders extend beyond the use of new technologies and associated implications for privacy and data protection (though this concern, of course, remains central and was raised in 46% of our interviews). Other risks cited include: infringement of basic rights and liberties; potential for scope creep; operational issues with the responses themselves (most notably the contact tracing app); the concentration of power that accompanies the centralisation and control of data; and, similarly, the involvement of the private sector and its possession of sensitive medical data.
Moving forward
In general, our stakeholders felt that there is significant value particularly in the sharing and analysis of aggregated datasets, which the various data-driven approaches encourage. In order to benefit from the opportunities presented by data-driven approaches, however, and in response to the risks, our participants suggested that steps must be taken to address data quality issues, for instance by ensuring the inclusion of minority communities, as well as the incorporation of context through qualitative data.
The importance of robust safeguards and governance frameworks was also highlighted as key – though it was stressed that these should go beyond narrow data protection impact assessments to incorporate broader ethical principles and social equality assessments to combat the risks discussed above: ‘this is just as important as the technical strategy […] doing things ethically is just as important an enabler’.
Centrally, our stakeholders emphasised the need for greater transparency with regard to how decisions are being made and any inherent limitations in this process, in order to maintain public trust:
People need to be made aware of [the] limitations, and whether data/results are being over interpreted, whether data is missing or that gaps exist. [There is a] need to better communicate the limits of the data to the public. […] People are able to understand complexity and uncertainty if it is framed in the right way. People will trust more if we accept and communicate the uncertainty to them. There are stark differences around the world in how leads communicate around COVID. Those that are clear and honest and acknowledge the uncertainty have done a far better job of gaining public trust. People are capable and interested in understanding that complexity.
omddac Stakeholder
Next steps for omddac
Our initial landscape assessment has been extremely fruitful and we would like to take this opportunity to thank our stakeholders for their valuable input.
Issues concerning public perceptions, trust and acceptance of new data-driven responses emerged as a major concern among interviewees, warranting further scrutiny. We will explore these issues in detail through a public perceptions study which will be conducted as part of the next stages of this project.
More immediately, the omddac team will begin the process of short-listing potential case studies for further in-depth analysis – keep an eye out for our snapshot reports where we will provide our key learnings from each case study review.