Ever heard the term ‘data science’? If you’re an avid reader of this blog or data.blog.gov.uk you probably have. But nearly 40% of the population have never heard about it, and over 20% have heard the term but don’t know what it means.
But we’ve all probably experienced it in one way or another, even if unknowingly. Our credit scores are calculated by algorithms, for example, as are suggestions for films we might like to watch on Netflix or books we might like to buy on Amazon. And through tagging our friends on Facebook, we have been training the world’s cleverest machine-learning facial recognition programme.
Data science - applying powerful new computer techniques to newer as well as traditional forms of data - is now also being used by government as part of its work to improve how society works. We get our weather predictions from computers learning from historical meteorological data (and with windfall data, people can even predict the impact of leaves on the railway track). We can segment people who are on health-related benefits so they can get a better service, or identify people who might be due winter fuel payments. And ONS is exploring how geo-located Twitter data could provide new insights into population mobility.
But the challenges that these powerful computer techniques throw up are new, and the law (primarily the Data Protection Act) is complex. It was, after all, written before data science was really around, and the relevant provisions are spread across a number of different places. So over the last 18 months, GDS has been working with ONS, GO-Science and Sciencewise to develop an ethical framework for the use of data science in government so that people feel confident to innovate with and use data.
We took an open, user-centred and evidence-based approach to developing the framework. We started with a series of expert roundtables including academics, think tanks, civil society and industry. We did some user research with data scientists and policymakers in government to understand how they were doing projects and at which points they made (or needed to make) ethical decisions. And we conducted a public dialogue - run by Ipsos MORI and Codelegs - which engaged the public in a debate around data science through deliberative workshops (video here), a survey and an online quiz. Today the Minister for the Cabinet Office, Matt Hancock, launched the first iteration of the framework at a workshop with stakeholders to explore the evidence and update the framework.
The public dialogue opened my eyes in three key ways, which show that a continued open process is crucial:
Firstly, people are more likely to agree that government should explore innovative data uses if they understand the general benefits of what data science can do. Case studies proved a crucial way to do this, so we need to communicate the great work that we - and others using data science to improve society - are doing. But this only gets us so far.
Secondly, people assess projects on a case-by-case basis in two stages. The first is related to their opinion about the problem data science is trying to solve. Therefore we need to engage with people through the wider lens of the policy area (e.g. how can we support people to make the right career choices?), not just on the data science method.
Finally, once they have bought into the concept of data science in general, and agree with the policy area, the public then move on to a very nuanced assessment of benefit versus risk. This includes privacy, but also the risk that the project is not effective or that there are unintended consequences. These are all weighed up against the public benefit, meaning that people can understand the need for using data that is sensitive, mandatory and with potential false positives - as long as there is accountability built in. This means we need to increase people’s understanding of how data science works as well as telling them how we are mitigating any risks.
So there are four pointers for continued engagement:
- Communicating case studies to show the benefits of data science;
- Communicating what we are doing to mitigate risks (through the ethical framework);
- Increasing data literacy;
- Continuing to engage people in the problems themselves (not just the method).
As a starter, Ipsos MORI and Codelegs have created ‘Data Dilemmas’, a fun interactive quiz where you can learn about some of the ethical challenges we face with data science and take the test to see what sort of data persona you have. Take it, tweet it and tell your friends!