I’m off to speak at Open Data Manchester this evening about how both open data and open working can make better policy. I thought I’d share some of the tools we in Policy Lab use to help us think through how data can help us understand more about the problems we are trying to solve.
Data can do amazing things. Basic visualisation has been used for ages to uncover patterns and to create better policy responses. For example, in 1854 John Snow’s mapping of cholera cases around Soho showed that they were clustered around the Broad Street water pump, and thereby revealed the source of the deadly disease. In the same decade, Florence Nightingale began using polar charts to plot the causes of death in the Crimean War, revealing that the majority were due to preventable causes; her findings led to changes in nursing practice.
Over 160 years later, government is continuing this tradition with increasingly powerful computers and new forms of data. The GDS data team is part of the Government Data Science Partnership, which aims to promote data science within government. Data science combines maths and computing to find answers in huge, complex and real-time data sets, and much of this data is openly available. For example, COBR (the government’s emergency committee) has developed a flood alert system, the Food Standards Agency has used Twitter data to detect norovirus outbreaks, and the ONS is exploring how it can scrape prices from supermarket websites to feed through to economic outputs, and how geo-located Twitter data could provide new insights into population mobility.
In Policy Lab we've been using data science as part of our projects, which we conduct in as open a way as possible: bringing in stakeholders and users to help us understand the problem and develop solutions. In our Health and Work project (which looked at how we could support people to manage their health conditions and stay or thrive in work), we worked with Mastodon C and used the openly available Understanding Society survey.
Normally, we would have picked some variables we thought might correlate and run regression analysis to see how they related to each other. But using data science allowed a computer to correlate thousands of variables very quickly (a human would have been there for days). This confirmed some of our expectations of risk factors, but also came up with some surprises (for example, bereavement in men showed as a particular risk factor). We were also able to use a segmentation technique to group similar people (all de-identified, I should add) who had said they were on health-related benefits. This allowed us to see a) how two segments of people saying they were on health benefits had moderately good health and therefore required a different, non-health intervention, and b) how these two clusters were very different (one group was older men with previously well-paid jobs, the other was younger men with previously low-paid work) and would themselves require different types of intervention.
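To give a flavour of the two techniques described above, a broad correlation screen followed by a segmentation (clustering) step, here is a minimal sketch on entirely synthetic data. The variables, the planted relationship and the two clusters are invented stand-ins for illustration, not the actual Understanding Society analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, de-identified "survey": 500 respondents, 20 variables
# (hypothetical stand-ins, not real Understanding Society fields).
n_people, n_vars = 500, 20
data = rng.normal(size=(n_people, n_vars))
# Plant one known relationship so the screen has something to find.
data[:, 1] = 0.8 * data[:, 0] + 0.2 * rng.normal(size=n_people)

# 1. Correlate every variable with every other at once, rather than
#    testing hand-picked pairs one regression at a time.
corr = np.corrcoef(data, rowvar=False)
i, j = np.triu_indices(n_vars, k=1)          # each pair once
strongest = np.argsort(-np.abs(corr[i, j]))[:5]
for k in strongest:
    print(f"vars {i[k]:2d} & {j[k]:2d}: r = {corr[i[k], j[k]]:+.2f}")

# 2. Segment respondents with a minimal k-means: people in the same
#    cluster answered similarly, so may need similar interventions.
def kmeans(x, k, iters=50, seed=0):
    r = np.random.default_rng(seed)
    centres = x[r.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centres) ** 2).sum(-1), axis=1)
        centres = np.array([x[labels == c].mean(axis=0) for c in range(k)])
    return labels, centres

labels, centres = kmeans(data, k=2)
print("cluster sizes:", np.bincount(labels))
```

In practice a team would use a richer toolkit (mixed data types, missing values, a principled choice of the number of clusters), but the shape of the work is the same: screen everything, then group similar people and inspect what distinguishes the groups.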
Since then we've been working on tools to help policymakers imagine what sort of data - not just traditional data - could be used to uncover more insights about their policies. We normally use user journeys to get people to put themselves in users’ shoes and think about how they interact with services. Now we also give them a set of data cards, each describing a different type of data, and ask them to think about how people might generate that data along their journey. We looked at sensor data, social media data, Google search data, surveys, administrative data and geo-location data: in other words, types of data that policymakers might traditionally use, and others that they might not.
For more advanced data scientists and policymakers there are other sets of data cards giving them inspiration for how data has been used elsewhere and prompting them to think about the methods they could use. And finally, there is a set of cards prompting them to think about how to use data appropriately, so we work in a way that is open and acceptable.
If you want to find out more or try any of the above, there are two things you can do: come to our Data Science session on 12 May from 1pm - 2pm at the Cabinet Office (contact email@example.com), or go to the OPM toolkit and download these tools. Try them yourself and tell us how you get on - they are a work in progress, so we’d appreciate your feedback!