Chapter 1.3: Ethical considerations
This chapter concludes the module on data governance with a discussion on ethical handling of data.
Data ethics is becoming increasingly important in both public policy and the IT community. However, many still interpret data ethics in a narrow sense, thinking of it primarily as a set of hard-and-fast rules to be followed. But data ethics is more than anonymising data and protecting personal privacy. An ethical approach to the use of data also pays attention to other aspects of the data and how it is used. This chapter first discusses the technical approaches to ethical compliance that stem from regulatory requirements, and then considers data ethics through the lens of accuracy and representativeness.
Data ethics: privacy and security compliance
Viewed from the perspective of privacy and security compliance, data ethics is about collecting, accessing and using data in a way that conforms to the legal framework of a given jurisdiction. The aim is to protect individuals from unauthorised access to their private data and to prevent any inappropriate use of it. Individuals are treated not as passive subjects but as rights holders who can access, inspect, update or correct their data at any time.
Regulations such as the GDPR go beyond the general protection of personal data and require that its use has a well-defined goal (finality) and that organisations collect only the data that is strictly necessary to achieve that goal (proportionality). For example, information collected as part of a subscription process cannot be used for another purpose (e.g. direct marketing) unless the person has given explicit consent.
Ensuring privacy and security compliance involves processing personal data in such a way that no sensitive information can be derived from the collected information. Some techniques that support these efforts include:
Data selection (omitting): Leaving out sensitive data that can be used to track or identify individuals.
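Anonymisation: Removing or irreversibly transforming personally identifiable information so that individuals can no longer be identified from the dataset. Encryption is sometimes used for this purpose, but because it can be reversed by anyone holding the key, it is generally closer to pseudonymisation.
Pseudonymisation: Replacing personally identifiable information with artificial identifiers or pseudonyms.
Blurring: Used mainly in the field of image recognition, this technique pixelates, redacts or blacks out parts of images, e.g. faces in CCTV footage.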
Feel free to read this PoliVisu White Paper on Privacy Rules and Data Anonymisation to deepen your knowledge of the foregoing techniques.
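As an illustration of the techniques listed above, the following minimal Python sketch combines data selection with pseudonymisation. It is only one possible approach, not a method prescribed by any regulation: the field names (`name`, `email`, `user_id`, `postcode`), the `SECRET_KEY` value and the helper functions are all hypothetical, and a keyed hash (HMAC-SHA256) is assumed here as the way to generate artificial identifiers.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash, shortened to 16 hex characters."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def prepare_record(record: dict) -> dict:
    """Apply data selection and pseudonymisation to a single record."""
    # Data selection: leave out fields that directly identify the person.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Pseudonymisation: swap the real ID for an artificial identifier.
    cleaned["user_id"] = pseudonymise(str(record["user_id"]))
    return cleaned


record = {
    "user_id": 1042,
    "name": "Jane Doe",
    "email": "jane@example.org",
    "postcode": "9000",
}
result = prepare_record(record)
print(result)
```

The same input always maps to the same pseudonym, so records can still be linked across datasets without exposing the original identifier. Whoever holds the key can rebuild that link, which is why pseudonymised data is normally still treated as personal data.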
Data ethics: accuracy, provenance and unintended consequences
As mentioned above, data ethics is more than anonymising data and protecting personal privacy. An ethical approach to the use of data will also pay attention to other aspects of the data and how it is used. For instance:
Accuracy: Has the data been collected accurately, and has it been correctly labelled? For instance, someone with asthma might buy a house after checking official data to verify that it is in a low-pollution area, only to find out later that the sensor was wrongly located and the area is, in fact, heavily polluted.
Provenance: It is not unknown for malevolent actors to insert false information that others then use and broadcast in good faith. This is widespread in social media, but it can also happen in seemingly reliable data collections. For instance, someone planted forged documents in the UK National Archives falsely suggesting that British intelligence agents murdered the Nazi leader Heinrich Himmler in 1945, and these were used in good faith as the basis of a book.
Facilitation of crime: Unintended uses of the data need to be considered. For instance, when releasing data about burglaries, governments have to take care to blur the data, not for the privacy of the victim per se but because house-specific data would tell burglars where new computers, flat-screen TVs etc. could be stolen in a few months' time, after the insurance payout and the purchase of replacement items.
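The blurring of burglary data described above could, for example, be done by publishing counts per grid cell rather than per address. The sketch below is one hedged way to do this under stated assumptions: the coordinates are invented, the function names `coarsen` and `publishable_counts` are hypothetical, and the two-decimal grid (roughly one kilometre in latitude) and the minimum count of 3 are arbitrary illustrative choices rather than recommended thresholds.

```python
from collections import Counter


def coarsen(lat: float, lon: float, decimals: int = 2) -> tuple:
    """Round coordinates so that a point maps to a grid cell, not a house."""
    return (round(lat, decimals), round(lon, decimals))


def publishable_counts(incidents, min_count: int = 3, decimals: int = 2) -> dict:
    """Count incidents per grid cell and suppress cells with too few incidents."""
    counts = Counter(coarsen(lat, lon, decimals) for lat, lon in incidents)
    # Cells with very few incidents could still point to a single household.
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Invented example coordinates (latitude, longitude).
incidents = [
    (51.0512, 3.7211),
    (51.0523, 3.7198),
    (51.0508, 3.7223),
    (51.2001, 3.9002),  # isolated incident, will be suppressed
]
published = publishable_counts(incidents)
print(published)
```

Here the three nearby incidents end up in the same cell and are published as a count of 3, while the isolated incident is left out because a single point in a cell could still identify one house.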
The visualisation of data, even if the data is accurate and reliable, can also raise ethical issues. For example:
Using correlation to suggest causation: There are many surprising spurious correlations between unrelated datasets, and a visualisation can powerfully suggest that one causes the other. For instance, did higher use of Internet Explorer really lead to higher murder rates?
Figure 3. Internet Explorer v murder rate
Failure to visualise the key message: Consider an example from the space industry. The chart prepared by Thiokol engineers, which was used in the decision to launch the space shuttle Challenger in extremely cold conditions, contained correct data on the relationship between temperature and the type of failure that led to the disaster. However, the chart was so hard to read that it did not make the danger apparent.
Figure 4. The Thiokol chart
Some experts have suggested that an alternative presentation of the data could have saved seven lives.
Figure 5. An alternative version of the Thiokol chart that may have stopped Challenger's launch
Visualisation of only part of the data: While not technically wrong, improper extraction, deliberate omission or inclusion of only a certain chunk of the data can be misleading. For instance, the example below shows how restricting the time range can give a very different view of the same data.
Figure 6. Five days v three weeks
Visualisation choice can affect decisions: Can different ways of showing the same data lead to different decisions? And can those decisions be about something important, like continuing a clinical trial? A study published in 1999 shows that they can: the way the data is represented does make a difference. It found that four different visualisations of the same results from fictitious clinical trials led doctors to make different recommendations about whether to continue the trial.
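Two of the pitfalls above, correlation mistaken for causation and a restricted time range, are easy to reproduce with synthetic data. The sketch below uses invented numbers and does not reproduce the data behind the figures. The series, the helper functions `pearson` and `net_change`, and the random seed are all assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def net_change(series):
    """Difference between the last and the first value of a series."""
    return series[-1] - series[0]


# Pitfall 1: two series generated independently of each other.
# Their noise is unrelated; they only share a downward trend over time.
rng = random.Random(0)
steps = range(20)
series_a = [100 - 5 * t + rng.gauss(0, 1) for t in steps]
series_b = [50 - 2 * t + rng.gauss(0, 1) for t in steps]
r = pearson(series_a, series_b)
print(f"correlation of two independently generated series: {r:.2f}")

# Pitfall 2: the same series viewed over five days and over three weeks.
daily = [10 + day for day in range(16)] + [25, 24, 23, 22, 21]
print("change over the last five days:", net_change(daily[-5:]))
print("change over three weeks:", net_change(daily))
```

The two series correlate strongly simply because both decline over time, not because one influences the other. In the second part, the last five days show a drop of 4 while the full three weeks show a rise of 11, so a chart limited to the short window would tell the opposite story.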
This chapter has tried to show that effective data governance requires a broader consideration of how data is collected, processed, visualised and interpreted. In this regard, data ethics is an important concept that goes beyond privacy and security. Besides protecting personal information, data ethics aims to provide the audience with accurate, complete and representative information.