Future Privacy Challenges

This module poses some questions about emerging or future privacy challenges. How can we build a privacy-preserving metaverse? How do we navigate a world full of generative AI agents, where we can no longer prove anything is real? And what will privacy mean for humans living and working in space?

Virtual and Augmented Reality

"This poses a very serious privacy risk, as it potentially eliminates anonymity in the metaverse ... 'Moving around in a virtual world while streaming basic motion data would be like browsing the internet while sharing your fingerprints with every website you visit. However, unlike web-browsing, which does not require anyone to share their fingerprints, the streaming of motion data is a fundamental part of how the metaverse currently works.'" - New research suggests that privacy in the metaverse may be impossible, Dr. Louis Rosenberg, Venturebeat (2023)

Virtual reality (VR) and augmented reality (AR) are hugely promising technologies. Families split across continents can hang out together 'in-person' in the metaverse. Work could become truly location-independent, unlocking 'in-person' collaboration at the 'office' in tech hubs like Silicon Valley for software professionals from around the world. And of course, for gamers, VR is hugely exciting. As a Harry Potter nerd, your author can't wait to attend a virtual Hogwarts. But in order to work, these technologies need to capture a new type of personal data: motion data. And with it comes new privacy risks.

A recent research study found that users could be uniquely identified at scale across multiple sessions using just a short sample of their head and hand motion. With a 10-second recording, accuracy was around 73%; with 100 seconds of motion, it was over 94%. This is a significant and troubling finding. It means that motion (biomechanics) data - which is of course necessary for VR to work, from initial device calibration to the user's ongoing interactions with the VR world - would likely be considered biometric data. This special category of sensitive personal data requires a stronger legal basis for processing under some data protection laws, such as the EU's GDPR. That categorization could mean that requiring your employees to work in your virtual office in the metaverse would be illegal under employee data protection law.
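
To make this risk concrete, here is a toy sketch in Python of how such re-identification works in principle. This is not the study's actual method (which applied machine learning to full motion telemetry); the user names, feature choices, and numbers here are all invented for illustration.

```python
import random
import statistics

random.seed(42)

# Hypothetical user profiles: (typical head height in metres, typical
# hand speed). Real VR telemetry is far richer - full 6-DoF poses
# streamed at 60+ Hz - which is why real attacks are so accurate.
users = {"alice": (1.62, 0.8), "bob": (1.81, 1.1), "carol": (1.70, 0.6)}

def sample_session(profile, n=100):
    """Simulate n frames of (head_height, hand_speed) motion data."""
    height_mu, speed_mu = profile
    return [(random.gauss(height_mu, 0.01), random.gauss(speed_mu, 0.05))
            for _ in range(n)]

def fingerprint(session):
    """Crude behavioral fingerprint: per-channel means."""
    return (statistics.mean(h for h, _ in session),
            statistics.mean(s for _, s in session))

# 'Enrollment': one previously observed session per user.
enrolled = {name: fingerprint(sample_session(p)) for name, p in users.items()}

def identify(session):
    """Match an anonymous session to the nearest enrolled fingerprint."""
    f = fingerprint(session)
    return min(enrolled, key=lambda name: sum(
        (a - b) ** 2 for a, b in zip(enrolled[name], f)))

# A fresh, supposedly anonymous session is trivially re-identified.
print(identify(sample_session(users["bob"])))  # prints "bob"
```

Even two crude averages are enough to separate three users here; the study's point is that with richer features, the same linkage keeps working at the scale of many thousands of users.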

❓ Are you considering building VR or AR into the products you build? How could you build in stronger protection for the motion data you collect, and how could you make it less identifiable? The article quoted above proposes some mitigations, such as adding noise to the data, but these need to be applied with care to ensure motion tracking still works without glitches.
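
As a sketch of what the noise mitigation might look like in code: the sigma value below is purely illustrative, not a vetted privacy parameter - choosing it is exactly the hard part, since too little noise leaves users identifiable and too much makes tracking glitch.

```python
import random

def add_noise(frames, sigma=0.02):
    """Return motion frames with Gaussian noise added to each coordinate.

    frames is a list of tuples of floats, e.g. (head_height, hand_speed).
    sigma trades privacy against fidelity and would need careful tuning
    (and ideally a formal privacy analysis) in a real system.
    """
    return [tuple(v + random.gauss(0.0, sigma) for v in frame)
            for frame in frames]

# Example: three frames of (head_height, hand_speed) telemetry.
clean = [(1.70, 0.60), (1.71, 0.62), (1.70, 0.59)]
noisy = add_noise(clean)
print(len(noisy), len(noisy[0]))  # prints "3 2": same shape, perturbed values
```

Note that naive per-frame noise can be averaged away over a long session, which is why the article's mitigations need to be applied with care rather than bolted on.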

Generative AI

As I write this, ChatGPT is still a relatively new phenomenon, and yet it has already spawned hundreds of companies and raised enough privacy and ethics concerns to devote an entire course to it alone. This content only begins to scratch the surface of the topic, and with the rapid pace of change we are seeing, it may be out of date by the time you read this. I strongly encourage you to do your own reading around the topic, as it will affect every technology professional. If you're unfamiliar with the technical details of generative AI, check out Google's free Generative AI learning path, which covers core concepts such as diffusion models, encoder-decoder architectures, attention, and transformer and BERT models. While this content focuses on harms, it's important to remember that generative AI could be used to transform society for good - if we use it with care, which sadly isn't happening right now.

  • Consent? Legal basis for processing? The developers of the Internet scraping tools used to produce the vast datasets these models are trained on couldn't care less. The developers of one popular tool, img2dataset, for example, were openly adversarial - and demonstrated a total lack of ethical or legal awareness - when confronted with questions about consent and opt-out mechanisms. Hopefully we will see some enforcement actions here, but most authorities have been slow to act.
  • The Italian Data Protection Authority, however, has not been slow to act - at least when it comes to ChatGPT. AI companies aren't exempt from the GDPR, although OpenAI apparently thought they were. After an enforcement action, ChatGPT was unavailable in Italy for some time while OpenAI hastily implemented (very limited) support for data subject rights. There are still many open questions regarding data subject rights, such as how the right to be forgotten can be exercised for a foundation model. Removing someone from the model might require editing terabytes of training data and then retraining the model at a cost of hundreds of millions of dollars - and that's if the original training data is even still available as a single dataset, rather than a continuous stream of training input.
  • Prompts sent to Microsoft's Azure OpenAI service - and the outputs returned - are not private. They are analyzed by employees for abuse detection and debugging purposes. In general, never assume that your interaction with an LLM is private; consider it as public as a Google search - with Google of course watching as usual, plus a journalist looking over your shoulder! This has already caught out Samsung, which banned ChatGPT because employees were leaking confidential source code to it.
  • Not only might the data you input be retained and examined - it may also be used to train the model, unless you explicitly opt out (where that is even possible). Together with the costs of retraining a foundation model from scratch, this raises not only data ethics concerns but also security concerns. What if someone deliberately poisons the model with false data about you? Within the EU, you should have the right to rectification, but no one has yet made a plausible proposal for how this would work in practice with an LLM.
  • Badness.AI is a catalog of real-world harms caused by generative AI, including bias, misrepresentation, abuse of deepfakes, and user manipulation. Generative AI raises many trust and safety concerns - a field distinct from, but complementary to, privacy engineering. The Trust & Safety Teaching Consortium has excellent course content available if you'd like to dive into the topic.
  • 'Finally' (this is far from the last issue, but we have limited space!), the scraped training datasets also have the exact same data limitations as the Internet itself, such as underrepresentation of the Global South, which makes it more likely that content produced for users in the Global South will be unreliable (i.e. 'hallucinated' due to lack of data).

What does privacy mean now in a world where anyone can easily produce realistic fake photos, videos, writing, or audio supposedly by or of "you"? We’re a long way from the world of gentlemen opening each other’s mail.



Space Privacy

We've discussed how privacy expectations can vary depending on your cultural background and the specific situation (context) that you're in. Now let's try to apply this to a brand new context - what novel privacy expectations might future Mars colonists have? The privacy of astronauts on the International Space Station can provide us with some insights: they share confined spaces for months at a time and are usually on camera, but do have a closet-sized private space and are allowed to turn the cameras off at the end of the workday. Their health is constantly tracked using a range of different sensors.

❓ Imagine you are preparing for a six-month spaceflight on a small spacecraft. What are your top privacy concerns? How could the spacecraft be designed to accommodate them?


With cloud companies already moving into space via satellite ground station offerings such as Azure Orbital and AWS Ground Station, we are confronting questions of space privacy and the applicability of law in space today. Our existing data protection laws apply in some cases, but for other scenarios - particularly long-distance space flight - they will need to be extended. In a fascinating article, Michalsons explores how processing personal data in outer space could be a loophole around the GDPR, whereas the CCPA would still apply!

Technologists are also working to establish technical and governance standards for an Interplanetary Internet, which raises many further questions of its own.

Further Reading