Structured Transparency


When we share data with others, we may put the individuals described by that data at risk of privacy harms. Yet sharing data can be tremendously beneficial: sharing healthcare data for medical research could enable the development of new, more effective treatments, for example. As we learned in Contextual Integrity and the Privacy-Transparency Tradeoff, privacy isn't about preventing data sharing, but about ensuring information flows are appropriate. Society needs information to flow in order to function: if we have insufficient information, we face constant transparency dilemmas in our decision-making.

How can we best balance this tradeoff between privacy and transparency? What if we had a technical solution that enabled us to share with others the minimum amount of information necessary to be useful, while remaining in full control of how that data is used - forever? Sounds like a fairytale, right? Not entirely! Privacy Enhancing Technologies, or PETs, provide a toolkit that can help us move closer to making this fairytale a reality.


Structured Transparency

Technical input privacy techniques could allow banks to generate credit scores without needing to centralize massive amounts of personal customer data. Technical input verification techniques could enable the sourcing of information directly from consumers — alleviating the incentive for a back-channel private data marketplace and reducing the opportunity for intentional or unintentional tampering with said data (modifications would invalidate the signature). Perhaps most importantly, output verification techniques could allow external auditors to evaluate the fairness and equity of credit scoring algorithms without ever making the algorithm itself public.
- Beyond Privacy Trade-offs with Structured Transparency, Trask et al (2020)


You can reason about privacy-transparency tradeoffs for a particular information flow using the structured transparency framework, which identifies five guarantees that an information flow can provide. Depending on your use case, not all of them will be necessary. For example, if the same person both provides the input and receives the output, output privacy is unnecessary because they already know what input they provided. You may also need to make trade-offs appropriate to the specific context, as enabling input or output verification may come at the cost of some input or output privacy.

  • Input Privacy - how can we ask someone to compute on data for us without them learning what the data is? For example, you might want to send sensitive data to a cloud provider for classification using their proprietary machine learning model, but you're concerned about the risk of a data leak at the cloud provider. This could be an ideal use case for a secure enclave or homomorphic encryption. Alternatively, you might just want to send data without it being intercepted in transit: end-to-end encryption is perfect for this. Other ways to implement input privacy include secure multi-party computation (see the first sketch after this list) and federated learning.
  • Output Privacy - how can we ensure that the recipient of an output, such as a redacted dataset or set of statistics, can't infer the original input data? Differential privacy enables us to place numerical limits on the amount of information they can infer, known as the 'privacy budget'. An example use case is business analytics: you want to enable your analysts to ask high-level questions about your customers' purchases (e.g. 'what were the top 5 most popular products in London last month?') without allowing them to learn the details of what a specific customer purchased (see the second sketch after this list).
  • Input Verification - this allows you to prove that you are the sender of some data (for example, by digitally signing a document), or conversely to verify that the data you received came from the expected sender and hasn't been tampered with. This can be achieved using public-key infrastructure and cryptographic signatures (see the third sketch after this list), or (less commonly) using zero-knowledge proofs or encrypted computation with active security.
  • Output Verification - this is more challenging than input verification and approaches are still being researched and developed. An example would be auditing the outputs of a machine learning model for accuracy, fairness, and lack of bias. How can we do this without forcing a company to reveal their confidential IP, such as the training data and model design?
  • Flow Governance - this is satisfied if everyone who has a stake in how the information should be used can ensure that the privacy guarantees and properties of the data flow are preserved. It should not be possible to modify these without the consent of all parties. You can picture flow governance as a safe with a lock for each party with a stake in the information flow: everyone's key is needed to open it to make any changes.
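
Here is a minimal sketch of input privacy via additive secret sharing, one of the building blocks of secure multi-party computation. The party roles, values, and modulus below are illustrative assumptions rather than a production protocol:

```python
# Additive secret sharing: each party holds a meaningless-looking share, and
# computation happens on shares. Values and modulus are illustrative only.
import random

Q = 2**61 - 1  # all arithmetic is done modulo a large prime


def share(secret: int, n_parties: int = 3) -> list[int]:
    """Split a secret into n shares; any n-1 shares reveal nothing about it."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the secret."""
    return sum(shares) % Q


# Two hospitals each split a sensitive count across three compute servers.
a_shares = share(120)  # hospital A's patient count
b_shares = share(345)  # hospital B's patient count

# Each server adds only the shares it holds, never seeing either raw count.
sum_shares = [(a + b) % Q for a, b in zip(a_shares, b_shares)]

print(reconstruct(sum_shares))  # 465 - the total, without exposing either input
```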
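
Output privacy can be illustrated with the Laplace mechanism from differential privacy. The query result, sensitivity, and epsilon below are illustrative assumptions; a real deployment would also track the cumulative privacy budget across queries:

```python
# Laplace mechanism: release a count plus calibrated noise so that any single
# customer's presence barely changes the output distribution.
import random

true_count = 10_432  # e.g. exact number of purchases of one product in London
sensitivity = 1      # adding/removing one customer changes the count by at most 1
epsilon = 0.5        # smaller epsilon = stronger privacy, noisier answers


def laplace_noise(scale: float) -> float:
    """Laplace(0, scale) noise, sampled as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)


# The analyst only ever sees the noisy answer, not the exact count.
noisy_count = true_count + laplace_noise(sensitivity / epsilon)
print(round(noisy_count))
```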
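
And a sketch of input verification using an Ed25519 digital signature. This assumes the third-party 'cryptography' Python package; in practice the public key would be distributed and trusted via public-key infrastructure:

```python
# Digital signatures: the recipient can check both who sent the data and that
# it wasn't modified in transit. Requires `pip install cryptography`.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

document = b"Applicant income: 52,000 GBP"  # illustrative payload

# Sender: generate a key pair and sign the document.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()
signature = private_key.sign(document)

# Recipient: verify the document came from the key holder and is untampered.
try:
    public_key.verify(signature, document)
    print("Signature valid: input verified")
except InvalidSignature:
    print("Signature invalid")

# Any modification invalidates the signature, as noted in the quote above.
try:
    public_key.verify(signature, document + b" (edited)")
except InvalidSignature:
    print("Tampered document rejected")
```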

Considering information flow privacy in this way helps you communicate privacy guarantees to your stakeholders at a high level and avoid over-focusing on a particular PET. Instead of starting from the idea "we need to run this in a secure enclave", the framework helps you identify that what you actually need is input privacy, and that a secure enclave is just one possible way of implementing it.

⚠️ It's important to remember that PETs aren't a silver bullet for ethical and legal concerns. It's also vital to consider whether the data flow itself is ethical (does it violate contextual integrity?) and to ensure that key privacy principles such as data minimization and purpose limitation are followed. Palantir provide some excellent guidance cautioning against overreliance on PETs at the cost of these privacy fundamentals.



Further Reading