OLCreate: PUB_6214_1.0: Personal Data

Personal Data

Personal data, also referred to as PII (Personally Identifiable Information), can have a surprisingly broad definition, depending on the legal jurisdiction. There are some obvious examples of personal data, such as a name, email address, photo, ID number, or social security number. However, the tendency over the last few decades has been for regulators to create broader and broader definitions. As you'll see in the extract below from the California Privacy Rights Act (CPRA), essentially any information that could 'reasonably' be associated with or linked to an individual or household should be considered personal.

In your work, you should use this broad definition rather than focusing on specific scope differences between jurisdictions. Is an IP address personal data or not, for example? It depends on which regulator you ask, but the best option, especially if you expect to sell your product in multiple countries, is to consider it to be personal data. Over 150 countries now have data protection laws. Instead of trying to know them all in detail (a huge task!), treat all data with care, and if you have any doubt as to whether it could be identifying, assume that it is.
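The "if in doubt, assume it is personal" rule can be sketched as a default-deny check. This is a minimal illustration rather than a legal tool: the field names and the `KNOWN_NON_PERSONAL` allowlist are hypothetical, and deciding what belongs on such an allowlist is itself a question for a privacy or legal review.

```python
# Hypothetical allowlist: fields a privacy review has explicitly cleared.
KNOWN_NON_PERSONAL = {"app_version", "server_region", "feature_flag"}


def is_personal(field_name: str) -> bool:
    """Treat every field as personal unless it was reviewed and cleared."""
    return field_name not in KNOWN_NON_PERSONAL


# Hypothetical record; the IP address uses a documentation-only range.
record = {"ip_address": "203.0.113.7", "app_version": "2.4.1", "device_id": "a1b2"}
personal_fields = sorted(name for name in record if is_personal(name))
print(personal_fields)
```

The design choice is that a new or unknown field is handled as personal data automatically, so forgetting to classify a field fails safe.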

Exemptions are often made for:

Commercial contact details, such as a company's address or a corporate email address
Public information available in official records
Household data processing (e.g. taking photos of your friends and family for your personal photo album)
Research
National security
Other use cases involving public safety, or acting in someone's best interest when they are incapacitated (for example, checking someone's medical data to provide them with life-saving treatment)

In such cases, data may be considered non-personal data or be exempt from data protection law. However, these exemptions vary by jurisdiction and are often tightly scoped, so be sure to check any applicable laws before relying on one of them. Keep in mind that in addition to data protection law, other national or sectoral law might regulate what you can and can't do with personal data. For example, you might be required to keep records of individuals' financial transactions for a specific length of time.

📚 Reading: CPRA definition of personal data

(1) "Personal information" means information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. Personal information includes, but is not limited to, the following if it identifies, relates to, describes, is reasonably capable of being associated with, or could be reasonably linked, directly or indirectly, with a particular consumer or household:

(A) Identifiers such as a real name, alias, postal address, unique personal identifier, online identifier, Internet Protocol address, email address, account name, social security number, driver’s license number, passport number, or other similar identifiers.

(B) Any personal information described in subdivision (e) of Section 1798.80.

(C) Characteristics of protected classifications under California or federal law.

(D) Commercial information, including records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies.

(E) Biometric information.

(F) Internet or other electronic network activity information, including, but not limited to, browsing history, search history, and information regarding a consumer’s interaction with an internet website application, or advertisement.

(G) Geolocation data.

(H) Audio, electronic, visual, thermal, olfactory, or similar information.

(I) Professional or employment-related information.

(J) Education information, defined as information that is not publicly available personally identifiable information as defined in the Family Educational Rights and Privacy Act (20 U.S.C. Sec. 1232g; 34 C.F.R. Part 99).

(K) Inferences drawn from any of the information identified in this subdivision to create a profile about a consumer reflecting the consumer’s preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes.

(L) Sensitive personal information.

(2) "Personal information" does not include publicly available information or lawfully obtained, truthful information that is a matter of public concern. For purposes of this paragraph, "publicly available" means: information that is lawfully made available from federal, state, or local government records, or information that a business has a reasonable basis to believe is lawfully made available to the general public by the consumer or from widely distributed media, or by the consumer; or information made available by a person to whom the consumer has disclosed the information if the consumer has not restricted the information to a specific audience. "Publicly available" does not mean biometric information collected by a business about a consumer without the consumer’s knowledge.

(3) "Personal information" does not include consumer information that is deidentified or aggregate consumer information.

Why Is The Definition So Broad?

One reason is that it's surprisingly easy to uniquely identify someone with just a few data attributes. For example, research published in 2000 found that 87% of the US population could be uniquely identified from the combination of their 5-digit ZIP (postal) code, gender, and date of birth, while 18% of the population could be uniquely identified from just their county, gender, and date of birth.
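To check how identifying a combination of attributes is in your own data, you can count how many records are the only one with a given combination. The sketch below uses a small set of made-up records and a hypothetical helper, `unique_fraction`; it illustrates the idea only and is not the method used in the research cited above.

```python
from collections import Counter

# Hypothetical toy records: (ZIP code, gender, date of birth). Not real data.
records = [
    ("02139", "F", "1985-03-14"),
    ("02139", "F", "1985-03-14"),
    ("02139", "M", "1985-03-14"),
    ("02140", "F", "1990-07-01"),
    ("02141", "M", "1972-11-30"),
]


def unique_fraction(rows):
    """Return the share of rows whose attribute combination appears exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


print(f"{unique_fraction(records):.0%} of records are unique on these attributes")
```

Any record that is unique on such a combination can potentially be re-identified by someone who knows those attributes from another source, even if the name has been removed.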

Similarly, you can infer highly sensitive information about a person, which may have legal consequences for them or cause them to be discriminated against, from just a few data attributes. Large-scale research on Facebook likes, for example, found that seemingly innocuous likes for interests such as "Britney Spears" or "Desperate Housewives" were moderately indicative of homosexuality in males. While a single like by itself may be an imperfect predictor, combining many likes allows these attributes to be inferred with a high degree of confidence. Perhaps there's only a 55% chance you are an "X" (some characteristic) if you like page A. But if we know the same is true for pages B and C, and you liked all 3, it begins to look more and more likely. (Note: the same thinking was behind the Cambridge Analytica scandal we saw earlier in the course, where data on Facebook likes was used for psychological profiling for micro-targeted election campaigns.) See the Further Reading section to dive into both of these examples further.
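The 55% example can be worked through numerically. The sketch below makes two simplifying assumptions that are not in the text: a 50% base rate for the characteristic, and likes that are independent of each other given the characteristic (a naive Bayes assumption). The function name `combine_signals` is hypothetical, and this is not necessarily the model used in the research mentioned above.

```python
def combine_signals(posteriors, prior=0.5):
    """Combine per-signal probabilities into one estimate.

    Each value in `posteriors` is P(trait | that signal alone). Assumes the
    signals are conditionally independent given the trait (naive Bayes).
    """
    prior_odds = prior / (1 - prior)
    odds = prior_odds
    for p in posteriors:
        # Likelihood ratio this signal contributes relative to the base rate.
        odds *= (p / (1 - p)) / prior_odds
    return odds / (1 + odds)


for n_likes in (1, 3, 10):
    print(n_likes, round(combine_signals([0.55] * n_likes), 3))
```

Under these assumptions, three weak 55% signals together give roughly a 65% estimate, and ten give more than 85%. Real likes are correlated, so actual figures would differ, but the direction of the effect is the same.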

Sensitive Categories of Personal Data

Regulators have also thought about the potential harm individuals could come to if particularly sensitive data about them is exposed. Certain categories of personal data are singled out as special or sensitive in data protection law. These include:

Racial or ethnic origin, caste, or tribe
Citizenship or immigration status
Religious or philosophical beliefs/opinions
Trade union membership
Genetic data
Biometric data, including photos of faces (when processed to uniquely identify people)
Criminal records
Health data
Sexual orientation or preferences
Children's personal data
Geolocation
Social Security / health insurance number or national / state identification number
The contents of an individual's mail, email, or text messages, unless your company is the intended recipient
Login details (password, PIN, credentials...)
Financial account number or debit/credit card number; some laws include all financial data

In general, you should avoid processing data from these categories whenever possible. If it is necessary, get legal advice before you start implementing, as these categories require extra care.

The CPRA, for example, allows you to process this data but requires you to provide an easy way for the user to opt out of their sensitive information being processed. Keep this in mind when designing your systems - how will you make sure this is possible? Will it still be feasible for them to use your product afterward? Clearly, it will not be if they want to delete their account login. However, a user objecting to their geolocation data being processed for a games app should still be able to play the game.

In contrast, the GDPR completely prohibits processing of sensitive data unless you have the explicit consent of the user or the processing falls under one of the other exemptions (for example, you need to process trade union memberships because you are a trade union!). Note that only some of the categories above are included in the GDPR's definition of sensitive data ("special categories").
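One way to keep a product usable after an opt-out is to gate each sensitive feature separately, so that declining one category only switches off the feature that depends on it. The sketch below illustrates this for the geolocation example; the class, function, and category names are all hypothetical, and it models a CPRA-style opt-out. A GDPR-style design would invert the default and require explicit opt-in before any processing.

```python
from dataclasses import dataclass, field


@dataclass
class UserPrivacySettings:
    # Sensitive categories this user has opted out of (hypothetical labels).
    opted_out: set = field(default_factory=set)

    def allows(self, category: str) -> bool:
        """Return True unless the user has opted out of this category."""
        return category not in self.opted_out


def nearby_players(settings: UserPrivacySettings, location):
    """Optional feature: degrades to an empty result instead of blocking play."""
    if location is None or not settings.allows("geolocation"):
        return []  # Location is never processed for this user.
    return [f"player near {location}"]


settings = UserPrivacySettings()
print(nearby_players(settings, "51.5,-0.1"))  # feature available
settings.opted_out.add("geolocation")
print(nearby_players(settings, "51.5,-0.1"))  # feature off, game still playable
```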
