Harvard students built a tool that sifts through consumer datasets leaked in data breaches for a class paper they’re working on. In the process, they discovered that data companies claim to “anonymize” may not be so anonymous after all (via Vice).
Anonymized Data
The datasets they use contain personally identifiable information (PII) leaked in data breaches. Depending on the breach, this can range from email addresses and usernames to home addresses and phone numbers. One such breach involved Experian, which leaked the PII of six million Americans.
The students, Dasha Metropolitansky and Kian Attari, wondered whether they could identify the same individual across multiple data breaches. After analyzing thousands of datasets, many of them containing “anonymized” data, they found that identifying individuals wasn’t difficult. As the students put it:
We showed that an ‘anonymized’ dataset from one place can easily be linked to a non-anonymized dataset from somewhere else via a column that appears in both datasets. So we shouldn’t assume that our personal information is safe just because a company claims to limit how much they collect and store.
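The article doesn’t describe how the students’ tool is implemented, so what follows is only a minimal sketch of the general linkage idea in the quote, not their code. It assumes two hypothetical datasets: an “anonymized” one with no names, and an identified one that happens to share a phone-number column with it. All records and the function name link_records are invented for illustration.

```python
# Hypothetical toy records, invented purely for illustration.
# The "anonymized" set has no names but keeps a phone number column.
anonymized = [
    {"phone": "555-0101", "income_bracket": "50-75k", "car": "sedan"},
    {"phone": "555-0102", "income_bracket": "100k+", "car": "suv"},
]

# A separate, non-anonymized set that shares the same column.
identified = [
    {"phone": "555-0101", "name": "Alice Example"},
    {"phone": "555-0199", "name": "Bob Example"},
]


def link_records(anon_rows, named_rows, key):
    """Join two lists of dicts on a shared column (a simple inner join)."""
    # Index the identified rows by the shared column for fast lookups.
    # If several named rows share a key, the last one wins.
    by_key = {row[key]: row for row in named_rows}
    linked = []
    for row in anon_rows:
        match = by_key.get(row[key])
        if match is not None:
            # Merge: the "anonymous" attributes now sit next to a real name.
            linked.append({**row, **match})
    return linked


if __name__ == "__main__":
    for record in link_records(anonymized, identified, "phone"):
        print(record)
```

Only the row whose phone number appears in both sets is linked, but that single shared column is enough to attach a name to the supposedly anonymous income and car details. Real linkage work would also need to normalize formats (for example, phone numbers written differently) and cope with rows that match on several columns at once.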
The question on my mind is whether their tool could identify people from data that Apple runs through its differential privacy process.
Further Reading
[Apple Releases Details on Differential Privacy…]