As it turns out, Google’s version of differential privacy may be more private than Apple’s implementation. Writing for Wired, Andy Greenberg covers differential privacy, and specifically a study [PDF] that examines how Apple uses it in macOS and iOS. The researchers found that it might not be as private as Apple would have us believe.
Differential Privacy
Differential privacy is a relatively new field in data science. It involves inserting random noise into a dataset, such as an iPhone user’s personal information. After the noise is added, the data is uploaded to Apple’s servers. This way, Apple can have its cake and eat it too. It can perform machine learning and collect analytics while maintaining your privacy. Thanks to the noise, the data can’t be matched to your user ID. Or can it?
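To make that concrete, here is a minimal sketch of the general idea, using the textbook Laplace mechanism on a simple count. It is an illustration of noise insertion, not Apple’s actual mechanism (which is exactly what the researchers had to reverse-engineer):

```python
import random

def noisy_count(true_count: int, epsilon: float) -> float:
    """Textbook Laplace mechanism for a counting query (sensitivity 1).

    Not Apple's implementation -- just the general idea: noise with
    scale 1/epsilon is added on-device, so any single report reveals
    little, while the noise averages out across many users.
    """
    # The difference of two exponentials with rate epsilon is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

print(noisy_count(1, epsilon=1.0))  # e.g. 1.73 -- the raw value never leaves the device
```

The smaller the epsilon, the larger the noise, and the less any one upload says about you.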
Differential Privacy Loss
A group of researchers say they have reverse-engineered Apple’s differential privacy. They examined the code to find out how the software inserts the random noise. Its effectiveness is measured by a variable called the “privacy loss parameter,” or “epsilon.” The epsilon determines how private your data is kept.
In short, the higher the epsilon value, the less private the data is. The researchers found that macOS uploads more data than what is generally considered “private,” and that iOS 10 uploads data that is even more specific and less private.
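To put a number on “less private”: the formal guarantee says a single noisy report can shift an observer’s odds about any individual by a factor of at most e^epsilon. Here is a quick back-of-the-envelope comparison (my own illustration, not from the study):

```python
import math

# One noisy report can shift an observer's odds about you by at most e^epsilon.
for eps in (1, 2, 6, 14, 43):
    print(f"epsilon = {eps:>2}: odds can shift by up to {math.exp(eps):,.0f}x")
```

An epsilon of 1 or 2 keeps that factor in the single digits; at 14 it is over a million, and at 43 it is astronomically large. That context matters for the values reported below.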
Epsilons
Apple points out that its data collection is opt-in, although it nudges users to opt in during device setup. The company also says that it adds different levels of noise depending on the data. For example, emoji usage doesn’t need to be as secret as browsing history or health data.
The study found that macOS differential privacy has an epsilon value of 6, while iOS 10 has an epsilon value of 14. A beta version of iOS 11 (version unknown) even had an epsilon value of 43, although that might change once the final version is released. Frank McSherry, co-inventor of differential privacy, explains what a value of 14 means:
Say someone has told their phone’s health app they have a one-in-a-million medical condition, and their phone uploads that data to the phone’s creator on a daily basis, using differential privacy with an epsilon of 14. After one upload obfuscated with an injection of random data, the company’s data analysts would be able to figure out with 50 percent certainty whether the person had the condition. After two days of uploads, the analysts would know about that medical condition with virtually 100 percent certainty.
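McSherry’s numbers line up with a quick Bayesian back-of-the-envelope calculation. This sketch is my own, assuming the worst case in which each daily report multiplies the analyst’s odds by the full e^epsilon:

```python
import math

def posterior(prior: float, epsilon: float, reports: int) -> float:
    """Worst-case belief after `reports` uploads under epsilon-differential privacy:
    each report can multiply the analyst's odds by up to e^epsilon."""
    odds = (prior / (1 - prior)) * math.exp(epsilon * reports)
    return odds / (1 + odds)

prior = 1e-6  # a one-in-a-million condition
print(posterior(prior, 14, 1))  # ~0.55 -- roughly a coin flip after one day
print(posterior(prior, 14, 2))  # ~0.9999993 -- near certainty after two days
```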
Google’s Version
Google uses its own version of differential privacy called Randomized Aggregatable Privacy-Preserving Ordinal Response (RAPPOR). Google’s analysis [PDF] claims to maintain an epsilon value of 2 for any particular piece of uploaded data, with an upper limit of 8 to 9 over the lifetime of the user.
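At RAPPOR’s core is classic randomized response, applied bit by bit to a Bloom filter of the value being reported. Here is a heavily simplified one-bit sketch of the idea (my own illustration, not Google’s code; real RAPPOR adds hashing and a second, per-report randomization layer):

```python
import random

def permanent_randomized_response(true_bit: int, f: float = 0.5) -> int:
    """Simplified one-bit take on RAPPOR's 'permanent randomized response':
    with probability f the report is a coin flip, otherwise the true bit."""
    if random.random() < f:
        return random.randint(0, 1)
    return true_bit

# No single report can be trusted, but the bias introduced by f is known,
# so the aggregator can de-bias the population-level estimate.
reports = [permanent_randomized_response(1) for _ in range(100_000)]
mean = sum(reports) / len(reports)   # expected ~0.75 when every true bit is 1
print((mean - 0.25) / 0.5)           # de-biased estimate of the true rate, ~1.0
```

The more aggressive the randomization (a larger f), the lower the epsilon per report, and the more users you need before the aggregate estimates become useful.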
In theory, this is better than Apple’s differential privacy. Additionally, Google made RAPPOR open source. In contrast, Apple keeps its code and epsilon values secret. If Google changed the code or its epsilon values, researchers would know about it. Meanwhile, the study’s team had to spend six months reverse-engineering Apple’s code.
Sometimes You Don’t Deserve a Trophy
Now here’s where I disagree. Frank McSherry argues that Apple shouldn’t be judged too harshly, saying “It’s a bit like agreeing to the Paris Climate Accords and then realizing you’re a megapolluter and way over your limits. It’s still an interesting and probably good first step.”
We should give credit where credit is due. If a company like Apple does something good, we should pat it on the back. But that shouldn’t mean kid-glove treatment. As I wrote back in June, Apple is pushing privacy as a feature. Tim Cook is quick to criticize Google and Facebook for their data mining efforts:
“They’re gobbling up everything they can learn about you and trying to monetise it. We think that’s wrong. And it’s not the kind of company that Apple wants to be.”
I think that if Apple really wants to be a privacy-forward company, it should be more transparent about its technology. In this case, differential privacy. Apple famously let its machine learning researchers start publishing. Now it’s time for the security team to do the same.
Apple, if you’re going to sell privacy as a feature, you can’t be ambivalent about it. You can’t be open about certain things while keeping the rest secret. Open-sourcing security code like differential privacy and encryption would go a long way towards living up to your stance. As companies get hacked left and right, it’s time to reassure users, as well as security experts, that you really do have our backs.
Andrew:
An excellent review of differential privacy, and nicely articulated arguments on your part. My main question is, as with any technology, whether or not the theoretical advantage of Google’s RAPPOR and its upper epsilon of 8-9 is practically advantageous. A superior single point performance indicator does not necessarily translate into a usable or practical advantage. I know too little about this field to independently assess, but my experience with single indicator specs in my field raises that flag.
In the end, the real indicator will be performance. To your point that, if Apple are going to sell privacy as a feature, they cannot be ambivalent about it, I would rephrase: if Apple are going to sell privacy as a feature, and a competitor adopts an alternative technology, then Apple should explain their choice, specifically how and why it is not inferior to that of their competition. Privacy is an important part of the value proposition, and therefore of consumer investment, in Apple tech, so the company has an obligation to explain its choices to the consumer.