Originally published on June 5, 2019, with occasional updates to add new information.
Occasionally I like to check up on Apple’s security pages and privacy policies. I noticed something new in the privacy policy, which was last updated May 9, 2019. Under the “How we use your personal information” header, one of the paragraphs now reads (emphasis added):
We may also use your personal information for account and network security purposes, including in order to protect our services for the benefit of all our users, and pre-screening or scanning uploaded content for potentially illegal content, including child sexual exploitation material.
PhotoDNA
Apple may have been doing this for years, but this is the first time it has appeared in the privacy policy; I checked earlier versions using the Wayback Machine to confirm. I’m going out on a limb and assuming that Apple is using PhotoDNA, a technology developed by Microsoft in 2009:
PhotoDNA is primarily used in the prevention of child pornography, and works by computing a unique hash that represents the image. This hash is computed such that it is resistant to alterations in the image, including resizing and minor color alterations. It works by converting the image to black and white, resizing it, breaking it into a grid, and looking at intensity gradients or edges.
Microsoft eventually donated PhotoDNA to the National Center for Missing & Exploited Children (NCMEC). It’s used by companies like Facebook, Twitter, Google, and others. Basically, it works by creating a hash of a photo or video, comparing it to known child pornography hashes in NCMEC’s Child Victim Identification Program database, and seeing if there’s a match.
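To make the idea concrete, here is a minimal sketch in Python of how perceptual-hash matching works in general. It uses a simple difference hash (dHash), not the actual PhotoDNA algorithm, which Microsoft has never published, and the set of “known hashes” is a made-up placeholder, since NCMEC’s database obviously isn’t public. It assumes the Pillow imaging library is installed.

```python
# Toy illustration of PhotoDNA-style matching. This is NOT the real PhotoDNA
# algorithm (which Microsoft has never published); it is a simple difference
# hash (dHash): convert to grayscale, shrink to a small grid, and record
# intensity gradients between neighboring pixels, which makes the hash robust
# to resizing and minor color changes. Requires the Pillow library.

from PIL import Image


def dhash(path: str, hash_size: int = 8) -> int:
    """Return a 64-bit perceptual hash of the image at `path`."""
    # Grayscale, then resize to (hash_size + 1) x hash_size pixels.
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    pixels = list(img.getdata())  # flattened row-major grayscale values

    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical stand-in for a database of known hashes (the real one isn't public).
KNOWN_HASHES = {0x9F3C0A5177D210E4}


def matches_known(path: str, max_distance: int = 5) -> bool:
    """True if the image's hash is within `max_distance` bits of a known hash."""
    h = dhash(path)
    return any(hamming(h, known) <= max_distance for known in KNOWN_HASHES)
```

The small Hamming-distance allowance is what lets a slightly resized or recompressed copy still count as a match, while an unrelated image almost never does.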
Companies scanning user content is a bit concerning even when it’s put to good use, as in cases like this, and especially so for Apple, given its privacy stance. According to Microsoft’s PhotoDNA FAQ, images are instantly converted into a secure hash that can’t be reverse engineered, and PhotoDNA can only be used to scan for child pornography, not for other content.
One possibility is that Apple’s scanning is done by the photo analysis daemon (photoanalysisd). That’s the process that detects people, objects, and so on in your photos, and it runs locally on the device, which could be what the phrase “pre-screening” implies. But the phrase “uploaded content” trips me up: is Apple scanning stuff in iCloud Drive, or just iCloud Photos? “Scanning” is also an issue. I use it because that’s the word Apple uses, but I don’t think comparing hashes is the same as “scanning.”
iCloud Security
On its iCloud security overview page, Apple says that customer data is encrypted in transit and at rest on iCloud servers. This includes both Photos and content stored in iCloud Drive. So I’m wondering at which point Apple scans content. Presumably before it gets encrypted.
We also know, from the company’s Legal Process Guidelines [PDF], that Apple stores the encryption keys in the cloud. This is how Apple can help you reset your password and provide law enforcement with data under a subpoena. Further, Apple’s legal terms for iCloud say the following (emphasis added):
C. Removal of Content
You acknowledge that Apple is not responsible or liable in any way for any Content provided by others and has no duty to pre-screen such Content. However, Apple reserves the right at all times to determine whether Content is appropriate and in compliance with this Agreement, and may pre-screen, move, refuse, modify and/or remove Content at any time, without prior notice and in its sole discretion, if such Content is found to be in violation of this Agreement or is otherwise objectionable.
How can Apple determine whether content is appropriate if it isn’t scanning iCloud content, given that the content is encrypted? Or, as I mentioned, maybe it’s doing all of this scanning during upload, before encryption. Don’t get me wrong: I have no problem with Apple scanning for child abuse content, but I’d like to know the extent of Apple’s knowledge of general customer content.
The word “appropriate” needs to be defined, because, as we saw in 2012, Apple once deleted a screenwriter’s script attached to an email because it had a character viewing a porn ad on his computer:
AND THEN I SAW IT — a line in the script, describing a character viewing an advertisement for a pornographic site on his computer screen. Upon modifying this line, the entire document was delivered with no problem.
I reached out to Apple with questions, and I’ll update this article if I get a response.
Note: A crucial detail that a reader points out below is that end-to-end encryption for iCloud content is only turned on if you enable two-factor authentication, and even then it doesn’t cover iCloud Drive or iCloud Photos. End-to-end encryption is different from “encrypted in transit and at rest on the server.”
Update: 2019-10-25
Hany Farid, one of the people who helped develop PhotoDNA, wrote an article for Wired saying:
Recent advances in encryption and hashing mean that technologies like PhotoDNA can operate within a service with end-to-end encryption. Certain types of encryption algorithms, known as partially or fully homomorphic, can perform image hashing on encrypted data. This means that images in encrypted messages can be checked against known harmful material without Facebook or anyone else being able to decrypt the image. This analysis provides no information about an image’s contents, preserving privacy, unless it is a known image of child sexual abuse.
I think homomorphic encryption is still relatively new, and I don’t think Apple currently uses it. But I could see Apple adopting it, since it lets you perform computations on encrypted data without decrypting it, somewhat like how differential privacy lets Apple learn from user data without seeing any individual’s raw records. I look forward to the iOS 13 security guide to see if I can glean some insight. Or, like I mentioned earlier, maybe Apple just scans for this content before it gets uploaded and encrypted.
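For the curious, here is a toy illustration of the “computing on encrypted data” idea Farid describes, using the Paillier cryptosystem, whose ciphertexts can be multiplied together to produce an encryption of the sum of the plaintexts without ever decrypting anything. To be clear, this is only a sketch of the homomorphic concept; it has no connection to Apple’s actual systems, and the key sizes here are far too small to be secure.

```python
# Toy Paillier cryptosystem (Python 3.8+): an additively homomorphic scheme
# where multiplying two ciphertexts yields an encryption of the sum of the
# plaintexts. Purely a sketch of the "compute on encrypted data" idea; it is
# not secure at these key sizes and is unrelated to Apple's actual systems.

import math
import random

# Two well-known primes (2^31 - 1 and the largest prime below 10^9).
# Real deployments use primes that are hundreds of digits long.
p, q = 2_147_483_647, 999_999_937
n = p * q
n_sq = n * n
g = n + 1                                          # standard simple generator
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1), the private key


def L(u: int) -> int:
    return (u - 1) // n


mu = pow(L(pow(g, lam, n_sq)), -1, n)              # precomputed decryption constant


def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                     # r must be coprime with n
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(c: int) -> int:
    return (L(pow(c, lam, n_sq)) * mu) % n


c1, c2 = encrypt(20), encrypt(22)
c_sum = (c1 * c2) % n_sq                           # addition done on ciphertexts
assert decrypt(c_sum) == 42                        # ...decrypts to the plaintext sum
</parameter>```

The point of the exercise: whoever holds only the ciphertexts can still combine them in useful ways, which is the property Farid argues could let hash matching run inside an end-to-end encrypted service.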
Update: 2020-01-08
It looks like I may finally have an answer. Speaking at CES 2020, Apple’s chief privacy officer Jane Horvath mentioned scanning photos backed up to iCloud:
As part of this commitment, Apple uses image matching technology to help find and report child exploitation. Much like spam filters in email, our systems use electronic signatures to find suspected child exploitation.
Accounts with child exploitation content violate our terms and conditions of service, and any accounts we find with this material will be disabled.
It’s nice to finally hear something official from Apple.
Update: 2020-02-11
A search warrant revealed that Apple scans emails for this content.
Update: 2021-08-09
It’s been a busy weekend. Apple’s original update to its privacy policy was quiet, but now they’ve made a public announcement. This article was an attempt to figure out exactly what Apple was doing and when. Like, how can they scan for this material if iCloud data is encrypted?
Apple scans for hashes of known CSAM as photos are uploaded to iCloud, which is what the term “pre-screening” above refers to. But now the company has moved this scanning algorithm and hash database onto every device. It’s a clever middle ground between privacy and invasiveness.
It’s not PhotoDNA either; instead Apple created its own technology that it calls NeuralHash. There is a technical PDF here, as well as an easy-to-understand PDF here.
Bottom line: Even though the scanning is done locally, it still only applies to photos uploaded to iCloud Photos. Photos that aren’t synced to iCloud will not be affected, nor (I suspect) will photos stored in other places like the Files app.
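Here’s roughly how I picture the client-side flow, as a heavily simplified sketch that reuses the toy matches_known() helper from the earlier dHash example. The real pipeline uses NeuralHash, a blinded on-device hash database, private set intersection, and a reporting threshold; the function names below are hypothetical, not Apple’s.

```python
# Heavily simplified sketch of the client-side flow described above, reusing
# the toy matches_known() helper from the earlier dHash example. The real
# system uses NeuralHash, a blinded hash database, private set intersection,
# and a match threshold; none of that detail is modeled here, and all names
# below are hypothetical.

def upload_to_icloud_photos(path: str, icloud_photos_enabled: bool) -> None:
    if not icloud_photos_enabled:
        # Photos that never leave the device are never checked.
        print(f"{path}: kept on device only, no pre-screening")
        return

    # Pre-screening happens on the device, as part of the upload pipeline.
    flagged = matches_known(path)

    # The photo is uploaded either way; a match only attaches extra metadata
    # (Apple's documentation calls the real artifact a "safety voucher").
    print(f"{path}: uploaded to iCloud Photos (flagged={flagged})")
```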
Andrew, the PDF links do not resolve to the right place.
Btw, really great work on this topic Andrew. Thanks so much, it’s super appreciated!
Do other phone brands/operating systems also do this?
Here’s the list of NCMEC partners: https://www.missingkids.org/supportus/our-corporate-partners. Motorola and Google are the phone companies on it. Apple is not on this list, so it may be incomplete.
Great article Andrew.
But bottom line, Apple runs a process on the phone, without the user’s knowledge or permission, that scans their content. That they might use it only for data uploaded to iCloud is irrelevant. They’ve breached the back-door wall and clearly have some process that can and does scan your data on your device.
If it can be used to scan for one type of data, it can be used to scan for any type of data. It is a betrayal of the Steve Jobs assurance.
Andrew:
You wrote,
No, it’s being done in situ on the device itself, not in the cloud, and only on images you try to upload to iCloud. Apple states in its FAQ that if you are not using iCloud, this feature is not active.
It is not ‘scanning’ in the sense of an AI reviewing all your photo content prior to uploading to iCloud. Apple underscores this, stating, ‘This feature does not work on your private iPhone photo library on the device.’ It is screening for a hash match against the hashes corresponding to images in the CSAM database. (And yes, this does sound very much like PhotoDNA.) Otherwise, Apple already stated in its original announcement that the device is unaware of the actual images stored on it. In the common usage of the term ‘scan’, it is not scanning any images on your phone or other device, including those matching the CSAM database; it compares only the hashes, which Apple points out cannot be reverse engineered back into an actual image. In short, the actual images on your phone are irrelevant and useless to this screening method. And unless you’re uploading to iCloud, no one knows what you have on your device, not even the device.
I think you’ve updated most of these points further down, in any case.
Thanks for posting this; it is a very useful PSA and analysis, given the community interest and volume of discussion over the weekend.
“Basically, it works by creating a hash of a photo or video, comparing it to known child pornography hashes in NCMEC’s Child Victim Identification Program database, and seeing if there’s a match.”
Am I reading this correctly that if the image is original then the process wouldn’t find a match?
As far as I understand it, that is correct. It can only match images of CSAM that are already known. The hope is that if someone is caught with known CSAM, they may also have “unknown” CSAM, which can then be added to the database.
If one of these people made an original piece of content, it wouldn’t be detected unless/until they were caught by law enforcement and those images were added to the database.
Dear Mr. Orr,
in the following statement:
“Apple says that customer data is encrypted in transit and on iCloud servers. This includes both the Photos app and content stored in iCloud Drive”
you missed one very important detail. The data is encrypted at rest on iCloud Drive ONLY if 2FA is turned on.
https://support.apple.com/en-us/HT202303
Aha, thanks!