Spotlight is bloated, slow as sin, and has a UI designed by a committee that seems to spend its time either spitting on the old Mac User Interface (UI) Guidelines (and UI studies) or designing most of its new UI themes by hiring monkeys to throw UI feces at the screen to see what sticks. Oh yeah, and as an added bonus, it now tracks the comings and goings of files as if it were just waiting for the chance to rat you out like Rooster the pimp. Mac users may already be experiencing the inadvertent metadata disclosures that have some analysts concerned about Microsoft's vaporous Vista.
Spotlight, the File System Snitch
For example, when you download a file with Safari or receive one via iChat, then select the file in the Finder and choose "Get Info" from the File menu, you will see a "Where from" entry that shows the file's source.
The Get Info panel shows not only the mode of the transfer, e.g., iChat, but also the full name and contact information of the person who provided the file. When you download files via Safari, the source Web address is stored. If I were a content/media company executive, I know I would ask Apple "pretty please with sugar on top" for such file tracking.
Most times this is no big deal and is actually quite useful, but as more and more metadata gets saved about your files, you should be aware that this is happening. I'd wager that most users had no idea this kind of file source tagging was taking place at the file system level. And not knowing can lead to all kinds of security gaffes.
Security Reasons for Metadata Stripping
For instance, I often have to strip metadata before sending files to clients or adverse parties. In the past, I have seen Microsoft Word files with edits and other comments that caused difficulties for the sender. Word files keep all kinds of information hidden, e.g., edits and revisions, the names of people who worked on the files, and comments. This information can be a great liability if unintended parties gain access to it.
For example, sometimes comments and collaborative edits are in a file and are intended just for team members. Sending a contract with a hidden comment stating "$X million is the highest price we will pay, but let's try to lowball them first" is a pretty good way of spoiling your negotiating position (unless you're being canny and putting it in there on purpose). Anyway, many corporations use tools to strip much of the metadata from Word files (unfortunately, I don't know of any Mac products that do this).
However, with Vista coming down the pike, and the Mac OS keeping a bunch of different metadata types about, it's getting more difficult to know exactly what information you will be sending to others or what your computer is tracking about your activities. For example, some of you old Mac veterans may think that the "Where from" information is stored in the resource fork of the file and that nuking it with tools like RemoveMacOSJunk.app will remove it just like color label information. Not so. Even with a hex editor, you cannot see the information in the file.
New Spotlight Metadata
Then where is the information stored? "Where from" entries (and much more metadata) are stored in the Spotlight index database. You can access this metadata using the mdls and mdutil command line tools: mdls lists the metadata attributes recorded for a file, while mdutil manages the index itself.
Screenshots of the mdls and mdutil commands in the terminal.
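If you would rather script the lookup than squint at Terminal output, here is a minimal Python sketch. It assumes mdls prints a multi-valued attribute as a parenthesized list of quoted strings; the sample output, the example.com addresses, and the helper names (mdls_command, parse_mdls_list, where_from) are made up for illustration and are not part of any Apple tool.

```python
import re
import subprocess


def mdls_command(path, attribute="kMDItemWhereFroms"):
    """Build the argument list for an mdls query on a single attribute."""
    return ["mdls", "-name", attribute, path]


def parse_mdls_list(output):
    """Pull the quoted strings out of mdls's parenthesized list output."""
    return re.findall(r'"((?:[^"\\]|\\.)*)"', output)


def where_from(path):
    """Run mdls (Mac OS X only) and return the recorded sources for a file."""
    result = subprocess.run(
        mdls_command(path), capture_output=True, text=True, check=True
    )
    return parse_mdls_list(result.stdout)


# Illustrative sample only; real output depends on the file and OS version.
SAMPLE_OUTPUT = '''kMDItemWhereFroms = (
    "http://example.com/downloads/report.pdf",
    "http://example.com/downloads/"
)'''

if __name__ == "__main__":
    for source in parse_mdls_list(SAMPLE_OUTPUT):
        print(source)
```

The parsing is kept separate from the subprocess call so it can be tried on saved output from any machine. A file with no recorded source simply yields an empty list.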
The good news is that Spotlight information seems to be stuck to your hard drive volume. Generally speaking, this information is useful and welcome, and the fact that it doesn't necessarily travel with the file is a good thing, security-wise. However, it seems that if you use other volumes, like thumb drives and other external drives, Spotlight may be populating those volumes with metadata as well. So when you hand that thumb drive over, you may be handing over more than just files, as some metadata may follow.
Two Layers of Metadata: Application and File Level
At any given time there may be at least two layers of metadata present on your system. There is application layer metadata and file level metadata.
The Spotlight database, as well as the iTunes and iPhoto databases, shows that some types of metadata stay with the program and do not travel with the file. For example, the number of times you played a song in iTunes does not travel with the file. So if you blow away your iTunes database or have to rebuild it from scratch, you will lose that information. The "Where from" metadata seems to be of this type.
Then there is the file level metadata. That is metadata that is embedded within the file and travels with the file. You see such things in Microsoft Word in the form of comments and in the Properties dialog box.
Also, you see this behavior with ID3 tag information that travels as part of your MP3 files.
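To make "travels with the file" concrete, here is a rough Python sketch that strips the two common ID3 containers from the raw bytes of an MP3. It relies on the published ID3 layouts: an ID3v2 tag sits at the start of the file behind a 10-byte header whose last four bytes are a "syncsafe" size, and an ID3v1 tag is a fixed 128-byte block at the very end that begins with "TAG". The function names are mine, and the sketch ignores other tag formats (APE, Lyrics3), so treat it as an illustration rather than a scrubbing tool.

```python
def syncsafe_to_int(four_bytes):
    """Decode a 4-byte ID3v2 'syncsafe' size (7 useful bits per byte)."""
    value = 0
    for b in four_bytes:
        value = (value << 7) | (b & 0x7F)
    return value


def strip_id3(data):
    """Return MP3 bytes with a leading ID3v2 and trailing ID3v1 tag removed."""
    # ID3v2 header: "ID3" + version (2 bytes) + flags (1 byte) + size (4 bytes).
    if len(data) >= 10 and data[:3] == b"ID3":
        size = syncsafe_to_int(data[6:10])
        # Flag bit 0x10 signals an extra 10-byte footer after the tag body.
        footer = 10 if data[5] & 0x10 else 0
        data = data[10 + size + footer:]
    # ID3v1: the last 128 bytes, starting with the marker "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]
    return data
```

You would run it on a copy, something like reading the file in binary mode, passing the bytes through strip_id3, and writing the result to a new file, so the original and its tags stay intact.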
The xattr factor
But there is a gray layer between the two. For example, if you have an MP3 file with no ID3 tag information, in some instances iTunes can generate an ID3 tag and use the filename to fill it in. The information can also flow in the other direction, where metadata gets pressed out into the file system. For example, iTunes will use the track number and song title from the ID3 tag to name its files. Clearly, what iTunes is doing is very useful, but it doesn't take much imagination to see how metadata can creep from one layer into another in unwanted circumstances.
One such place in Mac OS X may come in the form of the xattr, i.e., extended attributes. The xattr is useful as it helps Spotlight by storing even more types of metadata. For example, it seems that Apple has added attributes to aid in the control of Digital Rights Management and Intellectual Property, e.g., kMDItemRights. SpotMeta lets you play with this a little. With the xattr, metadata can follow files, even to foreign file systems. It's not tough to imagine that a lot of information that Spotlight is storing in its application layer database may be moved to the file level via xattr in the future.
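For the curious, here is a hedged Python sketch of how you might find and remove metadata-style extended attributes with the xattr command line tool, where your version of Mac OS X ships it. It assumes that running xattr on a single file prints one attribute name per line, that xattr -d deletes a named attribute, and that Spotlight-related attributes share the com.apple.metadata: prefix. The sample listing and the helper names are invented for illustration.

```python
import subprocess


def spotlight_attrs(xattr_listing, prefix="com.apple.metadata:"):
    """Pick out attribute names that look like Spotlight-style metadata."""
    names = [line.strip() for line in xattr_listing.splitlines()]
    return [name for name in names if name.startswith(prefix)]


def removal_commands(path, attr_names):
    """Build one 'xattr -d' command per attribute, without running anything."""
    return [["xattr", "-d", name, path] for name in attr_names]


def scrub(path):
    """List a file's attributes with xattr and delete the matching ones."""
    listing = subprocess.run(
        ["xattr", path], capture_output=True, text=True, check=True
    ).stdout
    for command in removal_commands(path, spotlight_attrs(listing)):
        subprocess.run(command, check=True)


# Illustrative listing only; real attribute names vary by file and OS version.
SAMPLE_LISTING = "com.apple.FinderInfo\ncom.apple.metadata:kMDItemRights\n"

if __name__ == "__main__":
    for command in removal_commands("contract.doc", spotlight_attrs(SAMPLE_LISTING)):
        print(" ".join(command))
```

Building the commands separately from running them lets you eyeball what would be deleted before anything is touched.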
Get Rid of That Not-So-Fresh Feeling
So what do you do if you want to clean your files of metadata before you send those files off? You have a few options. If the information is in the resource fork, you can use RemoveMacOSJunk.app. If the information is part of the file, then you will have to either edit it within the program, get a specific program to strip that type of metadata, or export the file to another format (e.g., saving a Word file as a PDF will strip most metadata). If the information is part of the Spotlight database or xattr data, then you can use mdls to inspect it and mdutil to manage the index. MainMenu.app allows you to delete your entire Spotlight index file on your main volume, but it may not help with all xattr metadata. Basically, it seems that, currently, there is no single, easy solution for scrubbing your files of all the metadata crud that may be attached.
What To Do?
Don't Panic! For right now, simply saving your files as PDFs before sending them will suffice for most people. That being said, the management of metadata, who owns it and who controls it, is an area rife with serious security concerns. Most users don't want to know about or deal with metadata management. They want things to just work.
As such, Apple should consider providing users with a framework to control metadata. Software developers can help out through Spotlight plug-ins. If those plug-ins declare all the metadata tags that are stored inside the developers' files, Apple can use those tags to strip all the metadata from the files. Apple can provide users with a "Clean Metadata" command next to the "Secure Empty Trash" menu item in the Finder, which will clean metadata from any selected files and/or folders. Simple. Having Mail.app automatically scrub metadata from any attachments (with an optional override button) would also be a welcome move.
If Apple doesn't want to look more and more like the dorky big brother guy from its 1984 commercial, it had better give some power back to the users to help people secure their information.