On August 6th, AOL Research released the raw search logs of over 650,000 AOL users from a three-month period. Although the data was ostensibly released for research purposes, the privacy breach it caused has become the main story. The data was "anonymized" by replacing usernames with numeric IDs, but it is still possible to determine a user's identity by reviewing the terms they searched for, as the New York Times demonstrated.
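To see why swapping usernames for numbers isn't enough, here is a minimal Python sketch. It assumes a simplified log of (username, query) pairs. The log entries and the function names `pseudonymize` and `profiles` are hypothetical and do not reflect AOL's actual data format or procedure. Every query from one user receives the same ID, so grouping by ID rebuilds that user's entire search history.

```python
from collections import defaultdict
from itertools import count

# Hypothetical raw log entries: (username, query). Not real AOL data.
RAW_LOG = [
    ("user_a", "plumbers in springfield"),
    ("user_b", "cheap flights"),
    ("user_a", "flu symptoms"),
    ("user_a", "jane doe springfield"),
    ("user_b", "weather"),
]

def pseudonymize(log):
    """Naive 'anonymization': replace each username with a sequential numeric ID."""
    ids = {}
    next_id = count(1)
    out = []
    for user, query in log:
        if user not in ids:
            ids[user] = next(next_id)  # same user always maps to the same ID
        out.append((ids[user], query))
    return out

def profiles(log):
    """Group queries by ID: each user's searches stay linked together."""
    grouped = defaultdict(list)
    for uid, query in log:
        grouped[uid].append(query)
    return dict(grouped)

if __name__ == "__main__":
    for uid, queries in profiles(pseudonymize(RAW_LOG)).items():
        print(uid, queries)
```

No username survives in the output, yet ID 1 still carries a hometown, a health concern, and a personal name together. A single revealing query is enough to attach a real identity to every other query filed under the same number.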
AOL quickly pulled the data, but not before many users had downloaded it and made it available both in its original format and through an easy-to-use web interface. It's more than a little strange to browse other people's search histories: some are humorous, some are sad, and, not surprisingly, some are quite disturbing.
Needless to say, the fallout has been significant. AOL apologized, two of the researchers involved were fired, and AOL's CTO stepped down last week.
In the aftermath of this release, more people are realizing how much data they reveal about themselves through seemingly anonymous activities such as searching and browsing the Internet, and the risks that come with it. Google logs your searches and your browsing history, as well as your email, your instant messages and your photos. Google has even hinted that its long-term goal is to store all of your data. Combine this with government requests for access to search records, and it's not hard to understand why privacy advocates worry about so much information being collected in one place.
In his book "The Search," John Battelle coins the term "Database of Intentions" to describe this accumulation of data:
"The Database of Intentions is simply this: the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result... Taken together this information represents a real-time history of post-Web culture—a massive clickstream database of desires, needs, wants, and preferences that can be discovered, subpoenaed, archived, tracked and exploited for all sorts of ends."
So what can we, as users, do to reduce our footprint in this massive database?
A number of tools can help, including the GoogleAnon bookmarklet, which anonymizes your Google cookie, and TrackMeNot, a Firefox extension that periodically issues random queries in the background to bury your real searches among fake ones.
Ultimately, it's important to remember that the information we enter into search boxes is not anonymous: it is not only being logged but also transmitted unencrypted across the Internet.