Data on Internet search queries is a potential gold mine for researchers, offering a glimpse into the minds of the online population. But despite efforts to keep that data anonymous, releasing it is a minefield for personal privacy, as AOL's legendary 2006 "screw up" showed.
Now some Microsoft researchers say they've come up with a way to release and study search data without risking privacy. The company is quick to add that it has no plans to release search data this way. But if anyone else is brave enough to give it a try (Yahoo? Google?), the approach is detailed in a Microsoft paper accepted for the International World Wide Web Conference in Madrid (PDF, 10 pages).
The trick is an algorithm that produces what the researchers call a "private query click graph": a graph linking queries to URLs, in which each URL is weighted by the number of users who clicked on it after making a particular query.
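To make the idea of a click graph concrete, here is a minimal Python sketch that builds one from a toy log. The record format `(user_id, query, clicked_url)`, the function name `build_click_graph`, and the sample data are all hypothetical. The sketch shows only the raw, unprotected graph, before any privacy step is applied, and is not the researchers' code.

```python
from collections import defaultdict

# Hypothetical log format: each record is (user_id, query, clicked_url).
LogRecord = tuple[str, str, str]


def build_click_graph(log: list[LogRecord]) -> dict[str, dict[str, int]]:
    """Map each query to {url: number of distinct users who clicked url after that query}."""
    users_per_edge: dict[tuple[str, str], set[str]] = defaultdict(set)
    for user, query, url in log:
        # A set means a user who clicks the same result twice is counted once.
        users_per_edge[(query, url)].add(user)

    graph: dict[str, dict[str, int]] = defaultdict(dict)
    for (query, url), users in users_per_edge.items():
        graph[query][url] = len(users)
    return dict(graph)


# Toy data for illustration only.
sample_log = [
    ("u1", "madrid hotels", "hotels.example.com"),
    ("u2", "madrid hotels", "hotels.example.com"),
    ("u1", "madrid hotels", "hotels.example.com"),
    ("u3", "madrid hotels", "travel.example.org"),
    ("u2", "www conference", "conf.example.net"),
]
graph = build_click_graph(sample_log)
```

Here `graph["madrid hotels"]` maps `hotels.example.com` to 2 (users u1 and u2) and `travel.example.org` to 1. Weighting edges by distinct users rather than raw clicks keeps one heavy clicker from dominating an edge.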
"While this graph is not as powerful as the actual search log, many computations can still be performed on the click graph with results similar to the actual search log, e.g., finding similar queries, keyword generation, and performing spell corrections," the researchers write.
The research paper, nominated as one of the best at the conference, is one of 16 Microsoft Research papers accepted there. That is about 15 percent of the total, and more than from any other participating organization.
The work was done by Aleksandra Korolova, a Stanford University student working as a Microsoft Research intern, together with researchers Krishnaram Kenthapadi, Nina Mishra, and Alexandros Ntoulas of Microsoft Search Labs in Mountain View, Calif.
They based the algorithm on what's known as the definition of differential privacy. "In a nutshell," they write, "the definition states that upon seeing a published data set an attacker should gain little knowledge about any specific individual."
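One common way to meet a definition like this is to add calibrated random noise to aggregate counts before publishing them. The Python sketch below illustrates that general idea for query counts. It caps how many queries each user contributes, adds Laplace noise, and publishes only queries whose noisy count clears a threshold, so that a query typed by a single person is very unlikely to appear. The function names, parameters, and sample data are hypothetical; this is a simplified illustration, not the researchers' algorithm, and it is not a vetted privacy guarantee.

```python
import random
from collections import Counter
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def release_query_counts(
    user_queries: dict[str, list[str]],
    epsilon: float,
    threshold: float,
    max_per_user: int = 1,
    seed: Optional[int] = None,
) -> dict[str, float]:
    """Return noisy counts for queries whose noisy count exceeds `threshold`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for queries in user_queries.values():
        # Keep at most `max_per_user` distinct queries per user, so one person
        # can change the count vector by at most `max_per_user` (L1 norm).
        for query in list(dict.fromkeys(queries))[:max_per_user]:
            counts[query] += 1

    # Standard Laplace-mechanism calibration: noise scale = sensitivity / epsilon.
    scale = max_per_user / epsilon
    released = {}
    for query, count in counts.items():
        noisy = count + laplace_noise(scale, rng)
        # Rare queries almost never clear a high threshold, so their text
        # is not published at all.
        if noisy > threshold:
            released[query] = noisy
    return released


# Toy data: 100 users search "weather"; one user searches something rare.
toy_log = {f"user{i}": ["weather"] for i in range(100)}
toy_log["user_x"] = ["rare private query"]
released = release_query_counts(toy_log, epsilon=1.0, threshold=20, seed=7)
```

A smaller `epsilon` means more noise and stronger protection, at the cost of less accurate counts. The threshold trades coverage of the long tail of queries for a lower chance of exposing an individual's unique search.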
But for now, at least, the company isn't preparing to implement the approach itself. A Microsoft spokesman says in an email that the company "currently does not have any plans to use the capabilities found through this research in its products and services."