Modern technology has presented us with something of a Faustian bargain when it comes to our privacy, but Apple thinks we should have another option.
The bargain we’ve long been offered is the use of all kinds of personalised services, often for free, in exchange for handing over some of the most intimate details of our lives, and running the risk that those details will end up in the wrong hands.
Apple thinks it’s found a way around this dilemma. The company is promising that it can offer the kinds of useful information that come from collecting and analysing large amounts of data, without compromising individual users’ privacy, by using a set of techniques that goes under the banner of “differential privacy.” If it works the way Apple and its developers say it will, the system could help rebalance the equation between data collection and personal privacy.
“We believe you should have great features and great privacy,” Craig Federighi, Apple’s senior vice president for software engineering, told the audience at the company’s Worldwide Developer Conference in San Francisco earlier this month. “You deserve it and we’re dedicated to providing it.”
Privacy is not a new point of emphasis for Apple, as the recent brouhaha over the iPhone used by one of the San Bernardino, California, attackers made clear. But in the next version of iOS, due out this fall, the company plans to add differential privacy as an additional privacy-protection tool.
Long studied in academic circles but only deployed in a handful of commercial applications, the system allows people or companies to glean meaningful information from a set of data while at the same time preventing any particular data point from being connected to individual users.
The way the system does that is by introducing random bits of noise into the data in known amounts. The classic example is a survey in which users are asked a potentially embarrassing question, such as whether they’ve ever used drugs. Respondents would be instructed to flip a coin without telling the survey givers how it lands. If the coin comes up heads, they answer truthfully. If it comes up tails, they flip again; on the second flip, heads means they answer yes and tails means they answer no, regardless of what’s actually true.
Such a system protects individual privacy, because no-one can tell whether any individual is answering truthfully or not. But because the chance of a coin landing heads or tails is a known quantity, researchers can filter out the random noise and work out the overall proportion of users who are answering truthfully. With a large enough data set, they can get a pretty good sense of what proportion of the population has used drugs without being able to tell whether any individual has.
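To make the arithmetic concrete, here is a minimal sketch in Swift of that coin-flip survey. It is purely illustrative, not anything Apple has published: each simulated respondent follows the flipping rules above, and because half of the answers are truthful while the noisy half say “yes” half the time, the true rate can be recovered from the observed one.

```swift
// Randomized response: the coin-flip survey described above. Each
// respondent's true answer stays hidden, but because the noise rate is
// known, the aggregate can still be estimated.

/// One respondent's randomized answer to a yes/no question.
func randomizedResponse(truth: Bool) -> Bool {
    if Bool.random() {           // first flip comes up heads: answer truthfully
        return truth
    } else {                     // tails: flip again and ignore the truth
        return Bool.random()     // heads means "yes", tails means "no"
    }
}

/// Recover the estimated fraction of true "yes" answers from the noisy ones.
/// Half the responses are truthful and the other half say "yes" half the time,
/// so observedYesRate = 0.5 * trueYesRate + 0.25.
func estimateTrueYesRate(from responses: [Bool]) -> Double {
    let observed = Double(responses.filter { $0 }.count) / Double(responses.count)
    return (observed - 0.25) / 0.5
}

// Simulate 100,000 respondents, 30 per cent of whom would truthfully answer yes.
let truths = (0..<100_000).map { _ in Double.random(in: 0..<1) < 0.3 }
let answers = truths.map { randomizedResponse(truth: $0) }
print(estimateTrueYesRate(from: answers))   // prints a value close to 0.3
```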
Of course, iPhone users won’t literally be flipping coins or answering survey questions. But Apple plans to employ related techniques. For example, the company plans to inject noise into user data on their devices, before any data is sent up to Apple’s servers for analysis.
One way it will do that is by “subsampling” user data; instead of collecting every word a user types, for instance, it will collect only a subset of them. It also plans to randomly change data, so characters in the words you type may be altered so that the words become unrecognisable.
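Apple has not published its actual algorithms, so the following Swift sketch is hypothetical; it only illustrates the two generic ideas described above: keeping a small random sample of typed words, and randomly perturbing characters before anything leaves the device.

```swift
// Hypothetical illustration only: Apple has not published its actual algorithms.
// This sketch shows the two generic ideas described above: keeping only a random
// sample of typed words, and randomly perturbing characters before anything
// leaves the device.

/// Keep each typed word with a small probability (subsampling).
func subsample(_ words: [String], keepProbability: Double = 0.1) -> [String] {
    words.filter { _ in Double.random(in: 0..<1) < keepProbability }
}

/// Replace each character with a random letter with probability `flipProbability`,
/// so a reported word may no longer match what was actually typed.
func perturb(_ word: String, flipProbability: Double = 0.25) -> String {
    let alphabet = Array("abcdefghijklmnopqrstuvwxyz")
    return String(word.map { ch in
        Double.random(in: 0..<1) < flipProbability ? alphabet.randomElement()! : ch
    })
}

let typed = ["brunchfast", "hello", "thanks", "onfleek"]
let report = subsample(typed).map { perturb($0) }
print(report)   // only a noisy subset of what was typed would ever be reported
```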
Differential privacy theorists have recognised that the more data collected on a particular person, the more individualised that data becomes and the more likely it is that the person could be identified. To try to protect against that danger, the system requires a so-called privacy budget, a limit to the amount of data that can be collected on any one person. Apple is including just such a budget, capping the amount of data it collects on customers in a set time period.
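Purely as an illustration, and not based on any limits Apple has disclosed, such a budget can be thought of as a simple per-user counter that refuses further reports once the cap for the current period has been used up:

```swift
// Hypothetical sketch of a per-user privacy budget: once the cap for the
// current period is spent, no further reports leave the device. Apple has
// not disclosed the actual limits or time windows it will use.
struct PrivacyBudget {
    let reportsPerPeriod: Int
    var used = 0

    /// Returns true if a report may still be sent in this period, and charges the budget.
    mutating func spend() -> Bool {
        guard used < reportsPerPeriod else { return false }
        used += 1
        return true
    }

    /// Called when the period (say, a day) rolls over.
    mutating func reset() { used = 0 }
}

var budget = PrivacyBudget(reportsPerPeriod: 2)
print(budget.spend())   // true
print(budget.spend())   // true
print(budget.spend())   // false: the cap has been reached until reset()
```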
Aaron Roth, who co-wrote a book on differential privacy, got a sneak peek at how Apple was implementing its system and came away impressed.
“They employ engineers who understand the mathematics, and they’ve got a good algorithm,” Roth, an associate professor of computer science at the University of Pennsylvania, said.
At least at first, Apple will only use its differential privacy algorithms in a limited way. For example, it will use them to try to identify new words that customers are typing on their devices so that it can add them to the dictionaries of all users. Similarly, it plans to use them to determine the most popular emoji that customers are using in place of words to better figure out which ones to suggest. It also has indicated that it hopes to use differential privacy for other features in the future.
As interesting and promising as differential privacy is, it’s not a panacea. It works well with large data sets from large numbers of users and can be used to determine trends among groups of people. But it fails with small sets of data: either there is too much noise to glean anything useful, or there is too little to prevent individuals from being identified.
Security officials might be able to use differential privacy to find patterns of behaviour common to terrorists without identifying individuals, but they couldn’t use such techniques to zero in on individual suspects. Likewise, Apple likely wouldn’t be able to use it to make particularly personalised suggestions for users.
And it remains to be seen how the system will work and how well it will protect users’ privacy. So far, Apple hasn’t said much about the algorithms it will use or the parameters it will set.
“I’m happy … that Apple has announced this and seems to be working on it in a way that has a lot of potential,” said Nate Cardozo, a senior staff attorney with the Electronic Frontier Foundation, a digital rights advocacy group. “Apple has said the right words at this point, and now we’re waiting for the details.”
But it’s an exciting development. Here’s hoping for all of our sakes that Apple gets it right. — The Mercury News/TNS


* Troy Wolverton is a technology columnist for The Mercury News. Reach him at [email protected] or follow him on Twitter @troywolv.