
Jason Hong picks up his new Android smartphone. He can’t resist. Never mind that he’s a respected professor of computer science. He’s a kid when it comes to gadgets and games. He’s taking a quick break. Perfect time to pull up that new batch of apps he downloaded. Hong clicks open the blackjack app. As he flips through a few hands, his mind wanders to some recent privacy research he read. He hops back to check the app’s list of permissions: what he allowed the software to access when he installed it. “That’s weird,” he thinks. For some reason, this simple card game is collecting his location information. Even for a tech-savvy academic, the unknowing divulgence of personal information is a surprising and unsettling realization.
That was 2008. Google had entered the app market not long before with its Android app store and sent a few complimentary phones to the researchers at the Carnegie Mellon Human-Computer Interaction Institute (HCII). One landed in Hong’s office, which was (and still is) littered with Nerf guns, Legos, toy cars, and his prized games. Fitting for the slight 38-year-old with round glasses who even today looks like one of his students, not a PhD. (His brother still dares him to sit with the students on the first day of class and remark, “Hey, I hear this guy’s really tough!”) But all that doesn’t obscure how seriously he and his team take their work and concern for privacy and security.
Hong was raised in South Carolina by Taiwanese parents (“just a good ol’ Southern boy,” he laughs) and showed an early gift for math and science. He joined HCII to continue working on privacy research he’d begun in graduate school. “We knew privacy for these mobile and sensor-based environments was a problem even then, but relatively little work had been done to address it,” he says. Most privacy research had focused purely on the computer itself, such as developing stronger encryption techniques to improve data security. At Carnegie Mellon, though, Hong became fascinated with a “human-centered” approach to the problem, like that of his collaborator Norman Sadeh, director of the Mobile Commerce Lab and another early privacy researcher. Some of Hong’s first work with Sadeh, for instance, involved the reasons people fall for phishing scams, those fraudulent emails sent to uncover personal information.
“We are currently entering a third age of computing,” Hong contends. The first stage was the development of computation. The second was the rise of communication: think the Web, Wikipedia, and social networking. Now we’re entering the stage some call ubiquitous computing. Sensors, mobile devices, and embedded computing are so entrenched in our daily lives, from cars to thermostats to lighting, that we’re scarcely aware.
Is ignorance bliss? Maybe it shouldn’t be.
“We’re entering this third age, and we haven’t solved the privacy problems of the first two,” he says. “It’s not even clear if we can solve them. And now we have all these new challenges, particularly with these commodity smartphones and the number of sensors they have. Light sensors, accelerometers, cameras. Not to mention what they know about you: who you’ve been calling, location data, geotags on your pictures. Privacy may be the most difficult problem for this third age.”
The conundrum of the blackjack game on his new phone that day in 2008 “got the hamsters running,” Hong remembers. Later that afternoon, two graduate students arrived in his office to discuss their ongoing projects, but Hong interrupted with his discovery. “I didn’t know it was this bad,” recalls Shahriyar Amini, now a fifth-year electrical and computer engineering doctoral student. “I would never have imagined that a card game would be using my location information.” They sat around Hong’s round table mulling the disturbing issue. How many apps were doing this? What other information were they collecting? And most importantly, why? A new research project was born.
First, they needed an approach. There were countless apps out there, and the numbers were growing daily. Previous app research had used only automated approaches, but these techniques would often flag legitimate uses, such as a navigation app using location, as problems. The team clearly needed human input. But they would have to find a way to gather data so that they could eventually examine public reaction on a very large scale and in a workable time frame.
A week later, Amini popped into Hong’s office. He had a novel idea for the project: crowdsourcing. It was something they were all familiar with, a method of using humans to accomplish small tasks that computers can’t manage, like recognizing images or unusual text. Existing Web sites such as Amazon’s Mechanical Turk make accessing such a “crowd” fairly simple. They could post questions on the site, pay small amounts for answers, and hopefully achieve a much larger and quicker response than with traditional survey methods. Moreover, by posting individual questions and aggregating results, they could avoid a lengthy questionnaire that nobody would want to read, let alone answer. “Our plan was to use crowdsourcing to understand people’s perception regarding the privacy implications of using an app.”
The problem was that crowd tasks were generally reading text, editing, and the like. Nobody had attempted questioning the public on a technical issue. And even if it could be done, how could they measure something as subjective as “privacy risk”? So, they continued to refine their ideas and look for funding, eventually supplied by the National Science Foundation, Google, and the Army Research Office. Two new members joined the team to add a second perspective: understanding users’ privacy concerns. They were Norman Sadeh and Jialiu Lin, a computer science doctoral student. Intrigued with the subject, Lin discussed her new research with her classmates. As she mentioned the personal data these apps were collecting, Lin was struck by their reactions. They were all as surprised as her team. It hit her. Why not use this?
As Hong puts it, “If people expect an app to do something, like Google Maps using location data, it’s like informed consent. If people don’t expect something, like a game using your contact list, there’s a mismatch. We can measure people’s level of surprise, their expectations, as one way of measuring privacy risk.”
The team began by examining the top 100 apps to learn what information they were accessing. In a tedious, by-hand operation, they determined that 56 of the 100 were using what could be considered sensitive information, including a phone’s unique device ID, location data, and contact list.
Consider this: There are more than 1 billion smartphones in use across the globe. On average, Americans spend 127 minutes each day on their smartphones, using 41 apps. There are more than 1 million apps available. An app displays its permissions (the information it will access) the millisecond prior to download. Do you, like everyone else, glance and hit the button?

Here’s what Hong’s team determined: While you’re slinging away those Angry Birds, they’re collecting your device ID and location. While that lifelike waterfall from Backgrounds HD Wallpapers is lighting up your screen, it’s gathering your device ID and contact list. And while your Brightest Flashlight app is helping you move through that dark hallway, it’s grabbing your device ID and location. Often, these apps, particularly the free versions, are sharing this personal data with multiple online marketers, who are likely refining their targeted ads, increasing revenue for everyone. But then, that’s only conjecture.
“We don’t fully know what they’re trying to do,” admits Hong. “We know data is being sent to these companies but don’t know 100% what’s going on. We do have lots of guesses. Perhaps they’re trying to infer what zip code you live in, or where you work, for example, to then infer more demographics about you to tailor their ads.”
“Research at Carnegie Mellon has shown how much you can infer just by looking at someone’s location,” adds Sadeh, “such as what church you attend, what medical conditions you may have, your political affiliation, and more. Now there are close to 130 permissions, a lot of sensitive functionality, available to developers. It essentially opens the door for abuse.”
The research team next posted their questions to an online crowdsourcing Web site and, within two weeks, had results. Good ones. Lin brought them to Hong and Sadeh at their biweekly meeting. Hong was surprised: the results exceeded the team’s expectations. The crowd was not only able to answer questions; it could also provide valuable answers for academic research.
As for how the crowd answered: They were surprised and uncomfortable with every app that collected personal information without obvious reason. The apps that engendered the most surprise were: Brightest Flashlight (ID, location); Toss It game (ID, location); Angry Birds game (ID, location); Talking Tom virtual pet (ID); HD Wallpapers (ID, contacts); Dictionary.com (ID, location); Mouse Trap game (ID); Horoscope (ID, location); Shazam music (ID, location); and Pandora Internet Radio (ID, contacts).
And as shocked as people were with one app tracking them, they were probably unaware of a potentially bigger problem. Advertisers have made things so simple for app developers (just download a package and collect your share of the revenues) that developers often sign up and share your information with many.
Unfortunately, there’s more to be concerned about. With a few major advertising networks controlling the majority of the market, most people have multiple apps collecting and sending their information to the same few entities. The data can potentially be aggregated, allowing for an uncomfortably clear picture of a user. “They can actually combine all the information they gather to form a kind of life history of your cell phone, inferring where you live, your workplace, where your children go to school,” says Lin.
“What’s more concerning,” adds Sadeh, “is when you start putting together information across populations of people and mining this data, you can also identify, for instance, social relationships and much more.”
“And the problem is only growing,” notes Amini. “There are so many apps, more and more users. The market is growing so quickly. Every six months we get a new figure.”
The team is exploring potential solutions. Amini is developing software that automatically scans apps to determine the information accessed and quickly post results to the crowd for reaction. Hong notes the potential of a Web site that could give users, in real time, a simple app privacy rating and, unlike permissions, do so before they’re about to hit the download button. Lin mentions the hope of using machine-learning approaches to discover patterns that could be generalized to the entire app market. And Sadeh notes the potential of clustering users and their differing preferences.
Hong is also concerned with a more pervasive consequence. He’d like to not only protect people’s privacy, but also to preserve their comfort with technology. “If people are really worried about these things,” he says, “it could blunt adoption of very promising kinds of technologies that could really benefit all of humanity in so many ways.”
“In the long term,” he adds, “this problem will require, first, some legislative action, like limitations on what data is collected and for what purpose. Second, we need to raise public awareness. Third, we want to help developers better understand what they’re doing and how to make the right choice. And finally, we want to help the end user by providing information to make better choices. It’s going to take a combination of at least these things together to solve this problem.” Hong pauses. “Well, probably more like manage.”
Recently, Hong presented the team’s research to a series of West Coast companies, for even as the Silicon Valley giants gather ad revenue, they’re concerned with the public’s comfort level and this new method of measuring it. He noticed that as he reached the privacy portion, members of the audience began playing with their phones. After one such talk, a woman approached him. “I uninstalled those apps while you were speaking,” she said. “And so did a lot of others.”
Melissa Silmore (TPR’85) is a Pittsburgh-based freelance writer and a regular contributor to this magazine.