In Computers We Trust?
The growing need for cybersecurity leads scientists to develop systems that better protect us online
Oct. 12, 2009
WARNING: Reading this story may make you want to move to a small cabin in the woods, hide your cash under the mattress, destroy your computer, cut up your credit cards and throw your cell phone in the garbage.
Script-kiddies. Bots. Trojans. Zombies. Typosquatting. Hackers. Botmasters. This is the language of the criminals of the 21st century who are prowling the vast web of our computer networks to make money, wreak havoc, stalk and terrorize. Many of us have fallen prey to their advances. Most of us have had our computers infected by a virus.
That’s because many aspects of our daily lives are completely dependent on the Internet (which, by the way, celebrates its 40th anniversary this year).
We bank on the Internet. We communicate there with friends, family and coworkers. Our electrical grid uses it. The government sends classified and unclassified information on it. Hospitals and emergency response teams need it. You might be reading this article on it. And hundreds of thousands of critical systems that govern our daily lives depend on it.
But all this connectivity and convenience has come with a price. The bad guys are having a field day, engaging in malicious activities from identity theft to cyberwarfare.
“It’s big business,” says Fred Chang, a research professor in the Department of Computer Science and director of the university’s Center for Information Assurance and Security (CIAS), “and the bad guys are getting better at what they do every day.”
Cybercrime costs an estimated $1 trillion globally. In 2006 alone, U.S. adults lost $49.3 billion to identity theft. Estimates from 2008 show that each U.S. adult had a 66 percent chance of experiencing at least one exposure of his or her data. Some critical Web sites in the country of Georgia were shut down recently by Russian cyberattackers just prior to a physical invasion by Russian forces.
For these reasons and more, President Barack Obama has made cybersecurity one of his national priorities.
“This cyber threat is one of the most serious economic and national security challenges we face as a nation,” Obama said in a speech in May 2009. “Protecting this infrastructure will be a national security priority. We will ensure that these networks are secure, trustworthy and resilient. We will deter, prevent, detect and defend against attacks and recover quickly from any disruptions or damage.”
At the top of Obama’s list has been creating a new office at the White House led by a cybersecurity coordinator, a recommendation that comes directly from a report by the Commission on Cybersecurity for the 44th Presidency, on which Chang has served as an integral member for the past two years.
Another recommendation from the report, also supported by Obama and a top priority for Chang, is increasing the federal investment in cybersecurity research and development.
Chang, who recently published a paper titled “Is Your Computer Secure?” in Science, says federal and private investment in cybersecurity is critical for advancing the basic research that will lead to better built-in security for computing systems. Today, security is mostly an afterthought. (For example, we have to buy antivirus software to help us deal with viruses.)
“The goal for cybersecurity,” he says, “is to build systems from scratch that are fundamentally secure. We want to be able to trust the system right out of the box because it was designed to be secure. We have world-class capability here at The University of Texas at Austin that is foundational in this regard. But there is work to do.”
One of those capabilities on campus comes in the form of Vitaly Shmatikov, a young, fast-talking associate professor of computer science who studies privacy in ubiquitous data sharing systems, from Facebook to hospitals to Netflix.
With so much information being shared online these days, it’s critical that much of it remains private and anonymous. We trust, for example, that social networking sites such as Facebook remove personally identifiable information when they share our preferences and desires with advertisers.
“When companies and organizations that have your data say they protect your privacy, what do they actually mean?” Shmatikov asks. “They normally think they can share the data if they just anonymize it by removing the names and any identifying information, such as Social Security numbers and email addresses, and that there is no privacy issue.”
Shmatikov is finding that simply doing that doesn’t quite do the trick.
“What we’ve managed to show is that even if names are removed, any attackers with access to a little bit of public information can often reattach the names,” he says.
For example, he and graduate student Arvind Narayanan recently reported their ability to re-identify the anonymized data that social networking sites such as Twitter and Flickr sell to advertisers.
The scientists developed an algorithm that looked at the overlapping information among users of the different sites and analyzed the structure of the individuals’ “network neighborhood.”
In one third of the cases, they identified people from what were supposedly completely anonymous data.
“In practice, it’s very easy to re-attach a name to anonymous data,” says Shmatikov.
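The idea behind the attack can be sketched in a few lines. This is a hypothetical toy, not the researchers’ actual algorithm: it assumes a public graph labeled with names, an “anonymized” graph labeled with IDs, and a few already-known matches (“seeds”), then matches remaining users by how much their network neighborhoods overlap.

```python
# Toy sketch of neighborhood-overlap re-identification. All names,
# graphs and the 0.5 threshold below are hypothetical illustrations.

def jaccard(a, b):
    """Overlap between two sets of neighbors (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def reidentify(public, anonymous, seeds):
    """public/anonymous: node -> set of neighbors; seeds: known ID -> name."""
    mapping = dict(seeds)
    for anon_id, anon_nbrs in anonymous.items():
        if anon_id in mapping:
            continue
        # Translate this ID's already-matched neighbors into public names.
        known = {mapping[n] for n in anon_nbrs if n in mapping}
        best = max(public, key=lambda name: jaccard(known, public[name]))
        if jaccard(known, public[best]) > 0.5:   # confidence threshold
            mapping[anon_id] = best
    return mapping

# Public graph (e.g., Flickr) with real names...
public = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": {"alice"}}
# ...and an "anonymized" graph (e.g., Twitter) with numeric IDs.
anonymous = {1: {2, 3}, 2: {1}, 3: {1}}
print(reidentify(public, anonymous, seeds={2: "bob", 3: "carol"}))
```

Because ID 1’s matched neighbors line up exactly with alice’s friends, the sketch names user 1 as alice, despite the “anonymization.”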
He was also able to find the identities of Netflix users from anonymized data released by the company.
“First off, if you know just a few bits of information about some Netflix subscriber, just a few movies and whether they liked them or not, you can with very high confidence find that subscriber’s record in the dataset,” he says.
Using some further mathematical reasoning, statistics and algorithms, Shmatikov re-attached names to the users by connecting their movie preferences and reviews with other breadcrumbs of information they left publicly on blogs and IMDb (an online movie database).
People might not use their real names on such sites, but they do often use their email address or list their city of residence.
“Then you can just Google search the email address, find their Amazon reviews with their real name, and you’ve made the link,” says Shmatikov. “That’s not difficult.”
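The linkage step described above can be illustrated with a small sketch. The records, names and scoring rule here are all hypothetical; the point is only that a handful of (movie, liked-it?) pairs acts like a fingerprint that singles out one public profile.

```python
# Hypothetical sketch of linking an "anonymized" ratings record to a
# public review profile. Data and profile names are invented examples.

def best_match(anonymous_record, public_profiles):
    """Return the public name whose reviews agree most with the record."""
    def score(reviews):
        return sum(1 for movie, liked in anonymous_record.items()
                   if reviews.get(movie) == liked)
    return max(public_profiles, key=lambda name: score(public_profiles[name]))

# Anonymized Netflix-style record: just a few movies and thumbs up/down.
record = {"Brazil": True, "Memento": True, "Gigli": False}
# Public IMDb-style reviews posted under (hypothetical) real names.
profiles = {
    "j.smith": {"Brazil": True, "Memento": True, "Gigli": False},
    "m.jones": {"Brazil": False, "Titanic": True},
}
print(best_match(record, profiles))  # the record points to "j.smith"
```

Three agreeing ratings are enough here to pick j.smith out of the crowd, which is exactly why “just a few movies” can betray a subscriber’s identity.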
He says many people could abuse this lack of protection. Unscrupulous companies could use the information to spam people. Oppressive governments could use the information to monitor people. Cyberstalking–from helicopter parents monitoring their kids to jilted exes tracking former partners–could also be an issue.
By breaking such systems apart, Shmatikov’s research group is trying to show people that privacy protections should not be ad-hoc. Like Chang, he believes protections need to be built into the systems from the start.
“The ultimate vision is that we’re going to give you some software building blocks, and we’re going to prove to you mathematically that if you build your program using these building blocks,” he says, “then you will be safe against a certain class of privacy attacks.”
But he’s quick to caution: “There are always other avenues of attack. We can’t protect from everything with the software we would like to design.”
A relatively new arm of research for Shmatikov involves genetic data. It’s research he’s embarking on with colleague Emmett Witchel.
“We are well on our way to having all of our genomes sequenced very cheaply,” Shmatikov says. “More and more medical and law enforcement databases are going to come with genetic data, and that information is extremely sensitive.”
In particular, doctors and researchers will be sharing this genetic information so they can search for better understandings of disease and develop new treatments. Data computations of this sort will be on a massive scale and will be extremely powerful in the world of health care.
Witchel and Shmatikov are working to adapt a program called MapReduce, developed by Google for use on such large-scale computations, so that it provides privacy guarantees to people whose genetic data are being used but still keeps the process efficient and high performance.
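One way to picture that combination is a tiny MapReduce-style aggregation whose outputs are deliberately blurred with random noise, so no single person’s record visibly changes the result (the core idea of differential privacy). This sketch is a loose illustration of that concept, not the researchers’ system; the records and epsilon value are hypothetical.

```python
import math
import random
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal MapReduce: group mapper output by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}

def laplace_noise(scale):
    """Laplace sample: the difference of two exponentials is Laplace."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_count(values, epsilon=0.5):
    """Count with noise sized so one record barely moves the answer."""
    return len(values) + laplace_noise(1 / epsilon)

# Hypothetical patient records: (condition, gene variant present?).
records = [("diabetes", True), ("diabetes", True), ("asthma", False)]
mapper = lambda rec: [(rec[0], rec[1])]
counts = map_reduce(records, mapper, noisy_count)
# counts["diabetes"] hovers near 2, but the noise masks any individual.
```

Researchers get usable aggregate counts while the noise keeps them from pinning a result on one patient–the kind of guarantee the Witchel–Shmatikov work aims to build into the platform itself.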
“As time goes on, we will see more databases with so-called anonymized genetic information being used for medical research,” says Shmatikov. “I think the next big privacy break is going to be in that area.”
They hope to develop tools and building blocks that can be used for these massive epidemiological and genome studies.
“We’ll be able to tell them that they can run computations on sensitive data, but they can be assured that they won’t learn anything about individual identities,” he says. (The good guys don’t need to know all those names either.)
And again, with just a bit more information–like a casual conversation about a person’s ancestry or hair color–cyber attackers could pretty easily re-attach a name to particular genetic sequences if they can get their hands on them. Shmatikov is reluctant to even predict what people might do maliciously with such data.
As information such as passwords and account numbers moves from computer to computer across the Internet, it is encrypted–jumbled into an incomprehensible form.
Encryption occurs when two parties–your personal computer and Amazon.com, for example–connect and exchange a “key.” Your browser then encrypts the information you want to send (like a credit card number), it flows in a jumble to Amazon, and then Amazon decrypts it based on the key that the two computers exchanged.
You know your information is encrypted when you see the small lock icon on your Web browser or the “https” appears in a Web address.
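The exchange-a-key-then-jumble idea can be shown with a toy example: a miniature Diffie-Hellman key agreement followed by a simple XOR scramble. The numbers are deliberately tiny and the cipher is not secure–real HTTPS uses vetted algorithms and enormous numbers–so treat this strictly as intuition.

```python
import hashlib
from itertools import cycle

# Toy key exchange: each side keeps a private number and sends only
# g^secret mod p over the wire. The values below are textbook-sized.
p, g = 23, 5
browser_secret, server_secret = 6, 15

browser_sends = pow(g, browser_secret, p)
server_sends = pow(g, server_secret, p)

# Both sides now compute the same shared key without ever sending it.
browser_key = pow(server_sends, browser_secret, p)
server_key = pow(browser_sends, server_secret, p)
assert browser_key == server_key

def jumble(message, key):
    """XOR the message with a keystream derived from the shared key.
    Running it twice with the same key restores the original."""
    stream = cycle(hashlib.sha256(str(key).encode()).digest())
    return bytes(b ^ k for b, k in zip(message, stream))

ciphertext = jumble(b"card number 1234", browser_key)  # browser encrypts
plaintext = jumble(ciphertext, server_key)             # server decrypts
assert plaintext == b"card number 1234"
```

An eavesdropper sees only the two public values and the jumbled bytes; without a private number, it cannot reconstruct the key.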
Unfortunately, attackers can intercept this encrypted data as they flow through the system and use it maliciously. They can also access encrypted data from where it sits in storage, whether it’s on a server you maintain or on one that lives elsewhere (for example, in a huge Google server-farm). The latter, dubbed “cloud computing,” is becoming more and more common and will require changes in the way that data are protected.
“Many of the assumptions of the past are changing rapidly in the face of new systems such as cloud computing,” says Assistant Professor of Computer Science Brent Waters.
Computing in the cloud means users are connecting to and storing information on servers outside of their immediate control. For example, people can rent server space from Amazon.com. Recently, the City of Los Angeles made headlines as it considered outsourcing server resources to the cloud (in that case, to Google.com). Facebook, too, is on the cloud.
In the cloud, encrypted data sits on a server right next to space that others have access to.
“When I encrypt all the information on my hard drive and then my machine gets stolen, that’s one kind of problem,” says Waters. “With cloud storage, this information will be stored on third party servers, and encryption is going to be very important for that.”
Waters is working toward a new paradigm that he and his collaborators call “functional encryption.” Traditional encryption involves knowing the identity of every individual that is allowed to see certain data. For example, Brent Waters may have access to items A and B, while Vitaly Shmatikov has access to B and C.
Functional encryption, says Waters, will work by giving people with certain attributes–not just their identity–the ability to decrypt information.
For example, city police share a lot of sensitive information about cases within their organization and with other law enforcement agencies. At present, people are given access to this information based on their identity, not necessarily the role they play in an investigation.
Functional encryption could help the organization implement an overall policy (for example, ‘only undercover detectives can see this information’) and as the agents join a case (or leave it) they are automatically given access to certain privileged information. The key used to unlock the encrypted record would be based on multiple and dynamic criteria, rather than just a static name or ID number.
“We need to re-envision what we really mean by encryption,” says Waters.
Waters also recently worked with Witchel on a project related to self-destructing data. There is no way to permanently delete any material posted or sent through the Internet, and this leaves people’s information vulnerable to breaches in privacy. No paper shredder or lighted match exists to erase digital data from history.
But researchers from the University of Washington built a program, called “Vanish,” that promised to make tagged computer data, such as emails and photographs, disappear after about eight hours.
Waters, Witchel, graduate student Owen Hofmann and post-doc Christopher Rossbach proved (unfortunately) that Vanish didn’t work. They created a program called “Unvanish” that makes vanished data recoverable after they should have disappeared.
“Our goal with Unvanish is to discourage people from relying on the privacy of a system that is not actually private,” says Witchel, assistant professor of computer science.
Waters adds, “Messages that self-destruct at a predetermined time would be very useful, especially where privacy is important, but a true self-destruction feature continues to be challenging to provide.”
Trusting the Machine
Computer scientists such as these are hard at work creating ways to make our computers and networks more secure, but computer security also comes down to people and trust.
On the one hand, good people are putting their trust into the security of machines, just as they trust a pilot to land an airplane safely. Machines are also being programmed to trust other machines in efforts to provide people with the systems and services they need.
On the other hand, there are bad people who take advantage of that trust, violating machines to serve their ends.
People still fall prey to the Nigerian savings scam emails and are tricked into divulging confidential information. Typosquatters take advantage of small typos in URLs to send users to malicious Web sites. Computers can be unknowingly controlled by a virus, turned into a “bot” and used for malicious activities. Over 80 percent of all spam originates from collections of bots called botnets.
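Typosquatting works because domains a single keystroke away from a popular site are cheap to register. A simple similarity check can flag such lookalikes; this sketch uses Python’s standard difflib, and the trusted list and domains are hypothetical examples, not a real defense product.

```python
import difflib

# Hypothetical allow-list of sites the user actually means to visit.
TRUSTED = ["amazon.com", "google.com", "paypal.com"]

def looks_like_typosquat(domain):
    """Flag domains suspiciously close to, but not in, the trusted list."""
    if domain in TRUSTED:
        return False
    close = difflib.get_close_matches(domain, TRUSTED, n=1, cutoff=0.85)
    return bool(close)

print(looks_like_typosquat("amazom.com"))   # True: one letter off amazon.com
print(looks_like_typosquat("example.org"))  # False: not near a trusted name
```

Browsers and corporate filters use far more sophisticated versions of this idea, but the underlying signal–“almost, but not quite, a name you trust”–is the same.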
From Chang’s perspective, trust is something people place far too readily in machines. “It’s much easier for an attacker to compromise a Web site we trust and get a whole bunch of us than try to attack a lot of individuals,” he says.
The violation of trust usually involves problems with software and vulnerable Web sites, which is why Chang says it’s important to always install the most recent security patches for Web browsers and use the most secure browsers. Accessing Web sites that have a record of being less vulnerable can also help reduce compromises to your system. But the number of compromised Web pages likely numbers in the millions, and the most used and popular sites are generally heavily targeted by attackers.
So at some level, it’s everyone’s personal responsibility to protect his or her machine.
“Unfortunately the only way to really protect it right now,” says Chang with a smile, “is to turn it off, disconnect it from the Internet, encase it in cement and bury it 100 feet below the ground.”