Do-Gooders and Data
Author: Brian Pascal
Silicon Valley’s extraordinary economic success was built in large part on the power of numbers — especially enormous piles of numbers collected from impossibly large groups of people. These days, there’s a faith bordering on the religious among the tech-minded that numbers can solve all of our problems.
This approach has crystallized into the sprawling, interdisciplinary field generally referred to as big data. Among other things, big data is the attempt to use technology to extract deep and useful truths from very large databases. The approach is applied across fields, from companies like Google and Facebook selling advertising, to utility companies distributing electricity, to police departments attempting to systematize their law enforcement efforts.
The appeal of this approach extends well beyond the commercial sector. It is no surprise that the big data style of problem solving, however one chooses to define the phrase, also appeals to NGOs, activists, international aid workers, and others tackling some of the world’s biggest human rights challenges. These are complicated problems as old as humanity, and the promise of a 21st-century technical solution that cuts through some of the uncertainty is obviously attractive.
Unfortunately, despite all of the rhetoric that has sprung up around big data over the last few years, computer science and databases are neither flawless nor magical. Like any other technological approach to complicated problems, big data comes with its share of pitfalls. What’s more, in addition to its engineering challenges, big data presents a psychological hurdle. By its nature, the technology is concerned far more with correlation than with causation, and humans as a species are bad at separating the one from the other. Our brains are built to find patterns, even where none exist. Put simply, this is a technology that invites blind trust, reinforced by observation and confirmation bias. Sometimes that trust is misplaced, and without a great deal of training and analysis it can be very difficult to tell just how far the results it produces deserve to be trusted.
For example, over the past few years Google has published its flu trends, predicting the incidence of the flu based on patterns culled from users’ search queries. At first glance, this seems perfectly rational: when people are sick, they often Google their symptoms. And while WebMD might accidentally convince a few people that they have contracted a rare form of cancer, on the whole it seems as though analyzing search queries might give a decent first-order approximation of how the flu is spreading across the country.
In practice, this isn’t quite right.
It turns out that Google’s flu trends, while they sometimes correlate with actual epidemiology, also often miss major outbreaks or predict outbreaks where none exist. This, too, makes sense: if you hear about a particular flu outbreak on the news, you might start googling symptoms out of mere curiosity. Similarly, while using the gyroscope and GPS functions of cell phones to automatically report potholes may seem like a great idea, in practice it disproportionately reports road problems in affluent areas, where more people can afford smartphones. Even when it is possible to predict that such second-order effects exist, it is often impossible to determine up front just how large their influence will be. Without substantial follow-up analysis, then, it can be very difficult to know how much to trust the results that big data analysis produces.
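The pothole example can be made concrete with a toy simulation. All of the numbers below are hypothetical, chosen only to illustrate the mechanism: each neighborhood has the same number of actual potholes, but a pothole generates a report only if it is hit by a driver carrying a smartphone, so report counts end up tracking smartphone ownership rather than road quality.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Assumed smartphone-ownership rates per neighborhood (hypothetical).
neighborhoods = {
    "affluent": 0.9,
    "middle-income": 0.5,
    "low-income": 0.2,
}

# Ground truth: every neighborhood has the same number of potholes.
true_potholes_per_area = 100

reports = {}
for area, smartphone_rate in neighborhoods.items():
    # A pothole is reported only if a smartphone-carrying driver hits it.
    reports[area] = sum(
        1 for _ in range(true_potholes_per_area)
        if random.random() < smartphone_rate
    )

for area, count in reports.items():
    print(f"{area}: {count} reports for {true_potholes_per_area} actual potholes")
```

Despite identical roads, the simulated report counts rank the neighborhoods by affluence — exactly the kind of second-order effect that only follow-up analysis against ground truth would reveal.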
Returning to NGOs, activist groups, and others just beginning to implement some form of big data technology: alongside any engineering requirements, they must also understand the limits of those technologies. That means combining all of the expertise the organization has cultivated through its experience with at least some understanding of the technical foundations of the data it is attempting to analyze. Unfortunately, that essential combined expertise is rarely present. Computer scientists, statisticians, and others who work with data for a living do not make up a large percentage of the employees of most NGOs (and are often missing from their rosters entirely). This experience gap means that NGOs often must rely on the firm supplying the big data technology, even though that firm may have no experience at all with the substantive issues. The result is a dangerous gap in which outbreaks or needs may go unrecognized or unanticipated, or may be marginalized on the basis of flawed data analysis.
The solution, insofar as one exists, begins with recognizing this mismatch and accepting that it is far from trivial to combine the on-the-ground expertise of a grassroots NGO with the abstract, technical tools provided by Silicon Valley firms. At the same time, recognizing a problem is the first step toward solving it. The potential of big data is both real and significant; we all just need to be more aware of, and cautious about, the pitfalls of applying this approach to the messy reality of our world. While it is extremely unlikely that the “there’s an app for that” approach of Silicon Valley will ever, say, solve the problem of genocide, applied properly it might be able to make a dent.
– – –
Brian Pascal is a research fellow with the Privacy and Technology Project at the UC Hastings College of the Law and a non-resident fellow with the Center for Internet and Society at Stanford Law School. Brian’s work encompasses a variety of disciplines at the intersection of privacy, security, and technology, including civil liberties, data protection, big data, digital ethics, and how to adapt the law to account for innovation. His research currently focuses on how developing technologies have shifted the balance of power among individuals, companies, and governments. Brian has served in a variety of roles at the interface of technology, law, policy, and business including as a civil liberties engineer with Palantir Technologies, a cybersecurity and privacy consultant with IBM, and an attorney with the firm of Wilson Sonsini Goodrich & Rosati.