More on Big Data’s Traps
April 9, 2014

If you want to read about the limits and pitfalls of big data, you’ve had a growing number of options in recent weeks. One of the best is “Big data: are we making a big mistake?” by Tim Harford in the Financial Times. It’s worth a read in full, and adds to the big data lessons we offered last month.
Additional lessons include:
- Big data is often just “found data.” Unlike large datasets captured in scientific laboratories by physicists, companies’ data comes from our web searches, financial data, social network activity, etc. These sources often provide an incomplete and “messy collage of datapoints.”
- Big data is vulnerable to sampling errors, favoring or excluding certain groups. Lots of data doesn’t mean representative data. This can create challenges, even for well-meaning data projects. For example, Boston created a smartphone app called “Street Bump,” which uses residents’ smartphone accelerometers to detect potholes and automatically report them to the city. But standing alone, the app is likely to drive the attention of city services to younger, affluent areas where more people own smartphones.
- Impressive predictions often come with quiet false positives. Target’s data-driven prediction of a woman’s pregnancy has become legendary. But we don’t know how many times the retailer guessed wrong. “There’s a huge false positive issue,” said Kaiser Fung, who has developed similar programs for other stores.
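The false-positive point above is a base-rate effect: when the condition being predicted is rare, even an accurate model flags mostly wrong cases. Here is a minimal sketch with hypothetical numbers (these are illustrative, not Target’s actual figures):

```python
# Base-rate sketch: an accurate predictor still produces many false
# positives when the condition is rare. All numbers are hypothetical.

def positive_predictive_value(base_rate, sensitivity, false_positive_rate):
    """Fraction of positive predictions that are true positives."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Suppose 2% of shoppers are pregnant, the model catches 90% of them,
# and it wrongly flags 5% of everyone else.
ppv = positive_predictive_value(0.02, 0.90, 0.05)
print(f"{ppv:.0%} of flagged shoppers are actually pregnant")  # → 27%
```

Under these assumptions, roughly three out of four flagged shoppers are not pregnant at all — exactly the “huge false positive issue” Fung describes.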
The bottom line, Harford argues, is that big data analyses — like traditional statistical analyses — must be handled with care: “Statisticians have spent the past 200 years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster and cheaper these days — but we must not pretend that the traps have all been made safe. They have not.”