Big data has the potential to disrupt existing businesses and help create new ones. But to deliver on that promise, four technology factors need to come together: cheaper storage, faster processing, smarter software, and larger and more diverse sets of data.
Some of that is already happening. Two decades ago, it took a machine the size of a refrigerator weighing 500 pounds to store a single gigabyte of data – enough for roughly 260 digital music tracks. Today, we carry gigabytes of data on our smartphones.
The price of storage devices has also fallen even as their capacity has grown. Over the same period, the cost of storing a gigabyte of data has fallen from more than $1,000 to just five or six cents.
Set against that, however, is the amount of data that we now generate. Eric Schmidt, Google’s former chief executive, said in 2010 that about five exabytes of data, or the equivalent of 250,000 years of DVD-quality video, was created in the world every two days. By some estimates, next year we will create that much data every 10 minutes.
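The scale of those figures is easier to grasp with some rough arithmetic. The sketch below is only a back-of-the-envelope check, not part of Mr Schmidt's remarks: it assumes decimal units (one exabyte is 10**18 bytes) and a 365.25-day year, and the function names are purely illustrative.

```python
# Back-of-the-envelope check of the data-volume figures quoted above.
# Assumptions (not from the article): 1 exabyte = 10**18 bytes,
# 1 gigabyte = 10**9 bytes, and a year of 365.25 days.

EXABYTE = 10**18
GIGABYTE = 10**9


def gigabytes_per_video_hour(total_bytes, years_of_video):
    """Implied size of one hour of video if total_bytes covers years_of_video."""
    hours_of_video = years_of_video * 365.25 * 24
    return total_bytes / hours_of_video / GIGABYTE


def rate_increase(old_interval_minutes, new_interval_minutes):
    """How many times faster the same volume is produced in the shorter interval."""
    return old_interval_minutes / new_interval_minutes


# Five exabytes described as 250,000 years of DVD-quality video.
gb_per_hour = gigabytes_per_video_hour(5 * EXABYTE, 250_000)

# The same volume every 10 minutes instead of every two days.
speedup = rate_increase(2 * 24 * 60, 10)

print(f"Implied video size: about {gb_per_hour:.2f} GB per hour")
print(f"Implied increase in the rate of data creation: {speedup:.0f}x")
```

On these assumptions the comparison implies a little over two gigabytes per hour of video, and the move from "every two days" to "every 10 minutes" corresponds to data being created 288 times faster.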
“Data has proliferated at unbelievable speed,” notes Hasso Plattner, the co-founder of SAP. At the same time, he points out, the type of data being captured has also changed, from “structured data”, which can be easily analysed, to “unstructured data” such as Facebook updates and YouTube videos that are harder for computers to decipher.
This meant that the speed at which data could be processed became a barrier.
Big data “is not a new market”, says Karim Faris, a partner at Google Ventures, the search company’s venture capital arm. “There are many examples of companies that had these big data warehouses, they just didn’t do much with them because it just took long [to process] and was too painful.”
One key technological breakthrough that helped speed up the analysis of data was the advent in the early 2000s of so-called massively parallel computing. Instead of handling one task at a time, computer systems could process a multitude of tasks at once.
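The idea of splitting one large job into many pieces that are handled at once, then combining the partial results, can be sketched in a few lines of Python. This is a minimal illustration, not how any of the companies mentioned here built their systems: a pool of threads on a single machine stands in for what would, in a massively parallel system, be many separate processors or machines, and the function names and sample data are hypothetical. In standard Python, threads will not actually speed up this kind of computation; the point is the split-process-combine structure.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Count the words in one chunk (a list of text lines)."""
    return sum(len(line.split()) for line in chunk)


def split_into_chunks(lines, n_chunks):
    """Divide the lines into roughly equal chunks, one per worker."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines, n_workers=4):
    """Count words by handing each chunk to a separate worker."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker processes its own chunk independently of the others.
        partial_counts = list(pool.map(count_words, chunks))
    # Combine the partial results into a single answer.
    return sum(partial_counts)


def sequential_word_count(lines):
    """Baseline: process everything as one task."""
    return count_words(lines)


# Hypothetical sample data: three short lines repeated many times.
sample = [
    "big data is not a new market",
    "storage is cheap",
    "processing is fast",
] * 1000

print(parallel_word_count(sample))
print(sequential_word_count(sample))
```

Both approaches give the same total; the difference is that the first can be spread across as many workers as are available, which is what makes very large data sets tractable.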
“That’s how Google has built its search infrastructure, that’s how Facebook over time has built its services and Amazon as well,” says Scott Yara, who in 2002 founded GreenPlum, one of the first software companies to tackle very large-scale data sets.
Besides sheer speed, however, those grappling with big data today say smarter software is equally important. Many in the industry use Hadoop, an open-source framework that allows developers to build software that analyses big data and makes predictions.
But even its advocates admit that Hadoop is complex and difficult to use, and “the reality is that the companies that have wanted to take advantage of it have either given up because they found it too difficult, or they have had to hire an army of engineers to write code against the very sophisticated statistical and coding skills required by Hadoop”, explains Steven Hillion, chief product officer at Alpine Data Labs, a Silicon Valley big data startup.
Others say even those who manage to write the highly sophisticated code that runs on Hadoop are only scratching the surface of what big data can do.
“We’re getting into very advanced statistical processing and what people call machine-learning, where the algorithms get smarter with more data in this cyclical model,” says GreenPlum’s Mr Yara. “This machine-oriented analytic processing is very, very powerful.”
What makes the latest big data applications particularly powerful is that they are being run against much broader and larger data sets. Some companies are now adding data gleaned from customers using their smartphone and tablet applications, for example.
Others are trawling through the huge volumes of social media traffic generated every day to look for consumer trends. “Retailers are attempting to create ‘graphs’ of social networks . . . to create social buying patterns,” say Mark Beyer and Doug Laney, research vice-presidents at Gartner.
Increasingly, companies are buying other types of data, such as weather data, traffic information or website statistics, in the hope that they can build a more comprehensive picture of their customers.
“The real business opportunity is found in the ability to put more data together and let the data sources refute or reinforce each other,” say the Gartner analysts. “In this way, big data makes organisations smarter and more productive: by enabling people to harness diverse data types previously unavailable, and enabling them to discover previously unseen opportunities.”