Big data is proving a fertile ground for the growth of new business models as companies find ways to combine the vast wealth of available information.
Kabbage, an online lender to small businesses; BeachMint, a Los Angeles-based fashion ecommerce retailer; DataSift, the inventor of the original retweet button on Twitter, the microblogging website; and The Climate Corporation, which provides weather insurance to farmers, all agree that without big data they would not exist. But none are fans of the term.
Big data does not just mean a lot of information. It also refers to so-called unstructured data – sensor data, social media outpourings, video and images – that does not fit neatly into the rows and columns of most databases.
Rob Frohwein, chief executive of Atlanta-based Kabbage, is an intellectual-property lawyer who has been carrying around an ideas journal since childhood. “What if one of the large online marketplaces bought a credit company? What would they do with it? They’d give cash to the businesses that were generating their online revenue. That was the germ of Kabbage,” he says.
For small, mostly online, companies, getting from application to cash in the bank takes seven minutes. Kabbage monitors its borrowers by linking to and watching their private data sources: bank accounts, Twitter feeds, eBay and Facebook accounts, among others. Interest rates range from 2 to 18 per cent. The company’s 90,000 accounts are mostly in the US, but it is planning a UK launch in February.
The company’s very name, slang for cash, illustrates the difference between it and a normal bank, says Mr Frohwein. “It has never been easy for a mainstream bank to do small business investment,” he says. “We can do it partly because we are entirely online, and don’t use traditional public data sources.
“We are a data-context company. We look at the trends in the data, the space between data points.”
He says that, for him, big data means three things: “Data that might have been generated for another purpose; real-time, ongoing data; and the ability to draw insights across disparate data sources.”
But he stresses that it is “the people in our data-science team, coming from a variety of backgrounds” that make the difference, “not just tools such as Hadoop”.
Hadoop is an open-source version of a parallel programming framework called MapReduce, originally developed at Google, the search engine company. It simplifies the processing of huge data sets that are distributed across different hardware, although it requires expensive programmers to access and analyse the data.
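For readers curious about what the MapReduce pattern looks like in practice, the following is a minimal single-machine sketch in Python. It is not Hadoop's API and not code from any company mentioned here; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are illustrative assumptions. A real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on one machine (hypothetical driver)."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only; each string stands in for a separate document.
counts = word_count(["big data big ideas", "data science"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can in principle be split across hardware without changing the result.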
In Los Angeles, Doug Cohen, director of analytics at BeachMint, heads a small team of big data developers and analysts, who sit alongside their marketing colleagues. Established in 2010, the boutique fashion retailer bills itself as a “next-generation social commerce” company.
Though a recommendation engine is at the core of what the company does, the human factor is important, says Mr Cohen. Each of the company’s lines, from JewelMint through ShoeMint to IntiMint, is curated by celebrities and stylists, including actors Kate Bosworth and Justin Timberlake; Estee Stanley, a designer; and Brooke Burke-Charvet, a US television presenter.
“The fashion world changes so rapidly that it would be difficult to fully automate [customer recommendations],” says Mr Cohen. Customers take a style quiz on registration, and pay a monthly subscription.
On the data analytics side, the company uses Pentaho Business Analytics, MySQL and HP Vertica to process terabytes of data and millions of daily emails, and record clickstream data. “If Pentaho is down for five minutes we will have a flood of emails [from company employees],” says Mr Cohen. “There is a constant pulling of reports; everyone uses the data.”
His business could not exist without social networking companies such as Facebook. “In the web 1.0 world, this would have been much more difficult. It would have been harder to establish that friendly boutique-store relationship online back then.”
Social media giants Facebook and Twitter are more or less synonymous with big data, not only at the level of generating it, but also for inventing some of its characteristic technologies.
Nick Halstead is the founder and chief technology officer of DataSift, which is based in San Francisco in the US and Reading in the UK. The company aims to help organisations improve their understanding and use of social media.
The company took off as an offshoot of TweetMeme, the Twitter newsfeed service, and has a heritage in RSS news aggregation. It helps companies mine social media data, such as tweets, Facebook posts, and content on blogs and forums. It is a user of Cloudera’s distribution of Hadoop, and is deeply embedded in the pure big data world.
But according to Mr Halstead, the true value of big data lies in the attention it has drawn to the term “data scientist”, which is getting students interested in mathematics. He stresses that data science “is different to programming. It’s more creative than that – it is a mix of arts and maths.” And from a business customer point of view, it is the joining of data sets by big data technologies, such as MapReduce, that is most valuable.
David Friedberg, founder and chief executive of The Climate Corporation, also stresses the business and social value of joining data from a variety of sources – data that already exists but has been underexploited. His company aggregates weather data from the US government, from satellites and sensors, as well as manually collected field data, and uses it to insure farmers against bad weather.
The company plans to expand internationally, but will stay focused on agriculture. “We looked at ski resorts and travel companies, but, for them, the weather is not the primary driving factor. Every farmer is an individual operator, bearing the risk totally exposed.” He believes his company is making farmers’ lives better. And it can do so globally by using data that is now cheaper to collect, store and analyse than ever before.
Brian McKenna is business applications editor at Computer Weekly