Big Data is a commodity not unlike crude oil: fresh out of the ground it only has potential value and it needs a lot of work to make it marketable. But that potential is enormous.
Last September, International Data Corporation (IDC) forecast that this market “will grow at a 26.4% compound annual growth rate (CAGR) to $41.5 billion through to 2018”.
Again, according to IDC, this is 6 times larger than growth in comparable tech markets. As a result, over the next three years to 2018, many industries will experience a revolution, becoming “big data driven”.
And the technologies that only a few years ago promised to be the agents of this revolution are now subject to a colder, more critical evaluation because they’re expected to deliver results in a live market.
One technology that was freighted with high expectations is Hadoop, and it is currently undergoing something of a re-evaluation.
Hadoop is an open source platform that was developed in the mid 2000s and was named by Doug Cutting (one of its creators along with Mike Cafarella) after his son’s toy elephant.
It builds on Google’s MapReduce model and a whole lot more besides: Hadoop breaks huge data-sets into blocks, distributes them across multiple servers, analyses the pieces in parallel, and collates the results at the end of the process.
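The split/analyse/collate pattern described above can be sketched in a few lines of Python. This is a toy, single-machine illustration of the MapReduce idea only, not Hadoop's actual API; the chunk data and function names are invented for the example.

```python
from collections import defaultdict

# Toy illustration of the MapReduce pattern: the data-set is split into
# chunks, a map step runs on each chunk independently (on a cluster this
# happens on many servers), intermediate results are grouped by key (the
# "shuffle"), and a reduce step collates them. Example: counting words.

def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of the data-set."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_pairs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collate each key's values into a final result."""
    return {key: sum(values) for key, values in groups.items()}

# Two stand-ins for data splits that would live on different servers.
chunks = ["big data is big", "data needs work"]
counts = reduce_phase(shuffle([map_phase(c) for c in chunks]))
print(counts["big"], counts["data"])  # -> 2 2
```

The appeal of the pattern is that the map calls have no shared state, so they can run on as many machines as there are chunks; only the shuffle and reduce steps need to bring results together.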
It was designed for companies pulling in heterogeneous data from diverse sources such as social media and, increasingly, machine-generated data from sensor and video devices. These data flows will be measured in petabytes (10^15 bytes) in the coming years.
Increasingly, Hadoop’s proselytizers have to engineer workarounds for its under-performance. One of these shortfalls shows up in the new always-on/always-available internet economy: much of the time, Hadoop doesn’t scale as well as it should.
David Gleason, Chief Data Officer for Bank of New York Mellon, was reported by The Wall Street Journal (subscription needed) as saying that Hadoop, “…wasn’t ready for prime time.”
The bank was pleased with Hadoop’s performance while user numbers were limited, but it slowed as more and more users accessed it. It also slowed when it had to process data arriving in real time. Such real-time traffic could, for example, be location data from mobile devices, which offers a sales opportunity for local businesses, but the mobile ads promoting them need to be pushed out as the data comes in.
However, one vendor well placed to overcome this under-performance is MapR: in Gigaom Research’s January 2015 report, its SQL-on-Hadoop offering performed outstandingly against the competing products from the other leading Hadoop vendors, Cloudera and Hortonworks. MapR’s Apache Drill technology makes it easier for analysts to query unstructured data in “native format” as it’s coming into the system, that is, in real time.
Other problems include getting Hadoop to integrate with legacy infrastructure, and the fact that its native programming model, MapReduce, which originated at Google, has very little in common with the SQL used by most data management teams.
British Gas had a hard job integrating Hadoop with its SAP systems. Some companies have even had to employ additional coders to “glue the platforms together to get the whole thing to work”.
But the commercial value trapped deep within the corporate veins of big data makes people like Gleason optimistic about Hadoop: he believes it’s still the most exciting technology to come along since the RDBMS.
The turbulence in how the industry perceives Hadoop is clear from the graphic below, which shows the rise in demand for Hadoop professionals: there’s strong growth from 2007 until 2013, but at that point it experiences a wobble.
Although demand does, on the whole, continue to grow, there’s a clear drop in the rate of growth after 2013, when the industry begins to realise that the technology comes with a whole lot of hard work.
And compared to demand for other database technologies like MongoDB and mobile technologies like Android and iOS, Hadoop, according to Indeed Job Trends, is bumping around at the bottom of the top 10. That may be a little disappointing for Hadoop’s investors and contributors, but it is still in the top 10.
“Hadoop is going into the trough of experimentation. The maturity is high enough now that customers are less prone to get confused by vendor claims and focusing on the really vital requirements. It could be viewed as a negative trend, but it’s good where you are seeing early majority adoption of new technology.”
MapR CEO John Schroeder, talking to ZDNet, December 22, 2014.
As the growth in demand for Hadoop professionals shows, these problems have so far only dented the technology’s popularity, but if they grow, that could mark a turning point in its uptake in the near future.
Right now Hadoop is a work in progress. It is a technology operating at the bottom of the stack: more applications need to be developed in the layers above to realise the promise of Big Data, and while the demand for Hadoop people is still relatively high, the expectations of it will be even higher.