The Market for Hadoop Products and Services Is Becoming Less Hadoop-y

Gartner has done the hard work of surveying the Hadoop landscape so you don't have to.

The market watcher's new Market Guide for Hadoop Distributions assesses a Hadoop market that's in the midst of a major transition. First, Hadoop adoption is shifting from monolithic, on-premises deployments to ad hoc or on-demand cloud instances.

At the same time, the available Hadoop products and services are paradoxically becoming less Hadoop-centric as vendors "disaggregate" their offerings by integrating newer fit-for-purpose compute engines (such as Spark) and supporting Amazon S3 and other alternatives to the Hadoop Distributed File System (HDFS).

It's critical for data and analytics leaders to understand this changing market.

More Competition than Ever, Especially in the Cloud

One driver for this shift is increased competition, particularly in the cloud. For example, Gartner last year named Amazon, through its Amazon Web Services (AWS) unit, the dominant Hadoop vendor, even as Hadoop pure plays Cloudera, Hortonworks, and MapR Technologies all grew their market share.

"In 2016, end-user inquiries regarding Hadoop and Microsoft Azure are up 57 [percent] year over year, while inquiries about Hadoop and AWS are up 171 [percent] over 2015," the new report states. "Traditional vendors such as Cloudera and Hortonworks are reacting by pricing their offerings on a consumption basis in cloud environments and creating packages that focus on targeted subsets of the projects in their full distributions."

That isn't all. Data management powerhouses Microsoft and Oracle put more weight behind their Hadoop offerings in 2016, and Google introduced Google Cloud Dataproc, a competitor to AWS' Elastic MapReduce (EMR) service. Ironically, competition for Hadoop-related revenues increased even as the Hadoop market itself was "disaggregating," according to Gartner.

Disaggregation Is Officially a Thing

The value-add that made the Hadoop platform new and different -- its all-in-one combination of cheap, scalable, distributed storage with cheap, scalable, general-purpose parallelism -- has ceased to be a critical differentiator. "Spark, [although] included in every Hadoop distribution, showed increasing adoption in scenarios that did not include other Hadoop elements. Cloudera announced support for Impala on Amazon S3, extending from its Hadoop Distributed File System ... roots.

"Hortonworks launched its Hortonworks Data Cloud on AWS, offering various disaggregated components [such as Hive, Spark, and Zeppelin] capable of accessing data on S3," the report said.

Production Deployments Lagging

Disaggregation is one reason data and analytics leaders need to be especially wary in selecting Hadoop products and services, Gartner says. Another reason for caution is that Hadoop-based projects are still largely stuck on pilot. You read that right: pilot, not autopilot.

"Despite the variety of vendors, deployment environments, and geographic expansion, it is still challenging to get Hadoop-based projects beyond the pilot phase," the report says. "A recent [2016] Gartner survey ... shows that only 14 [percent] of respondents have deployed Hadoop. This is up from 10 [percent] in February of 2015. Roughly unchanged are the percentages of organizations with no plans for Hadoop, at 52 [percent] and 54 [percent], respectively."

Part of this lag has to do with the immaturity of some Hadoop projects or services. On the one hand, Hadoop remains a poor performer for many decision support use cases -- e.g., interactive query processing workloads that also have high concurrency requirements. On the other hand, even though Hadoop is positioned as a playground for cutting-edge use cases -- such as streaming analytics or in-memory machine learning -- the available projects are relatively immature. This could pose problems for some adopters, according to Gartner.

"The downside for less aggressive organizations is that new use cases ... often require the use of immature, unsupported software. Thus ... while investment in big data continues, the move to production has remained flat as mainstream adopters deal with these constant changes," the report says. "Data and analytics leaders must weigh packages offered as 'platforms' consisting of multiple, frequently changing components against 'solutions' with a clear, targeted outcome in mind."

Looking Further Ahead

The Gartner report explores several other provocative issues, including the likelihood, if any, of consolidation in the Hadoop space. (Short answer: don't expect major consolidation any time soon.) The report says the major Hadoop vendors will likely become even less Hadoop-centric as they diversify their offerings to address "broader data management and analytics use cases."

It also urges buyers to think cloud first in 2017 and beyond. "Cloud deployment options reduce the amount of time needed to set up and tear down experiments and proofs of concept," the report indicates.

"Production workloads may also be more cost-effective and provide more agility if they do not have to run on a constant ... basis[. In that case,] costs for storage and compute can scale independently."

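To make the report's point about independent scaling concrete, here is a rough back-of-the-envelope sketch in Python. It is not from the Gartner report: the unit prices, cluster size, data volume, and job schedule are all hypothetical, and the names `Rates` and `monthly_cost` are invented for illustration. It simply assumes compute is billed per node-hour and storage per terabyte-month, and compares a cluster that runs constantly with one that is spun up only for a nightly job.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices; not taken from any vendor's price list."""
    compute_per_node_hour: float  # cost of one compute node for one hour
    storage_per_tb_month: float   # cost of storing one terabyte for one month


HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def monthly_cost(rates, nodes, compute_hours, data_tb):
    """Monthly bill when compute and storage are metered separately."""
    compute = rates.compute_per_node_hour * nodes * compute_hours
    storage = rates.storage_per_tb_month * data_tb
    return compute + storage


# Assumed prices, chosen only to keep the arithmetic easy to follow.
rates = Rates(compute_per_node_hour=0.50, storage_per_tb_month=25.0)

# Always-on cluster: 10 nodes running every hour of the month.
always_on = monthly_cost(rates, nodes=10, compute_hours=HOURS_PER_MONTH, data_tb=50)

# On-demand cluster: the same 10 nodes, but only for a 4-hour nightly job
# over 30 days; the 50 TB stays in object storage the whole time.
on_demand = monthly_cost(rates, nodes=10, compute_hours=4 * 30, data_tb=50)

print(f"always-on: {always_on:.2f}")
print(f"on-demand: {on_demand:.2f}")
```

With these made-up rates, the intermittent workload comes to 1,850 per month against 4,900 for the always-on cluster. The storage charge is the same 1,250 in both cases; only the compute portion shrinks, which is the effect the report describes.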

Powered by TDWI. Advancing All Things Data
A Division of 1105 Media, Inc.