The red-hot concept of big data as a service is getting a boost from no fewer than three major companies this week as Cloudera Inc., DataStax Inc. and IBM Corp. all roll out new offerings.
Cloudera today will introduce Cloudera Altus, a platform as a service initially running on the Amazon Web Services Inc. cloud that the big-data vendor says can help data engineers use on-demand, elastic infrastructure to speed the creation of elastic data pipelines and quickly move them into production.
The announcement follows DataStax’s launch yesterday of the DataStax Managed Cloud, a fully managed service of the DataStax Enterprise data platform, which is based upon Apache Cassandra. Like Cloudera, DataStax is initially launching its service on the Amazon platform with plans to add Microsoft Azure and Google Cloud Platform support in the future. The announcement fulfills intentions the company stated last fall when it acquired DataScale Inc., a cloud-based managed service provider.
IBM’s offering isn’t a service but rather a toolkit that customers can use to integrate open-source databases into their private and hybrid clouds. It will be delivered with integrated servers, storage, networking and support for multiple relational and big data platforms, including MongoDB, PostgreSQL, MySQL, MariaDB, Redis, Neo4j and Apache Cassandra. Its special sauce is that the software runs on IBM’s OpenPower LC servers, which are designed specifically for big-data software and which IBM said deliver up to twice the price-performance of Intel Corp. x86-based systems.
Forrester Research Inc. has pegged database as a service as the fastest-growing database category over the next four years as suppliers seek to provide automation capabilities, lower cost and increased flexibility to scoop up new customers.
A change of mind?
Of the three, Cloudera’s announcement is perhaps the most surprising, given Cloudera co-founder Mike Olson’s statement in a SiliconANGLE interview last year that “we won’t get into providing Hadoop as a service.” Cloudera Altus is exactly that, although Charles Zedlewski, senior vice president of products at Cloudera, said he believes Olson meant that the company wouldn’t compete with infrastructure as a service providers.
With Altus, Cloudera is being careful to walk that line. Zedlewski said nearly 20 percent of customers are already running in public cloud, and the company has no intention to try to woo them away. “We thought the one thing we could do better is simplify the overall experience,” he said. “Running big data workloads in the cloud has been more complex than it needed to be. There’s too much know-how of AWS infrastructure required. That’s what motivated Altus.”
Altus is provided in a platform as a service form for fast ramp-up, and is aimed initially at data engineers. It enables them to easily and quickly provision Apache Spark, Apache Hive, Hive on Spark and MapReduce capacity on cloud-native infrastructure. Another feature enables data engineers to run direct reads from and writes to cloud object storage, making data immediately available for use by other Cloudera workloads without requiring data replication, extract/transform/load procedures or changes to file formats.
“Instead of you having to install, generate a cluster and run a workload, all you do is specify a workload and we run the rest in the background,” Zedlewski said. It’s also intended to work with existing Cloudera installations on AWS. “You maintain your relationship with Amazon and we don’t get in the middle,” Zedlewski said. “The workload and data runs in your account. We don’t see your data. All the data you have on [AWS] S3 we can work with.”
In fact, Cloudera is positioning the service as more of an adjunct to existing Amazon workloads than a replacement for them. The company said customers can run multiple workloads on the same unified platform while gaining efficiencies by, for example, generating pipelines in one cloud cluster and running queries against them in another. Workloads can also be moved between the Altus cloud and other Cloudera instances. Over time, multiple clouds will be supported.
The company said Altus automates and simplifies common operational issues related to elastic data pipelines with workload management. This permits users to troubleshoot failed jobs with or without the clusters or compute infrastructure being present. The workload manager also flags significant performance deviations and proposes a root cause analysis. Pricing will be available on both a per-hour and a subscription basis.
Cloudera and other vendors’ hands are being forced by Amazon, which is stealing customers by offering cloud-enabled versions of their own products, said David Vellante, chief analyst at Wikibon, a sister company of SiliconANGLE. “Cloudera is an infrastructure company and so is AWS, and as AWS moves up the stack, it’s grabbing more share of the available market,” Vellante said. “Vendors like Cloudera have no choice but to play along in the cloud, trying to add value where they can and moving faster.”
DataStax courts enterprises
DataStax’s managed cloud offering is based on DataStax Enterprise, a version of Apache Cassandra with proprietary enhancements for performance, reliability and management. Like Cloudera, the DataStax service aims at simplifying the process of installing, configuring and sizing a big data workload. “Despite the high stakes, customer support data shows that operational mistakes are the number one cause of urgent issues,” such as downtime, said Robin Schumacher, chief product officer at DataStax, in an FAQ distributed to the media.
The service is based on the same underlying data layer as DataStax’s on-premises software, meaning that applications don’t have to be rewritten and that organizations can create application program interfaces and microservices architectures that run across platforms. DataStax said three-quarters of its customers are already using its technology in blended cloud environments. “Nearly half of DataStax customers are interested in a fully managed solution,” Schumacher said.
In keeping with its enterprise focus, the company is offering a high-touch approach to on-boarding, complete with a system architect who helps customers plan their installation. It’s also making uptime guarantees. Customers can submit support tickets without providing details or diagnostic logs because the DataStax organization has direct access to the cluster configuration and logs. The management console also integrates with existing automated management systems via a RESTful API.
IBM stresses openness
IBM’s offering is based upon OpenStack for quick integration with existing hybrid and private clouds, the company said. The package includes a self-service portal that enables users to quickly deploy their choice of open source community databases; a scalable, automated and reliable open platform for on-premises, private cloud delivery; a disk image builder tool for customers who want to build and deploy their own custom databases to the database image library; an unspecified open source operations manager; and a turnkey storage configuration composed of storage servers, JBOD (just a bunch of disks) disk drawers, OpenStack control plane nodes and network switches pre-integrated with the open source database-as-a-service toolkit.
The use of Power-based servers is significant because it helps users avoid server sprawl, despite the fact that IBM is hosting the infrastructure, said Chuck Bryan, Power growth solutions manager at IBM. “Power architecture is uniquely designed for data-intensive workloads and has been optimized for open source database workloads,” he said. “That means half the number of servers or twice the number of MongoDB images supported on each server.”
IBM will support only community editions of the database engines in the toolkit’s first release, Bryan said. Many of those engines are also available in commercial “enterprise” versions with additional features for scalability and availability. Bryan said support for those enhanced versions will be added later this year.
IBM made no mention of Bluemix, its platform as a service offering for cloud development, but Bryan said that and other PaaS environments can be supported at the customer’s option. “Since it is OpenStack-based, it can support our clients’ hybrid clouds via API integration with Bluemix services to the applications that enterprise developers build on top of open source,” he said.