Cazena: Managed Big-Data-as-a-Service Made Easy
If you don't know where to start with big-data-as-a-service, upstart player Cazena says it has a solution worth looking at. Cazena markets managed data mart and data lake services -- with a twist. The "managed" part is straightforward enough: Cazena automates the deployment, integration, and configuration -- along with many aspects of the management -- of data mart and data lake services.
Unlike competitors such as Amazon, Microsoft, and Google -- to name just a few -- Cazena doesn't market its own data warehouse, data mart, or data lake services.
Think of Cazena as an automation technology for big-data-as-a-service, says Hannah Smalltree, director of product marketing. It aims to simplify everything from deciding where to cost-effectively run workloads to configuring, deploying, and scaling parallel clusters for maximum performance.
"The idea [with big-data-as-a-service] is similar to a utility company. Regardless of how big a company you are, rather than building your own generator, you just plug into the wall and get your power. We have that for data services -- but could we do something like that for data processing?" Smalltree says.
Turnkey Convenience for Big-Data-as-a-Service?
The problem, Smalltree argues, is that no big-data-as-a-service provider offers utility-like convenience, manageability, and availability. In the words of Gartner analyst Adam Ronthal, there's always "some assembly required." Cazena aims to minimize this assembly.
It also helps simplify the process of spinning up use-case-specific services in the cloud. The promise of cloud is that it has the potential to give businesspeople the power to access and use the services they want. There's no IT in the middle to intervene, restrict, control, or say no.
This is true of many software-as-a-service and even many platform-as-a-service offerings, but Smalltree contends it isn't, or wasn't, true of big-data-as-a-service. For the analyst or data scientist, there has been no such thing as a turnkey big-data-as-a-service offering or data lake service.
Typically, an enterprise doesn't spin up a data lake in the cloud. An enterprise first spins up a Hadoop or Spark cluster in the cloud. From there, the fun begins. Cazena's model enables subscribers to focus on workloads -- use cases -- instead of the underpinning technology.
"You say what sort of workload you want to run -- for example, do you want a data mart or a data lake service -- and then we provision and configure the software you need. You're not beginning with 'I need Hadoop in the cloud,'" Smalltree says. The Cazena service asks questions about your use case -- data mart, data warehouse, or data lake -- and about the characteristics of the workloads you plan to run.
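The workload-first flow Smalltree describes can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than Cazena's actual interface or logic: the `WorkloadRequest` and `ProvisioningPlan` types, the `plan_for` function, the engine labels, and the sizing rule are all hypothetical. It shows only the general idea of starting from a use case and a few workload characteristics and deriving the platform choice from them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadRequest:
    """What the subscriber states up front (hypothetical fields)."""
    use_case: str            # "data_mart", "data_warehouse", or "data_lake"
    data_volume_tb: float    # expected data volume in terabytes
    concurrent_users: int    # expected number of simultaneous users


@dataclass(frozen=True)
class ProvisioningPlan:
    """What the service would derive from the request (hypothetical)."""
    engine: str
    node_count: int


# Assumed mapping from use case to platform type. The labels are
# placeholders, not product names taken from the article.
ENGINE_BY_USE_CASE = {
    "data_mart": "mpp_sql",
    "data_warehouse": "mpp_sql",
    "data_lake": "hadoop",
}


def plan_for(request: WorkloadRequest) -> ProvisioningPlan:
    """Turn a workload description into a provisioning plan."""
    if request.use_case not in ENGINE_BY_USE_CASE:
        raise ValueError(f"unknown use case: {request.use_case!r}")

    # Made-up sizing rule for illustration only: one node per 5 TB of
    # data, plus one node per 25 concurrent users, never fewer than 3.
    nodes_for_data = math.ceil(request.data_volume_tb / 5)
    nodes_for_users = math.ceil(request.concurrent_users / 25)
    node_count = max(3, nodes_for_data + nodes_for_users)

    return ProvisioningPlan(
        engine=ENGINE_BY_USE_CASE[request.use_case],
        node_count=node_count,
    )


if __name__ == "__main__":
    print(plan_for(WorkloadRequest("data_lake", 40, 50)))
```

The subscriber never names Hadoop or a cluster size in the request; both fall out of the stated use case and workload characteristics, which is the inversion the article describes.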
Insulation Against Cloud Service Lock-In
Cazena, Smalltree says, provides the equivalent of best-of-breed big-data-as-a-service -- in a single-tenant architecture model.
It also works on more than one cloud service, so it provides some degree of insulation against cloud lock-in, says Lovan Chetty, director of product management with Cazena.
"Lock-in in the cloud is a lot more significant than locking yourself into Dell or HP or a hardware vendor. There's definitely talk [among customers] about preventing lock-in on a single cloud provider," he says. "We've had a few examples of people who've come up with a scenario ... where the cloud provider has gone down in a particular region and ... basically said, 'Just move your service to another region.' If you have a whole bunch of networking infrastructure that's been built for a particular region, that doesn't work."
No Silver Bullet, But Nothing Is
Cazena isn't a silver bullet. If you want to use it for traditional decision support workloads -- e.g., reporting, dashboards, OLAP -- you're still going to have to manage the data integration (ETL, data cleansing, data quality) and business intelligence bits (data model, presentation layer, business rules) yourself. This issue is hardly specific to Cazena. It's true of Redshift. It's true of Microsoft's Azure SQL Data Warehouse. It's true of IBM's, Oracle's, and Teradata's cloud offerings and Snowflake Computing's managed data warehouse service.
If you want to expose a decision support-like service to non-technical users, you have to do the technical heavy lifting for them. Cazena is popular among analysts and data scientists, Chetty says. These types of consumers are more comfortable acquiring, preparing, and integrating data.
Metadata management in the context of Hadoop and other NoSQL platforms is much less well understood than it is in the data warehousing world. At this point, Cazena doesn't do anything to address this issue, but neither does anyone else. For example, Cazena doesn't offer a unified metadata repository; metadata lives in the source repositories instead. Customers must either purchase third-party technologies or roll their own solutions, Chetty says.
Cazena has the potential to significantly simplify the work of analysts and data scientists. It doesn't and can't automate the process of building out the equivalent of data warehouse architecture in the cloud -- but it can and does accelerate it. From simplifying the tasks of spec-ing and sizing a cloud data mart or data lake environment to optimizing that environment for workload-specific requirements, Cazena targets several of today's most acute pain points in the nascent big-data-as-a-service space.