DataStax Chairman & CEO Chet Kapoor is unapologetic: he wants to build a $1 billion business, and he wants to do it in two ways. “We want to partner with enterprises: become the secret weapon of CIOs for activating their real-time data. And we want to make it really easy for developers to build scalable, future-proof applications.”
Santa Clara-headquartered DataStax has already helped both multinationals and startups build and scale modern applications on the open-source, NoSQL Apache Cassandra database – which is relied on by Uber, Spotify, and Netflix among others for its high availability, scalability, and self-healing capabilities.
The company also enjoys a reputation for having a team of the smartest architects, developers, and engineers in the market. Under Kapoor’s stewardship over the past two years, it has dramatically expanded its ability to help customers build open, cloud-native technology stacks for all real-time data applications.
Central to that strategy has been DataStax’s Astra DB cloud service and its work open-sourcing a range of tools that aim to marry data workloads with Kubernetes (software for managing containerised applications). The vision: helping customers “quickly build real-time data applications that run anywhere - on any cloud or Kubernetes, anywhere in the world, while driving down TCO, and driving up the velocity with which enterprises can release new applications”. Goodbye cloud vendor lock-in, hello ease of deployment.
The Home Depot makes a critical pivot
Speaking to The Stack, Kapoor points to how central real-time data applications and the infrastructure underpinning them now are to so many businesses – including those that have a physical and digital presence. The Home Depot is one example.
As the pandemic struck, the world’s largest home improvements chain, with over 2,300 stores across the U.S., pivoted quickly from a mostly bricks-and-mortar approach to one in which (as The Home Depot’s Senior VP of Information Technology Fahim Siddiqui put it) it started treating its mobile app and the internet as its “front door to stores.”
That included working with DataStax to quickly deploy an application for “curbside pick-up” amid a country-wide lockdown and scaling it from scratch for nationwide deployment in a few short weeks.
The achievement can’t be taken lightly.
Developers helped make the major pivot a reality and dramatically boost its profitability: The Home Depot carried out over 1.5 billion customer transactions in 2020, growing revenues 20% to $132 billion with Chairman and CEO Craig Menear pointing out that sales through digital platforms had soared 86%: “Acceleration of growth in our interconnected and digital offerings gave us the opportunity to showcase, in a very condensed time frame, new capabilities and ways for customers to engage with The Home Depot.”
The company had a big advantage: it had started on its digitalisation journey after 2018 and was already using DataStax Enterprise – a powerful, scale-out data infrastructure underpinned by a hybrid cloud NoSQL database built atop open-source Apache Cassandra. DataStax Enterprise does big data clusters – very big. It is also designed for high availability and performance on-premises, across hybrid clouds, and from bare metal to Kubernetes provisioning. The Home Depot isn’t the first or only retailer to offer store pickup, but it might be the first U.S. retailer to scale it almost from scratch for millions of customers on such a short timescale.
“We are extremely proud to have partnered with The Home Depot to roll out curbside pickup to all stores in the U.S. in less than 30 days, all using Kubernetes and DataStax,” DataStax CEO Chet Kapoor says.
This is what the cloud’s disciples promised a decade ago when it suddenly became a mainstream service – rapid provisioning, huge scale, the ability to do things quickly that would previously have taken months or years – and now they’ve been vindicated. While the pandemic was shredding business plans, DataStax was there to give the retailer a way out of a tight corner. This might come to be seen as the moment when DevOps and infrastructure finally meshed – and meshed fast.
Kubernetes + Data
Kubernetes is a big part of this journey.
One of Kapoor’s focuses has been making DataStax more than just a database specialist: transforming it into a one-stop shop for the managed services and software needed to build modern, scalable real-time data applications that are easy to use, cost-effective to run (anywhere), and which anticipate your success, i.e. the need to scale both computational resources and developer resources. That means prioritising making it simpler to build on, with modern APIs underpinned by Kubernetes and serverless data on demand.
As Nate Mitchell, lead DevOps engineer at dating application Hornet, recently told The Stack: “I’m a big fan of Kubernetes. At the moment, if I’m building a whole bunch of custom orchestration stuff, with monkeys pulling levers on ECS, when the next person comes in, I’ve got to teach them how to do that. If I do everything on Kubernetes when the next person comes in, I just need to say ‘hey, do you know Kubernetes’?”
As Mitchell pointed out: “That really simplifies the longevity of a project.”
There are deeper reasons why DataStax has pursued the development and open-sourcing of tools that marry Kubernetes and Cassandra (most notably, it introduced the open-source K8ssandra, which is Cassandra on Kubernetes). Both are scalable horizontally or vertically and are based on nodes that let developers expand or contract infrastructure with no downtime or third-party software.
They both support high levels of scalability – developers can build and run distributed applications that automatically scale, based on demand, avoiding the need to pay for idle resources.
And critically, both are self-healing. Kubernetes instantly redeploys failed elements of containerised applications, while Cassandra has built-in replication that lets users easily recover any failed nodes without data loss. (Spotify uses Cassandra to replicate data between its EU and US data centres, allowing Spotify’s music personalisation system to reach its users even if a single data centre fails.)
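The replication idea described above can be illustrated with a small toy model. This is a minimal sketch and not how Cassandra is implemented: the `ToyCluster` class, the data-centre names, and the replica-placement rule are all hypothetical, chosen only to show why a copy in a second data centre keeps reads working when the first one fails.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    rows: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical model: each key is stored on a few nodes in every data centre."""

    def __init__(self, datacentres, replicas_per_dc):
        # datacentres maps a data-centre name to its node count (illustrative only)
        self.replicas_per_dc = replicas_per_dc
        self.datacentres = {
            dc: [Node(f"{dc}-{i}") for i in range(count)]
            for dc, count in datacentres.items()
        }

    def _replicas(self, dc, key):
        # Pick a deterministic starting node from the key, then take its neighbours
        nodes = self.datacentres[dc]
        start = zlib.crc32(key.encode()) % len(nodes)
        return [nodes[(start + i) % len(nodes)] for i in range(self.replicas_per_dc)]

    def write(self, key, value):
        # Copy the value to the chosen replicas in every data centre
        for dc in self.datacentres:
            for node in self._replicas(dc, key):
                if node.alive:
                    node.rows[key] = value

    def fail_datacentre(self, dc):
        # Simulate the loss of an entire data centre
        for node in self.datacentres[dc]:
            node.alive = False

    def read(self, key):
        # Return the value from the first live replica, plus where it was served from
        for dc in self.datacentres:
            for node in self._replicas(dc, key):
                if node.alive and key in node.rows:
                    return node.rows[key], dc
        raise LookupError(f"no live replica holds {key!r}")


cluster = ToyCluster({"eu": 3, "us": 3}, replicas_per_dc=2)
cluster.write("user:42", "playlist-a")
cluster.fail_datacentre("eu")
print(cluster.read("user:42"))  # ('playlist-a', 'us')
```

In this toy run the write lands in both data centres, so the read is still answered from `us` after every `eu` node is marked as failed.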
As Kapoor notes: “Customers are going through a digital transformation journey. The first wave was mobile, the second one was the cloud and the third one is data. The problem is many organisations simply are not poised for the last leg of this digital transformation, data.” Getting their sprawling data empires into shape represents a massive challenge, requiring businesses to wrestle with multiple platforms and apps, not to mention the investment made in legacy systems. Nevertheless, they need to move fast to stay in business, building what Kapoor calls “experiences” – or applications that connect a business process to a customer.
“Many have built a beautiful app that leverages 30% of their data so why not make it 80%?” he muses.
“Because databases by themselves just have data. You have to make the data useful and meet developers where they are: you have to make it easy for them. In the past, Cassandra was hard to manage and hard to build for: that’s the first thing we have set out to change; removing barriers to entry,” he emphasises.
(This newfound ease of use is due in part to DataStax’s 2020 release of Stargate: an open source API gateway for interfaces such as REST, GraphQL, gRPC, and schema-less JSON that abstracts Cassandra-specific complexities away from application developers.)
Pay-as-you-go serverless
Kapoor’s bet was that by making hard things easier, customers would want to do more, driving further growth. To that end, the company’s most significant Cassandra development of all was the appearance in 2020 of DataStax’s Kubernetes-architected Astra DB Database-as-a-Service (DBaaS).
In 2021, crucially, this became serverless to simplify operations and deliver dramatic cost savings with pay-as-you-go pricing. And with the recent addition of multi-region capabilities, companies like Barracuda Networks are simplifying their global operations. Astra DB is free to launch, with on-demand tier-based pricing based on usage and reserved capacity for larger workloads. Pick a cloud to run it on: Astra DB can be deployed on AWS, Azure, or Google Cloud. By separating server compute and storage in microservices style, Astra DB lets users scale data up and scale down dynamically, as needed.
(Most “serverless” products refer to compute, not data, and many companies only scale up their databases once or at best twice a year based on their economic calculations: a deeply inefficient approach. Astra DB lets users scale based on traffic and app requirements, scale down to zero automatically when the database is not in use, and deploy apps that can scale infinitely from day one. Research shows TCO savings of three- to five-fold when using Astra DB, compared with provisioning for peak capacity for the same workload on-premises.)
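The arithmetic behind peak provisioning versus pay-as-you-go can be sketched in a few lines. Everything here is an assumption made for illustration: the hourly demand figures are invented, both models are charged the same hypothetical unit price, and the functions are not Astra DB's pricing model. The resulting ratio depends entirely on the shape of the demand curve and is not evidence for any particular savings figure.

```python
def peak_provisioned_cost(hourly_demand, cost_per_unit_hour):
    """Capacity is sized for the busiest hour and paid for in every hour."""
    return max(hourly_demand) * len(hourly_demand) * cost_per_unit_hour


def pay_as_you_go_cost(hourly_demand, cost_per_unit_hour):
    """Only the capacity actually used in each hour is paid for (zero when idle)."""
    return sum(hourly_demand) * cost_per_unit_hour


# Invented 24-hour demand profile, in arbitrary capacity units:
# idle overnight, a short two-hour peak, moderate use otherwise.
hourly_demand = [0] * 6 + [2] * 6 + [10] * 2 + [4] * 6 + [1] * 4

fixed = peak_provisioned_cost(hourly_demand, cost_per_unit_hour=1)
usage = pay_as_you_go_cost(hourly_demand, cost_per_unit_hour=1)

print(fixed, usage, fixed / usage)  # 240 60 4.0
```

The spikier the traffic, the wider the gap; a workload that runs flat at its peak all day would see no difference under these assumptions.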
“How do we become the secret weapon for CIOs while making it really easy for developers to build apps? Let’s just give them the APIs and hide the complexity. Let it run as a service; let it be serverless,” says Kapoor.
A unified real-time data stack
Other directions designed to fulfill Kapoor’s vision of delivering a “unified stack for all real-time data” include Astra Streaming, a data streaming service based on Apache Pulsar that was built following the acquisition of cloud messaging company Kesque. Astra Streaming, together with Astra DB, delivers a single platform for managing all real-time data: both data “at rest” and streaming data “in motion.”
The introduction of Astra Streaming, together with the serverless Astra DB and the open-source Stargate and K8ssandra, represents an extraordinary rate of innovation and diversification for a company that three years ago was seen as overwhelmingly a Cassandra shop. Yet Kapoor says his focus is on what customers are doing on the extraordinary fairground ride that enterprise software and cloud development have become.
What in the end will make the difference for DataStax is how well customers adapt to the deeper organisational challenges that lie ahead, he notes. “The old architectures don’t work,” says Kapoor. “They were built for a certain way of doing things. Now you need to create an environment where data scientists, app developers, and business people start forming alliances independently of who they report to, to serve a specific mission.”
For Kapoor, a central theme of this is simply creating infrastructure, tools, and cloud services that make development much easier than people expect: “Even for companies like Netflix and Apple, skills are a problem across the board. There are not enough developers. That’s why we’ve spent a lot of time making DataStax easy to operate and easy to build on. Ultimately businesses need to standardise and have a data-first approach.
“And we at DataStax are here to help people innovate very quickly on top.”
“This is not a Big Bang transformation for most organisations. It’s about solving individual problems. But increasingly, those problems are – like for The Home Depot – ones absolutely critical to business success.”