Six years, thousands of bug fixes, many commits and a host of welcome new features later, Apache Cassandra 4.0 is finally generally available (GA). It is already in production at Apple, Netflix, Orange, Sky UK, Yelp, and many others, and brings a claimed ~5X improvement in data streaming speed during scaling operations, enterprise-grade auditing, live query logging, Java 11 support and more. (The RC release came April 27.)
The open source, distributed NoSQL database runs some of the most demanding workloads out there. The rigorous approach used to get a stable Apache Cassandra 4.0 out the door lays the groundwork for future community efforts, and sets a fine example for other OSS projects to boot. (The community has agreed to one release every year, plus periodic trunk snapshots. Each new release will be supported for three years.)
"Software projects have already had the user rule to never use x.0 until bugs that were missed are found in production and an update is released," a community blog noted. "For an open source project driven by community, this seemed like something we can avoid and set a new standard.
"To get the quality required, we took a completely different approach to verify data correctness in Cassandra. The scale that Cassandra clusters can reach means that there is an enormous surface area for potential bugs or data corruption, so we purpose-built new tools to cover every requirement.
- Property-based / fuzz testing
- Replay testing
- Upgrade / diff testing
- Performance testing
- Fault injection
- Unit/dtest coverage expansion
Over the past six years, those tools were perfected and deployed to help meet our quality goals. This sets an important baseline for any future version of Cassandra and provides the needed infrastructure to ensure future releases maintain a high level of quality and correctness."
As DataStax VP and Cassandra expert Patrick McFadin put it: "This is a really big day for the project and for everyone that's contributed over the past 11 years... Cassandra isn't just a database project. This is a community of users solving problems that are becoming less rare every day. Facebook's scale problems 10 years ago is almost every company's problem today. Tolerance for failure is almost zero."
Among the additions landing with Apache Cassandra 4.0 is live query logging. Full Query Logging (FQL) is safe for production use, Cassandra’s contributors say, with configurable limits on heap memory and disk space to prevent out-of-memory errors. It is designed to support live traffic capture and traffic replay, and it can also be used for debugging query traffic and for migration.
(New nodetool options have also been added to enable, disable or reset FQL, along with a new tool to read and replay the binary logs. FQL uses Chronicle-Queue to rotate a log of queries.)
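To make the idea of a rotating query log with a hard size cap more concrete, here is a minimal Python sketch. It is a toy illustration only, not how Cassandra or Chronicle-Queue implement FQL: the class name `BoundedQueryLog` and the parameters `segment_bytes` and `max_total_bytes` are hypothetical, and the log is kept in memory rather than in binary files on disk.

```python
from collections import deque


class BoundedQueryLog:
    """Toy rolling query log: fixed-size segments with a capped total size.

    Hypothetical illustration; not Cassandra's FQL implementation.
    """

    def __init__(self, segment_bytes: int, max_total_bytes: int):
        if segment_bytes <= 0 or max_total_bytes < segment_bytes:
            raise ValueError("need 0 < segment_bytes <= max_total_bytes")
        self.segment_bytes = segment_bytes
        self.max_total_bytes = max_total_bytes
        self.segments = deque([[]])  # oldest segment first
        self.sizes = deque([0])      # byte size of each segment

    def append(self, query: str) -> None:
        size = len(query.encode("utf-8"))
        # Roll to a new segment when the current one would overflow.
        if self.sizes[-1] > 0 and self.sizes[-1] + size > self.segment_bytes:
            self.segments.append([])
            self.sizes.append(0)
        self.segments[-1].append(query)
        self.sizes[-1] += size
        # Drop the oldest segments until the total fits under the cap.
        # The newest segment is always kept, even if it alone exceeds the cap.
        while sum(self.sizes) > self.max_total_bytes and len(self.segments) > 1:
            self.segments.popleft()
            self.sizes.popleft()

    def replay(self):
        """Yield the retained queries in the order they were logged."""
        for segment in self.segments:
            yield from segment

    def total_bytes(self) -> int:
        return sum(self.sizes)


if __name__ == "__main__":
    log = BoundedQueryLog(segment_bytes=10, max_total_bytes=20)
    for q in ["aaaaa", "bbbbb", "ccccc", "ddddd", "eeeee"]:
        log.append(q)
    print(list(log.replay()), log.total_bytes())
```

The point of the sketch is the trade-off a capped log makes: memory or disk use stays bounded, at the cost of losing the oldest captured traffic once the cap is reached.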
Cassandra 4.0 also makes several improvements to streaming, which is how Cassandra cluster nodes exchange data in the form of SSTables, the immutable data files that Cassandra uses to persist data on disk.
Apache Cassandra 4.0 can be downloaded here.