Cloud database platform provider MongoDB has taken its data streaming service GA – saying it is now ready for production workloads of mission-critical applications and is capable of pre-processing time-series data continuously before pushing it to storage in its Atlas DBaaS.
“We plan to expand beyond Kafka and Atlas databases in the coming months. Let us know which sources and sinks you need, and we will factor that into our planning,” the company said today (May 2, 2024).
MongoDB sensibly added that it has made a development storage tier as well as a production tier available as a “cost-effective option for exploratory use cases and low-traffic stream processing workloads.”
Customers can, for example, set up a stream processing instance that subscribes to events generated by MongoDB, filters the relevant information, transforms the events, and sends them to a corresponding Kafka topic. “Additionally, it will subscribe to the Kafka cluster to update the documents that change.”
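The subscribe, filter, transform and emit flow described above can be sketched roughly as follows. This is a minimal illustration, not MongoDB's own example: the pipeline is written as plain Python data using the aggregation-style stages Atlas Stream Processing is built around, and the connection, database, collection and topic names (`atlasOrders`, `shop`, `orders`, `kafkaProd`, `order-updates`) are all hypothetical. The `simulate` helper is likewise an assumption added here to show the same logic running locally on sample change events, with no cluster or Kafka broker involved.

```python
from typing import Any

# Sketch of a stream processing pipeline as plain data.
# All connection, database, collection and topic names are hypothetical.
PIPELINE: list[dict[str, Any]] = [
    # 1. Subscribe to change events from a MongoDB collection.
    {"$source": {"connectionName": "atlasOrders", "db": "shop", "coll": "orders"}},
    # 2. Filter: keep only update events.
    {"$match": {"operationType": "update"}},
    # 3. Transform: reshape each event into a compact message.
    {"$project": {
        "_id": 0,
        "orderId": "$documentKey._id",
        "changes": "$updateDescription.updatedFields",
    }},
    # 4. Send the result to a Kafka topic.
    {"$emit": {"connectionName": "kafkaProd", "topic": "order-updates"}},
]


def simulate(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Apply the filter and transform steps locally to sample change events.

    This mirrors stages 2 and 3 above in plain Python; it does not
    connect to MongoDB or Kafka.
    """
    messages = []
    for event in events:
        if event.get("operationType") != "update":
            continue  # filter step: drop anything that is not an update
        messages.append({
            "orderId": event["documentKey"]["_id"],
            "changes": event["updateDescription"]["updatedFields"],
        })
    return messages


if __name__ == "__main__":
    sample_events = [
        {"operationType": "insert", "documentKey": {"_id": 1}},
        {
            "operationType": "update",
            "documentKey": {"_id": 2},
            "updateDescription": {"updatedFields": {"status": "shipped"}},
        },
    ]
    for message in simulate(sample_events):
        print(message)
```

The return path mentioned in the quote, where the processor reads from Kafka and updates documents, would be a second pipeline with the source and sink swapped; it is left out here to keep the sketch short.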
The company rolled out consumer and media intelligence platform Meltwater as an example. The customer analysed one billion pieces of content each day, it said. “MongoDB Atlas Stream Processing enables us to process, validate, and transform data before sending it to our messaging architecture in AWS powering event-driven updates throughout our platform,” said Meltwater software engineer Cody Perry, adding in a canned statement that this has “increased our productivity, improved developer experience, and reduced infrastructure cost.”
Whether it is sensor data from industrial facilities or financial services transaction data, streams rather than batches of data are increasingly sought after, but they are quite hard to work with for those not used to them.
Goldman Sachs’ CIO recently told The Stack: “The world is moving from batch, which was the cornerstone of our industry, to a continuous stream where you have to have the data organised in a way that is very agile, very cost effective – which is a big deal – and you have compute layered on top of that it can act on any data… The three big directions [of change] are on the one side, AI and on the other side, the data evolution towards real-time; then you have the computing infrastructure going from servers to serverless. Those three are really converging.”
Announcing the beta release of Atlas streaming in 2023, MongoDB suggested that its database was well positioned to help developers wrestling with “variable, high volume, and high-velocity data; the contextual overhead of learning new tools, languages, and APIs; and the additional operational maintenance and fragmentation that can be introduced through point technologies into complex application stacks.”
Documentation is here. Some other open-source and closed-source database providers, of course, also offer data streaming support and platform capabilities, depending on your preferred flavour of DB and associated infrastructure, e.g. DataStax’s Apache Pulsar-based Luna Streaming.
MongoDB says it now has 47,800 customers in 100+ countries.