Whether you’re a utility pulling performance data from the sensor-laden turbines of an offshore windfarm, an ecommerce provider responding to a spike in orders, or a bank seeking intelligence on a sudden surge in withdrawals, real-time intelligence from streamed data has never been more important. Yet often the behind-the-scenes plumbing that makes this possible has not evolved with a world moving to containerised workflows running across public cloud, private cloud, or any hybrid/multicloud combination thereof.
End-users, as a result, often find themselves caught in platforms not built for the cloud: stuck relying on single-tenant systems and wedded to monolithic architectures that couple compute with storage. The outcome is far too frequent infrastructure configuration, manual intervention and firefighting, and a heavy maintenance workload as engineers wrangle with multiple messaging technologies, processing tools, and associated infrastructure designed to load streaming data into data stores. It can get expensive too...
Among the software platforms designed to help support more flexible and dynamic high-performance data pipelines and the mission-critical applications depending on them is Apache Pulsar – a cloud-native, unified messaging and data streaming platform first developed at Yahoo! and open sourced in 2016. Pulsar became a top-level Apache Software Foundation (ASF) project in September 2018, but is still something of an exotic unknown outside of a small but growing collection of the world’s more forward-thinking engineering teams -- while enterprise support and/or managed services for the young project have also been hard to find.
Now – following in the footsteps of Apache project founders like those behind Databricks, who have gone on to launch immensely successful businesses – the founders of Apache Pulsar and its associated storage project Apache BookKeeper have teamed up to launch a turnkey offering for managed Apache Pulsar. Their startup is called StreamNative, it just raised $23 million in a Series A round led by Prosperity7 Ventures, and The Stack is making it its sixth “one to watch”: a series focussed on exciting startups that we think could become a critical part of the enterprise IT stack. We sat down with StreamNative Founder Sijie Guo and Chief Architect Addison Higham to learn more about the company’s team, plans, and customer use cases as it looks to take Pulsar mainstream.
Sijie, talk us through the genesis of StreamNative...
Sijie -- Sure. StreamNative was founded by the original creators of Pulsar. We have been kind of working on this technology since its inception. [CTO] Matteo Merli and I worked at Yahoo! about 10 years ago, where the company had been running many different messaging technologies; different teams and departments were managing multiple messaging technologies, and the cost of operating them had increased a lot -- with each team building different operations teams and creating a lot of data silos which then had to be brought to one central location; managing the entire data lifecycle had created a lot of challenges. Ultimately we decided to create one centralised cloud messaging service -- that was the birth of Pulsar. In the past three years we've seen widespread adoption across different industries; so we decided to set up a commercial company bringing it to more enterprises seeking to manage the entire data lifecycle. We're now a global team of around 50 people.
What's your go-to-market approach?
Sijie -- A lot of open source businesses get started with an open core model. They build an open source project and community around that, and grow a huge user base around an open source project -- then build additional enterprise features like security-related features, a great UI to simplify managing and monitoring the software, and maybe additional performance-related features as closed source. That model has been shifting quite a lot from 'Open Core 2.0' to 'Open Core 3.0' -- or more of a SaaS model where you have popular open source software, but people are shifting their entire IT infrastructure from on-premise to cloud, so they want to delegate the operation and services to a service provider on public, or private, or hybrid cloud.
Sijie -- Basically, we're building a Pulsar-as-a-Service through what we call 'StreamNative Cloud'. We have two models: one is a typical SaaS model with a cloud-hosted cluster, i.e. you get a cluster that is entirely hosted in a StreamNative Cloud account and the enterprise just gets access to the cluster via a public endpoint protected by VPC. That is our first model. The second model is a managed one where StreamNative will be responsible for running a Pulsar cluster within customers' cloud accounts: that could be a public or even a private cloud environment in which people have Kubernetes deployed. This model is very useful for larger enterprises in special industries like financial services who want to keep their data in a particular environment. We've seen customers asking for both hosted and managed. Two or three months ago we also announced our self-hosted model.
Addison, what use cases are you seeing?
Addison -- Going back to first principles, Pulsar is a system for making it easy to deal with real-time data. More and more things are real-time now: as consumers we've all got used to services like Uber and meal deliveries. Meanwhile the move to Kubernetes is pretty prolific and people are really rebuilding systems that can run very well there. So from an architectural perspective Pulsar is very easy to run in the cloud and particularly on Kubernetes. We've seen Iterable [a cross-channel marketing automation platform] use it for that mission-critical last mile of helping to deliver emails, push notifications, SMS messages to customers, for example. [Ed: Iterable replaced RabbitMQ, Kafka and Amazon SQS with Pulsar. You can see a detailed write-up on why, with architectural illustrations, here. Ecommerce platform Narvar meanwhile has a write-up detailing its own shift to Pulsar here.] We're also helping customers with IoT -- users that have lots of different channels of data coming from lots of different devices and need more flexibility in how that data is segmented: for example it might be very 'bursty' data that you need to be able to process over, and Pulsar does very well in terms of scalability; its ability to really leverage the cloud makes it a lot simpler, operationally speaking, to handle variations in workloads.
What are your current feature priorities?
Addison -- We want to really bring capabilities of Pulsar to organisations quicker. That means making it fully self-service to deploy a cluster into a customer's account with the entire suite of Pulsar function connectors, and rock solid for organisations to leverage and use with a lot of simplicity. We're also working on new features like integrations with Lakehouse from Databricks; Delta Lake capabilities that allow you to keep your data accessible to Pulsar as a stream, but then also to use that same underlying source of data for traditional batch and analytical workloads. That's a very unique capability that we're bringing into Pulsar and StreamNative Cloud that we're very excited about.
Who do you talk to commercially, typically?
Addison -- Lots of places. Sometimes it's people struggling with other streaming systems, like Kafka, operationally; they want something easier, that takes care of higher scalability. Sometimes it's from the operational or DevOps side. Sometimes it's developers or architects saying 'we have really heavy needs around messaging, but we also do a lot of streaming things; it would be nice if we could have one technology that can span across a broader section of use cases!' We're also seeing higher level engineering groups approach us who spent a tonne of money on Confluent Cloud but need to move towards multicloud; who are asking 'how do we make it simpler to span multiple cloud providers?'
That's something where our use of Kubernetes -- whether for a self-managed model or StreamNative offerings -- can span multiple cloud providers and provide geo replication across those different cloud providers.
Engineers can dip into a slightly dated but nonetheless usefully detailed thread about Pulsar deployments on Hacker News here. Perhaps needless to say, the project has a vibrant community and has evolved fairly significantly since that conversation, however -- the latest version landed August 11, 2021. Documentation and project details can be found on GitHub here (with the repo showing 9,600+ stars and over 400 contributors...), and you can learn more about StreamNative here.