How Natwest's "Boxed" teamed up with Datadog to slash cloud costs, spot big spenders

"We hear you; Datadog is not free..."

Hop onto any forum about Application Performance Monitoring (APM), “observability” and logs – from a subreddit to a Slack channel – and you will find someone lamenting how painfully expensive it can get; Datadog has often borne the brunt of this conversation over the years.

At the company’s London Summit today it tackled that subject head on, then rolled out customers to show how much money Datadog had helped them save, including via its Cloud Cost Management proposition.

“We hear you. Datadog is not free,” said product manager Natasha Goel candidly, adding that the company has worked hard to help customers “correlate performance improvements to cost savings” and has eaten its own dog food on that front, saving itself $14 million in cloud costs by doing so.

(Datadog, which now has over 30,000 customers, says it invests a hefty 30% of revenues back into R&D with 21 major product launches and 400+ new features and integrations taken live in calendar 2024.

Its 2022 acquisition of Cloudcraft, which provides real-time cloud infrastructure modeling with tasty visualisations, seems to have been particularly useful for customers and it has pushed hard into more of a FinOps space with its Cloud Cost Management "CCM" proposition.)

“What has all this cost visibility helped our teams actually accomplish? It's twofold: The first is that they can actually evaluate trade offs between cost, performance and other things when they're doing optimizations. And the second is that they can quantify the impact of their work…" said Goel.

“We have a performance win Slack channel where we share both performance improvements and also cost efficiency wins,” she added.

Datadog *saving* you money?

As a result, whether it was customer Ana Pasparan, a cloud platforms manager at Vodafone, or Antoine Dao and Mike Bryant from Natwest’s “Boxed”, FinOps was very much in focus during the keynote presentations.

Natwest’s “Boxed” (its “Banking-as-a-Service” platform) uses AWS, GCP, lots of Kubernetes… and “we have some critical challenges around managing cloud costs effectively,” platform manager Antoine Dao said.

“This is especially true for platforms like ours, which are multi-tenant and operate across multiple cloud environments,” he said during a keynote.

Dao’s colleague Mike Bryant explained: “We've generally got a whole bunch of cloud accounts across AWS and GCP. They're not all our core platform team’s; but we have 90% of the stuff, which is 90% of the cost."

("We've also got some cloud accounts that are owned and managed by data engineering teams and security teams, which gets interesting," he added.)

“What we really aim for for most of our development teams is they work in Kubernetes. They get to deploy stuff there. They configure stuff through that; they make heavy use of custom resource definitions so that everything can be done within Kubernetes,” Bryant told the audience.

His team works with the business to set budgets for the year and then track spending by team, but it “gets interesting, because there's all these different teams running inside our one cluster, and a lot of shared resources too. We've got databases, Elasticsearch, networking stuff.

"Some cost allocations are very simple. Kubernetes, with all that shared infrastructure? Rather difficult,” he said, adding that in theory it would be nice to have ‘one budget at the top’ that everyone respected (which drew an appreciative laugh), but his team “want to be able to look at things like, ‘how much does this business product cost? Can we get a cost to serve?’”

Natwest has worked assiduously to improve cost allocation and cost visibility, he said, bringing it up from ~50% of services to ~90%, with some help from Datadog’s Cloud Cost Optimiser and some homegrown nous, as well as making use of the FinOps Foundation’s “crawl, walk, run” maturity model guidance and moving from low-hanging fruit to edge cases.
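
To make the shared-infrastructure problem Bryant describes a little more concrete, here is a minimal sketch of one common approach: splitting a shared cluster bill across teams in proportion to what each team requested. The team names, the figures and the `allocate_shared_cost` helper are all hypothetical, and nothing here is taken from Natwest's or Datadog's actual implementation.

```python
def allocate_shared_cost(total_cost, usage_by_team):
    """Split total_cost across teams in proportion to their usage.

    usage_by_team maps a team name to any consistent usage measure
    (for example, requested CPU cores). Hypothetical helper for illustration.
    """
    if not usage_by_team:
        return {}
    total_usage = sum(usage_by_team.values())
    if total_usage == 0:
        # Nobody recorded any usage: fall back to an even split.
        even_share = total_cost / len(usage_by_team)
        return {team: even_share for team in usage_by_team}
    return {
        team: total_cost * usage / total_usage
        for team, usage in usage_by_team.items()
    }


# Made-up example: a 1,000-unit monthly cluster bill and three teams.
cpu_requests = {"payments": 40.0, "onboarding": 10.0, "ledger": 50.0}
allocation = allocate_shared_cost(1000.0, cpu_requests)

for team, cost in allocation.items():
    print(f"{team}: {cost:.2f}")
```

Proportional splitting is only one option; real platforms often blend several usage measures (CPU, memory, storage) and treat idle capacity separately.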

Better visibility, better cost allocation; better savings.

“The partnership was quite serendipitous as Datadog got in touch just as we were looking for the ‘next thing’,” Natwest’s Dao said. “We set up a call with Natasha and her team to discuss pain points, provide feedback on features… it was a very open collaboration, with no rigid objectives beyond simply working together to improve the cost management capabilities that we had and provide feedback on Datadog’s tools…”

The outcome? “We leveraged key features from Datadog’s CCM to deliver a FinOps platform. The first, as I mentioned, was Kubernetes cost allocation; multicloud cost attribution, some cloud cost alerting features, and then finally, cloud cost recommendations; mostly to determine any cost savings from committed use discounts on databases and the like.”

More specifically, Bryant explained (in The Stack’s synopsis), Natwest uses Backstage, which lets teams see all the services they own and related resources (deployments, data pipelines, pull request status), plugged into Datadog for graphs and dashboards. His team deploys the Datadog agent (where “the magic comes in”) to assign tags to metrics, traces, and logs emitted by Kubernetes pods or containers, based on labels or annotations.
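
The idea of deriving tags from pod labels can be illustrated with a short sketch. This is not the Datadog agent's code or configuration; the label keys, the `LABEL_TO_TAG` mapping and the `tags_from_labels` function are assumptions made up for illustration.

```python
# Hypothetical mapping from Kubernetes label keys to cost/telemetry tag keys.
LABEL_TO_TAG = {
    "app.kubernetes.io/name": "service",
    "team": "team",
}


def tags_from_labels(labels, mapping=LABEL_TO_TAG):
    """Return sorted 'key:value' tags for the labels that appear in mapping."""
    return sorted(
        f"{tag_key}:{labels[label_key]}"
        for label_key, tag_key in mapping.items()
        if label_key in labels
    )


# Made-up pod labels; labels not in the mapping are simply ignored.
pod_labels = {
    "app.kubernetes.io/name": "checkout",
    "team": "payments",
    "pod-template-hash": "abc123",
}
tags = tags_from_labels(pod_labels)
print(tags)  # ['service:checkout', 'team:payments']
```

Once every metric, trace and log line carries the same `service` and `team` tags, cost data tagged the same way can be grouped alongside it.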

Platform engineering also uses a “bunch of custom rules [and adds] service tags to some ‘untaggable things’ like support, which isn't really aligned to any given service, so we put labels on that, so we can track it.” 
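
A rough sketch of what such custom rules might look like follows. The rule format, the field names (`product`, `service`, `cost`) and the sample line items are invented for illustration and do not reflect Natwest's rules or any Datadog feature.

```python
# Hypothetical rules: (predicate over a billing line item, service tag to apply).
RULES = [
    (lambda item: item.get("product") == "support", "platform-support"),
]


def assign_service(item, rules=RULES):
    """Return the item's own service tag, else the first matching rule's tag."""
    if item.get("service"):
        return item["service"]
    for predicate, service in rules:
        if predicate(item):
            return service
    return "unallocated"


# Made-up billing line items; two of them arrive without a service tag.
line_items = [
    {"product": "compute", "service": "checkout", "cost": 700},
    {"product": "support", "service": None, "cost": 200},
    {"product": "data-transfer", "service": None, "cost": 100},
]

cost_by_service = {}
for item in line_items:
    service = assign_service(item)
    cost_by_service[service] = cost_by_service.get(service, 0) + item["cost"]

total = sum(cost_by_service.values())
coverage = (total - cost_by_service.get("unallocated", 0)) / total
print(cost_by_service, f"coverage={coverage:.0%}")
```

Tracking the share of spend that ends up outside the `unallocated` bucket gives a simple way to measure progress on allocation coverage over time.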

That’s the mechanics of the visibility. 

The actual cost savings when you have that visibility come heavily from “committed use discounts” in AWS, and “BigQuery slot reservations” in GCP among other areas, he added. (“We did manage to get about 40% [savings] on BigQuery which was awesome, because it’s very expensive.”)