Uber bites the cloud bullet after “reimagining” its infrastructure: Goodbye 100k+ servers
Uber is ditching its data centres and migrating completely to the cloud as part of the company’s ongoing efforts to streamline operations, a significant move for a company that has long rolled its own infrastructure.
Uber runs 95% of its IT from its own data centres and deploys hundreds of thousands of servers, consuming huge amounts of compute to match dynamic demand with dynamic supply, both of which evolve in real time.
The ride-hailing company this week revealed two seven-year deals with Oracle Cloud Infrastructure and Google as part of that shift, but has not disclosed how it will split its primary workloads across cloud services.
The decision follows Uber’s annual earnings, reported in February, which showed that revenue for 2022 grew 83% year-on-year to hit $31.8 billion and that it processed bookings of over $115 billion. The company turned a rare profit of $595 million, and CEO Dara Khosrowshahi dubbed it the company’s “strongest quarter ever”.
Uber cloud migration: Oracle Cloud Infrastructure, Google win big
The contracts come after a multi-year project – dubbed “Crane” internally – that was undertaken to “reimagine our infrastructure stack for a hybrid, multi-cloud world”, with its software engineers admitting that host provisioning (particularly for on-prem hardware) had become a “complicated and error-prone process.”
Kamran Zargahi, Uber’s senior director of technology strategy, said this week that the contracts will see Uber migrate completely off its own data centres over the next few years. (He told the WSJ that a tipping point on the decision came during the pandemic, when supply-chain disruptions pushed IT hardware delivery timelines to more than 12 months; cloud would help reduce this reliance on the hardware supply chain, he said.)
The company has already been using a range of different cloud services.
As its engineers have put it in a Medium blog on “Crane” published in September 2022: “We built an abstraction layer over various cloud providers’ APIs and the process for provisioning a new host is simply a matter of calling into this abstraction layer as well as inserting a record for the host in our host catalogue…”
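The cloud provisioning flow the engineers describe (call an abstraction layer, then record the host in a catalogue) can be sketched roughly as follows. This is a minimal illustration, not Uber's code: the names `CloudProvider`, `FakeProvider`, `HostCatalogue` and `provision_host` are hypothetical, and the provider here is an in-memory stand-in rather than any real vendor SDK.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    host_id: str
    provider: str
    zone: str


class CloudProvider(ABC):
    """Hypothetical interface hiding one vendor's provisioning API."""

    name: str

    @abstractmethod
    def create_instance(self, zone: str) -> str:
        """Create a machine and return the vendor's instance ID."""


class FakeProvider(CloudProvider):
    """Stand-in for a real vendor SDK; it only hands out sequential IDs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._count = 0

    def create_instance(self, zone: str) -> str:
        self._count += 1
        return f"{self.name}-{zone}-{self._count}"


class HostCatalogue:
    """In-memory stand-in for the host catalogue (assumed to be a keyed store)."""

    def __init__(self) -> None:
        self._records = {}

    def insert(self, host: Host) -> None:
        if host.host_id in self._records:
            raise ValueError(f"duplicate host: {host.host_id}")
        self._records[host.host_id] = host

    def get(self, host_id: str) -> Host:
        return self._records[host_id]

    def __len__(self) -> int:
        return len(self._records)


def provision_host(provider: CloudProvider, catalogue: HostCatalogue, zone: str) -> Host:
    """Provision via the abstraction layer, then insert the catalogue record."""
    instance_id = provider.create_instance(zone)
    host = Host(host_id=instance_id, provider=provider.name, zone=zone)
    catalogue.insert(host)
    return host


if __name__ == "__main__":
    catalogue = HostCatalogue()
    for vendor in ("vendor-a", "vendor-b"):
        print(provision_host(FakeProvider(vendor), catalogue, "zone-1"))
```

The caller never touches a vendor-specific API: adding another cloud in this sketch would mean writing one more `CloudProvider` subclass, while `provision_host` and the catalogue stay unchanged.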
“The story is much more complicated for our on-prem hardware. For a physical server, we first insert a record into our host catalog based on information from our physical asset tracker. Then, utilizing an in-house DHCP server that can respond to NetBoot (PXE) requests and some software to help us ensure that a host will always first attempt to boot from the network, we image the machine… reclaiming a host is also much more complicated on-prem,” the multi-author blog noted, adding that Uber IT teams had traditionally had “wide latitude to customize the operating systems on subsets of hosts to their individual needs. This led to fragmentation, making it complex and difficult to perform fleetwide operations like OS, kernel, and security updates, porting the infrastructure to new environments, and to troubleshoot production issues.”
Its team have since pushed most workloads to containers and enforced that every host is identical at the OS layer, “containing only the essentials: a container runtime and general identity and observability services.”
To give the scale some context, for storage alone the company runs over 1,000,000 storage containers on close to 75,000 hosts with more than 2.5 million CPU cores. (Docstore, Schemaless, M3, MySQL, Cassandra, Elasticsearch, etcd, Clickhouse, and Grail are all containerised, say staff software engineers at Uber.)
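As a rough back-of-the-envelope reading of those figures, the ratios can be worked out as below. All three inputs are the article's approximations ("over", "close to", "more than"), so the results are indicative only.

```python
# Figures as quoted above; each is approximate, so the ratios are rough.
storage_containers = 1_000_000
hosts = 75_000
cpu_cores = 2_500_000

containers_per_host = storage_containers / hosts
cores_per_host = cpu_cores / hosts
cores_per_container = cpu_cores / storage_containers

print(f"storage containers per host: {containers_per_host:.1f}")  # 13.3
print(f"CPU cores per host:          {cores_per_host:.1f}")       # 33.3
print(f"CPU cores per container:     {cores_per_container:.1f}")  # 2.5
```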
Uber cloud migration: Oracle says its involvement is non-traditional
Oracle Cloud Infrastructure (OCI)’s team were particularly pleased to have won what they described as a highly competitive tender. Asked by a somewhat suspicious editor at The Stack (wondering whether Oracle’s participation was an ERP cloud migration with a shiny dress on) what percentage of the Uber cloud workloads would be Oracle software that had simply been moved from on-premises to cloud, Mark Hura, EVP, Oracle North America Cloud Infrastructure & Technology, told us: “Zero! This is not traditional Oracle workloads.
“It’s their core infrastructure that’s operating their business today. This is not a monumental win to migrate EBS to OCI and run it more efficiently for our customers; we do that every day,” he emphasised. “This is completely separate from that… [Uber will be leveraging OCI] compute, network infrastructure, database, storage…”
“Uber is expanding into a ‘go anywhere, get anything’ platform, and the company needed a cloud partner that shares a relentless focus on innovation,” said Oracle CEO Safra Catz in a release. “This landmark competitive win for OCI is further validation of the momentum and acceleration we are experiencing in the market.”
As part of the deal Oracle will also become a global Uber for Business client, selecting Uber as a preferred rideshare for its employees to travel and eat around the world. (Uber now has contracts with over 170,000 organisations worldwide, including 60% of the Fortune 500, executives noted at its last investor day.)
At Uber’s last investor day, execs highlighted the extent to which technology changes moved the needle for it financially, saying: “recent improvements we have made in courier pricing, which are powered by machine learning models, actually helped us lower fulfillment cost by more than a $100 million in the last year.”