Critical NVIDIA bug: An ‘old school’ risk to AI workloads

NVIDIA has patched a critical vulnerability in its widely used Container Toolkit. The bug, allocated CVE-2024-0132, lets an attacker ultimately take over a host system with full root privileges. It was reported by Wiz.

The bug has a critical CVSS rating of 9.0 and affects all versions up to and including v1.16.1. NVIDIA released a fix in v1.16.2 on September 25, 26 days after the bug was first reported to its security team (which acknowledged the report in under 72 hours).
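For teams checking their own fleets, the version boundary above can be expressed as a simple comparison. The following is a minimal sketch rather than an official detection tool: the function names are hypothetical, it assumes you have already obtained the installed toolkit's version string (for example, from your package manager), and it compares version numbers only, without accounting for how the toolkit is configured.

```python
import re

# First patched release, per the advisory details quoted in this article.
FIXED_VERSION = (1, 16, 2)


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple found in a string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_vulnerable(version_text):
    """Return True if the version is older than the first patched release."""
    # Tuples compare element by element, so (1, 9, 0) < (1, 16, 2).
    return parse_version(version_text) < FIXED_VERSION


if __name__ == "__main__":
    for candidate in ("v1.16.1", "1.16.2"):
        status = "needs patching" if is_vulnerable(candidate) else "patched"
        print(f"{candidate}: {status}")
```

Comparing integer tuples rather than raw strings avoids the classic mistake of treating "1.9.0" as newer than "1.16.2".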

The cloud security specialist said the bug highlights the extent to which “‘old-school’ infrastructure vulnerabilities in the ever-growing AI tech stack remain the immediate risk that security teams should prioritize...”

(That is, ahead of more nebulous fears about AI hacking capabilities: OpenAI's latest "most thoughtful" model can pass 0% of collegiate-level Capture the Flag exercises, although it got a little creative trying one...)

How is CVE-2024-0132 exploited?

Wiz has a detailed blog here. To cut to the chase, however: exploitation starts with a malicious container image, which an attacker would need to craft and then run, or get someone else to run, on the target platform.

They could do this “either directly (for example in services allowing shared GPU resources) or indirectly through a supply chain or social engineering attack (e.g., a user running an AI image from an untrusted source).”

Because so many AI service providers "run AI models and training procedures as containers in shared compute environments, where multiple applications from different customers share the same GPU device", conditions are ripe for an attacker to hop customer boundaries.

Wiz added that a successful container escape exposes the host file system to the attacker: “With this access, the attacker can now reach the Container Runtime Unix sockets (docker.sock/containerd.sock).

“These sockets can be used to execute arbitrary commands on the host system with root privileges, effectively taking control of the machine (this is a known attack path for containerized systems, see here).”
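Defenders can audit for this exposure from the other direction, by checking whether a runtime socket is visible from inside a workload that should not have one. The sketch below is illustrative only: the function name is hypothetical, and the two paths are common default locations assumed here rather than an exhaustive list, since deployments can place these sockets elsewhere.

```python
import os
import stat

# Common default locations; assumptions, not an exhaustive list.
DEFAULT_SOCKET_PATHS = (
    "/var/run/docker.sock",
    "/run/containerd/containerd.sock",
)


def find_runtime_sockets(paths=DEFAULT_SOCKET_PATHS):
    """Return the subset of paths that exist and are Unix sockets."""
    found = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            continue  # missing, or not accessible to this process
        if stat.S_ISSOCK(mode):
            found.append(path)
    return found


if __name__ == "__main__":
    exposed = find_runtime_sockets()
    if exposed:
        for path in exposed:
            print(f"runtime socket visible: {path}")
    else:
        print("no runtime sockets found at the default paths")
```

An empty result from inside a container only shows that nothing sits at those default paths. It says nothing about whether the host itself has been patched.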

The company is not yet sharing further exploit details, to avoid CVE-2024-0132 being used by "bad actors". It emphasised, however, that "containers are not a strong security barrier and should not be relied upon as the sole means of isolation.

"When we design applications, especially multi-tenant applications, we should always 'assume a vulnerability' and design to have at least one strong isolation barrier such as virtualization (as explained in the PEACH framework)."