Skip to content

Search the site

OpenAIAIcybersecurityLLMsNews

OpenAI’s unripe “Strawberry” model hacked its testing infrastructure

"Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration"

The Great AI Model Arms Race continued this week as OpenAI previewed a new model, dubbed Strawberry, that uses reinforcement learning and chain-of-thought reasoning to generate more analytical responses.

“For complex reasoning tasks this is a significant advancement and represents a new level of AI capability" boasted OpenAI. "Given this, we are resetting the counter back to 1 and naming this series OpenAI o1.”

(It is previewing two models: o1-preview and o1-mini.)

OpenAI Strawberry: Not enterprise-ready

Strawberry is unripe at this stage when it comes to enterprise features.

OpenAI said: “The API for these models currently doesn't include function calling, streaming, support for system messages, and other features…”

Regardless, the likes of LlamaIndex and LangChain were swift to offer early Python and Typescript packages for those with early access and looking to connect custom data sources to the OpenAI o1 models; an established ecosystem will clearly fall into place around them at GA.

See also: Meta’s new “CRAG” benchmark exposes depth of RAG challenge

Developers who qualify for API usage Tier 5 can “start prototyping with both models in the API today with a rate limit of 20 RPM. We’re working to increase these limits after additional testing,” said OpenAI in a blog.

Notably, testing by the “Model Evaluation and Threat Research” (METR) using its autonomy task suite performance benchmarks found that the performance it “observed with o1-mini and o1-preview was not above that of the best existing public model (Claude 3.5 Sonnet)” however…

OpenAI Strawberry: “Productionising” is harder… 

The models' use of reinforcement learning brings challenges.

As NVIDIA’s Jim Fan mused that “productionizing o1 is much harder than nailing the academic benchmarks. For reasoning problems in the wild, how to decide when to stop searching? What's the reward function? Success criterion? When to call tools like code interpreter in the loop? How to factor in the compute cost of those CPU processes?”

“[But] we are finally seeing the paradigm of inference-time scaling popularized and deployed in production” he added in a social post.

Frustratingly for some, OpenAI said "we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer."

Capture the Flag, like a teenager

OpenAI bakes “mitigation” guardrails into its models to prevent abuse which can neuter some results on tests against things like vulnerability exploitation.  Its 43-page “model card” shows that when it comes to cybersecurity, for example, “neither o1-preview nor o1-mini sufficiently advance real-world vulnerability exploitation capabilities” based on tests against Capture the Flag (CTF) competitive hacking challenges. 

“Given 12 attempts at each task, o1-preview (post-mitigation) completes 26.7% of high-school level, 0% of collegiate level, and 2.5% of professional level CTF challenges; o1-mini (post-mitigation) completes 28.7% of high-school level, 0% of collegiate level, and 3.9% of professional level CTF challenges” – OpenAI 

The document suggests that it was not much more dangerously effective, pre-guardrails: “The final post-mitigation model approximately matches the o1-preview pre-mitigation model” the system card reveals. 

OpenAI Strawberry goes rogue?

OpenAI Strawberry did go a little rogue in one security test though… 

“This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. 

“Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration” said OpenAI.

(“Note”, its researchers added crisply, “that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network.”)

They added on September 12: “After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API.”

Fun, but we’re not quite in ex_machina territory yet.

See also: Hash, crack, and the data scientist: Trio of Python frameworks exposed

OpenAI's system card is here [pdf] and blog here.

Latest