AWS has made DeepSeek’s full-fat R1 model available via its Bedrock managed service, following its move last week to serve the Chinese AI lab’s smaller distilled Llama and Qwen models to customers.
“We've always believed that no single model is right for every use case, and customers can expect all kinds of new options to emerge in the future,” – AWS CEO Matt Garman
Google Cloud has, with less fanfare, made R1 available to experiment with on its Vertex AI managed service. Microsoft has made it available via its AI Foundry proposition and also on GitHub.
Handwringing in some quarters about "sending data to China" is, of course, moot if the API calls are going to US cloud regions and the model is running on hyperscaler infrastructure with very large security teams.
AWS, for example, noted that users can gain confidence from running it in architecture “uniquely designed for security”, but said users must integrate the DeepSeek-R1 model with controls from Amazon Bedrock Guardrails.
“The DeepSeek-R1 model in Amazon Bedrock Marketplace can only be used with Bedrock’s ApplyGuardrail API to evaluate user inputs and model responses for custom and third-party FMs available outside of Amazon Bedrock”, AWS’s Channy Yun said in a blog on January 30.
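For teams wiring that up, a minimal sketch of calling the ApplyGuardrail API with boto3 is below; the guardrail identifier, version and region are placeholders and would need to match a guardrail already created in your own account.

```python
import boto3

# Bedrock Guardrails are evaluated via the bedrock-runtime ApplyGuardrail API.
# The guardrail ARN and version below are placeholders for one you have created.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.apply_guardrail(
    guardrailIdentifier="arn:aws:bedrock:us-east-1:123456789012:guardrail/EXAMPLE_ID",
    guardrailVersion="1",
    source="INPUT",  # use "OUTPUT" to screen the model's responses instead
    content=[{"text": {"text": "User prompt to screen before it reaches DeepSeek-R1"}}],
)

if response["action"] == "GUARDRAIL_INTERVENED":
    print("Blocked or masked by guardrail:", response["outputs"])
else:
    print("Input passed guardrail checks")
```

The same call, with `source="OUTPUT"`, can be run against the model’s responses before they are returned to users.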
DeepSeek-R1 requires at least 800 GB of HBM memory in FP8 format for inference. In a machine learning blog, AWS demonstrated it running on an ml.p5e.48xlarge instance, which features eight Nvidia H200 GPUs (1,128 GB of HBM in total).
On AWS, R1 can be deployed through the SageMaker JumpStart offering, the Bedrock Marketplace, or directly from Hugging Face model cards.
“Amazon Bedrock is best for teams seeking to quickly integrate pre-trained foundation models through APIs. Amazon SageMaker AI is ideal for organizations that want advanced customization, training, and deployment” – AWS
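As a rough illustration of the SageMaker route, a deployment via the SageMaker Python SDK’s JumpStart interface might look like the sketch below; the model identifier and the request payload are placeholders to be checked against the actual JumpStart listing, though the instance type matches the ml.p5e.48xlarge AWS used in its demonstration.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Placeholder model ID: check SageMaker JumpStart for the exact DeepSeek-R1 listing.
model = JumpStartModel(model_id="deepseek-llm-r1")

# AWS's own walkthrough used an ml.p5e.48xlarge (8x Nvidia H200) for the full model.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p5e.48xlarge",
    accept_eula=True,  # some JumpStart models require accepting the model EULA
)

# Payload format is an assumption based on typical JumpStart text-generation models.
response = predictor.predict({
    "inputs": "Explain chain-of-thought reasoning in one paragraph.",
    "parameters": {"max_new_tokens": 512, "temperature": 0.6},
})
print(response)
```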
The CTO of one large financial services multinational told The Stack that they had already set their team exploring R1’s capabilities on Bedrock, as technology leaders scramble to balance an appetite for innovation with concern over deploying a model from a country with a sustained track record of aggressive cybersecurity activity against Western targets.
(Of R1’s release, another CTO of a $20 billion revenue consumer company told The Stack this week: “I’m also not shocked to see more cost-effective approaches emerge —what they did is pretty well understood in research. That being said, what they did is HARD. Going past CUDA to really get that level of optimization is not a small accomplishment…”)
Hugging Face posted: “Hugging Face Inference Endpoints offers an easy and secure way to deploy Machine Learning models on dedicated compute for use in production on AWS… to create AI applications without managing infrastructure: simplifying the deployment process to a few clicks…handling large volumes of requests with autoscaling, reducing infrastructure costs with scale-to-zero, and offering advanced security.”
Via its “Inference Endpoints”, users can deploy any of the six distilled R1 models, as well as a quantized version of DeepSeek R1 made by Unsloth, it said.
“On the model page, click on Deploy, then on HF Inference Endpoints. You will be redirected to the Inference Endpoint page, where we selected for you an optimized inference container, and the recommended hardware to run the model. Once you created your endpoint, you can send your queries to DeepSeek R1 for 8.3$ per hour with AWS 🤯.” – Hugging Face
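For those who prefer code to clicking through the model page, the huggingface_hub library can create an endpoint programmatically. A minimal sketch is below; the endpoint name, hardware selection and the distilled repository shown are placeholders, not the exact configuration Hugging Face recommends for the full R1.

```python
from huggingface_hub import create_inference_endpoint

# Placeholder name, repository and hardware: pick the R1 variant and the instance
# size recommended on the model's Inference Endpoints page.
endpoint = create_inference_endpoint(
    "deepseek-r1-distill-demo",
    repository="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    framework="pytorch",
    task="text-generation",
    vendor="aws",
    region="us-east-1",
    accelerator="gpu",
    instance_size="x1",
    instance_type="nvidia-a10g",
    type="protected",
)

endpoint.wait()  # block until the endpoint is running

# Query it like any other text-generation endpoint.
output = endpoint.client.text_generation(
    "Why is the sky blue? Think step by step.",
    max_new_tokens=256,
)
print(output)

endpoint.pause()  # pause (or delete) the endpoint when done to stop billing
```

Scale-to-zero, autoscaling and pausing are all managed from the same endpoint object or the Inference Endpoints dashboard.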
Microsoft, meanwhile, has published guidance on “Using DeepSeek models in Microsoft Semantic Kernel”, alongside further documentation on using “DeepSeek-R1 on Azure” with a LangChain4j demo.