NVIDIA suffered the largest single-day loss of market value of any company in history on Monday amid market turbulence generated by the DeepSeek R1 AI model release – with traders fearing cheap-to-train, cheap-to-run LLMs could pose an existential threat to colossal CapEx investments in the west.
Even as markets were digesting how the model’s release impacts the Weltanschauung of GPU buyers, DeepSeek’s team quietly released yet another landmark model – Janus-Pro, a multimodal model (i.e. one that can generate images) that apparently beats OpenAI's DALL-E 3 and Stable Diffusion on the industry GenEval and DPG-Bench benchmarks.
The seven-billion-parameter model is an upgrade on its earlier, deeply flawed one-billion-parameter Janus model, which DeepSeek admitted had been trained on “real-world data that lacks quality and contains significant noise.” It said in a January 27 paper that it incorporated “approximately 72 million samples of synthetic aesthetic data, bringing the ratio of real to synthetic data to 1:1 during the unified pretraining stage” of Janus-Pro.
(That’s just the latest indication that, approached carefully, synthetic data can be powerful at improving model performance; something noted by Microsoft as well after it released its Phi-4 model in December 2024. Unusually, synthetic data constituted the bulk of the pre-training data for Phi-4, Microsoft earlier explained in an arXiv paper on its initial release. This was “generated using a diverse array of techniques, including multi-agent prompting, self-revision workflows, and instruction reversal.”)
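For readers curious about the last of those techniques, a rough sketch may help: instruction reversal starts from an existing high-quality artefact (a code snippet, say) and asks a model to write the instruction that would plausibly have produced it, yielding a fresh instruction–response pair for training. The Python below is a minimal illustration under that assumption; the generate() helper is a hypothetical stand-in for whatever model call is actually used, and none of this is drawn from Microsoft's paper.

```python
# Minimal sketch of "instruction reversal" for building synthetic training
# pairs. NOTE: generate() is a hypothetical stand-in for an LLM call;
# this illustrates the general idea, not the actual Phi-4 pipeline.

def generate(prompt: str) -> str:
    """Placeholder LLM call. Swap in a real local model or hosted API."""
    # Canned output so the example runs end to end without a model.
    return "Write a Python function that checks whether a number is prime."


def reverse_instruction(response: str) -> dict:
    """Given an existing artefact (code, essay, proof), ask the model for the
    instruction that would plausibly have produced it, and keep the resulting
    (instruction, response) pair as a synthetic training example."""
    instruction = generate(
        "Write the user request that the following response best answers.\n\n"
        f"Response:\n{response}\n\nRequest:"
    )
    return {"instruction": instruction.strip(), "response": response}


if __name__ == "__main__":
    snippet = (
        "def is_prime(n):\n"
        "    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))"
    )
    pair = reverse_instruction(snippet)
    print(pair["instruction"])
    print(pair["response"])
```

In a real pipeline, pairs like these would typically be filtered for quality before being mixed into pre-training data alongside real-world samples.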
NVIDIA also finally commented, describing DeepSeek’s R1 model as an “excellent AI advancement” that complies with US technology export controls, according to an emailed statement that added “DeepSeek’s work illustrates how new models can be created… leveraging widely available models and compute that is fully export control compliant.”
Former Intel CEO Pat Gelsinger added: “Open wins. It has been disappointing to watch the foundational model research become more and more closed over the last few years. In this, I’m more aligned with Elon than Sam – we… need AI research to increase its openness.
“We need to know what the training datasets are, study the algorithms and introspect on correctness, ethics and implications. Having seen the power of Linux, Gcc, USB, Wifi and numerous other examples has made this clear to all students of computing history,” he wrote, adding on his social media pages that “DeepSeek is an incredible piece of engineering that will usher in greater adoption of AI. It will help reset the industry in its view of Open innovation. It took a highly constrained team from China to remind us all of these fundamental lessons of computing history.”
It seems highly plausible at this point that a US counterpart, perhaps Meta, may be pushed to fully open-source its own latest models under a comparable MIT licence. Mark Zuckerberg said on October 30, 2024 that Meta was throwing a huge amount of firepower at Llama 4, but he still believed open-source was the best approach: “We're training the Llama 4 models on a cluster that is bigger than 100k H100s or bigger than anything that I've seen reported for what others are doing," he said on an earnings call.
"I expect that the smaller Llama 4 models will be ready first, and they’ll be ready, we expect sometime early next year, and I think that they're going to be a big deal on several fronts – new modalities, capabilities, stronger reasoning, and much faster. It seems pretty clear to me that open source will be the most cost-effective, customizable, trustworthy, performant, and easiest-to-use option that is available to developers, and I'm proud that Llama is leading the way on this…” Is it now? Watch this space.