Former Intel CEO and semiconductor heavyweight Pat Gelsinger has invested in British AI hardware company Fractile, he confirmed today.
Gelsinger’s backing is a major coup for Fractile – which was founded by Walter Goodwin in 2022 but only emerged from “stealth mode” in 2024.
(Goodwin graduated with a doctorate in AI and robotics from the University of Oxford; Fractile last year secured $15m in seed funding from the NATO Innovation Fund, Kindred Capital and Oxford Science Enterprises.)
What is Fractile trying to do?
Fractile aims to make compute and memory happen in the same hardware component, rather than pushing memory out into DRAM elsewhere on a system or, as high-end NVIDIA hardware does, physically bonding separate high-bandwidth memory (HBM) from the likes of Micron to the GPUs.
The Stack could not immediately glean the precise “compute-in-memory” approach it aims to take (we have contacted Fractile for comment), but there is no shortage of interesting recent research on alternatives to “von Neumann” architectures (the classical design in which processing is kept separate from the memory that stores data and instructions).
"Anything but boring"!
“[Five] years ago, the world of computing was a [sic] boring,” Gelsinger said. “Add some cores, speed up IO, increment memory performance… Then, the AI/LLM explosion and the world is anything but boring today.”
"Inference of frontier AI models is bottlenecked by hardware. Even before test-time compute scaling, cost and latency were huge challenges for large scale LLM deployments. With the advent of reasoning models, which require memory-bound generation of thousands of output tokens, the limitations of existing hardware roadmaps has compounded. To achieve our aspirations for AI, we will need radically faster, cheaper and much lower power inference" – Pat Gelsinger
Memory’s a big bottleneck
In particular, caching the math that an LLM has already done (turning your request into tokens, then into a vector of numbers, then multiplying that through hundreds of billions of model weights, and so on) places huge demands on memory hardware; OpenAI engineers earlier flagged this as a major bottleneck.
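To put rough numbers on why generation is "memory-bound", here is a back-of-envelope sketch in Python. It assumes a single request at a time and that every generated token requires streaming all model weights plus the cached attention state from memory once. The model shape, the 2-byte values and the 3 TB/s bandwidth figure are illustrative assumptions, not figures from Fractile, NVIDIA or OpenAI.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate size of the attention key/value cache for one sequence."""
    # Two tensors (keys and values) per layer, each seq_len x n_kv_heads x head_dim.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def max_decode_tokens_per_second(weight_bytes, cache_bytes, bandwidth_bytes_per_s):
    """Upper bound on single-stream generation speed when memory traffic dominates."""
    # Each new token needs one full pass over the weights and the cache.
    return bandwidth_bytes_per_s / (weight_bytes + cache_bytes)


# Hypothetical 70-billion-parameter model stored at 2 bytes per weight.
weights = 70e9 * 2
# Hypothetical shape: 80 layers, 8 key/value heads of width 128, 8,192-token context.
cache = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=8192)
# Hypothetical memory bandwidth of 3 TB/s.
tokens_per_second = max_decode_tokens_per_second(weights, cache, 3e12)

print(f"KV cache: {cache / 1e9:.2f} GB")
print(f"Bandwidth-bound ceiling: {tokens_per_second:.0f} tokens/s")
```

Under these assumptions the ceiling works out to roughly 21 tokens per second per request, however much arithmetic the chip can do. Batching many requests together raises total throughput, but the per-request limit is set by how fast data can be moved, which is the gap that putting compute next to memory is meant to close.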
“[Fractile’s] in-memory compute approach to inference acceleration jointly tackles the two bottlenecks to scaling inference, overcoming both the memory bottleneck that holds back today’s GPUs, while decimating power consumption, the single biggest physical constraint we face over the next decade in scaling up data center capacity,” Gelsinger said.
As of six months ago, Fractile had only tested its designs in simulation and had not yet had test chips fabbed. Those simulations have convinced it, and now Gelsinger, one of the most experienced semiconductor leaders in the world, that it can create hardware that could run inference workloads “100 times faster and 10 times cheaper than Nvidia’s GPUs”, as an article by Fortune on Fractile put it last year.
Watch this space.