SpiNNaker-Based Supercomputer Launches in Dresden
By Sally Ward-Foxton, 05.28.2024
A new neuromorphic supercomputer is claiming the title of world’s biggest. University of Dresden spinout SpiNNcloud, formed to commercialize technology based on a second generation of Steve Furber’s SpiNNaker neuromorphic architecture, is now offering a five-billion neuron supercomputer in the cloud, as well as smaller commercial systems for on-prem use. Among the startup’s first customers are Sandia National Labs, Technische Universität München and Universität Göttingen.
The first generation of the SpiNNaker architecture, an academic project led by Arm architecture co-inventor Steve Furber, was created 10 years ago and is used in more than 60 research groups in more than 23 countries today. The second generation of SpiNNaker architecture, SpiNNaker2, is substantially different to the first, SpiNNcloud co-CEO Hector Gonzalez told EE Times.
“We don’t have a bottom-up approach, where you try to encode every single synapse of the brain into silicon,” he said. “We have an approach that is more practical. We follow inspiration from the brain where we believe it makes sense, where we see tangible effects on efficient compute.”
Gonzalez calls SpiNNaker2’s architecture a hybrid computer—combining acceleration for three different types of workloads, the intersection of which SpiNNcloud thinks will be the future of AI. These workloads are: brain-inspired spiking neural networks, practical application-inspired deep neural networks and symbolic models—which provide reliability and explainability.
Spiking neural networks (SNNs) mimic the brain’s dynamic sparsity for the ultimate energy efficiency. Deep neural networks (DNNs), which form the bulk of mainstream AI today, are excellent learners and very scalable, but less energy efficient and are sometimes criticized for being a “black box”—that is, they are not explainable. Symbolic models, formerly known as “expert systems,” have a rule-based backbone that makes them good at reasoning, but they have limited ability to generalize and adapt to other problems. In the SpiNNaker context, symbolic models provide explainability and can help make AI models more robust against phenomena like hallucination.
Future AI models will combine all three disciplines, making systems that can generalize their knowledge, be efficient and behave intelligently, per DARPA’s definition of the “third wave of AI,” Gonzalez said. SpiNNcloud is working with various groups of researchers on this. Possibilities include DNN layers for feature extraction followed by spiking layers, for example.
“This type of architecture enables things you wouldn’t do with traditional architectures because you cannot embed the event-based [properties] into the standard cascaded processors you have with traditional architectures,” he said. “So this enables entirely new fields.”
“We have the potential to deploy applications in these three fields and particularly at the intersection we have the capacity to deploy models that cannot be scaled up in standard hardware,” he added.
Gonzalez’s example of a neuro-symbolic workload, NARS-GPT (NARS is short for non-axiomatic reasoning system), is part-DNN with a symbolic engine backbone. This combination outperformed GPT-4 in reasoning tests.
“The trouble with scaling up these models in standard architectures is that DNN accelerators often rely on tile-based approaches, but they don’t have cores with full programmability to implement rule-based engines for the symbolic part,” he said. By contrast, SpiNNaker2 can execute this model in real time.
NARS-GPT, which uses all three types of workloads SpiNNaker2 is designed for, outperformed GPT-4 in reasoning. (Source: SpiNNcloud)
Other work combining SNNs and symbolic engines includes SPAUN (semantic pointer architecture unified network) from the University of Waterloo. The connectivity required is too complex to execute in real time on GPUs, Gonzalez said.
Practical applications that exist today for this type of architecture include personalized drug discovery. Gonzalez cites work from the University of Leipzig, which deploys many small models that talk to each other over SpiNNaker’s high speed mesh. This work is aiming to enable personalized drug discovery searches.
“Standard architectures like GPUs are overkill for this application because the models are quite small, and you wouldn’t be able to leverage the huge parallelism you have in these small constrained [compute] units in such a highly parallel manner,” he said.
Optimization problems also suit SpiNNaker’s highly parallel mesh, Gonzalez added, and there are many applications that could use an AI that does not hallucinate. Smart city infrastructure can use its very low latency, and it can also be used for quantum emulation (the second generation architecture has added true random number generation to each core for this).
In-house accelerators
The SpiNNaker2 chip has 152 cores connected in a highly parallel, low power mesh.
Each core has an off-the-shelf Arm Cortex-M microcontroller core alongside in-house designed native accelerators for neuromorphic operators, including exponentials and logarithms, a true random number generator, and a MAC array for DNN acceleration.
A lightweight network on-chip is based on a GALS (globally asynchronous, locally synchronous) architecture, meaning each of the compute units behaves asynchronously but they are locally clocked. This mesh of compute units can be run in an event-based way—activated only when something happens.
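The event-based idea can be illustrated with a toy model. The sketch below is not SpiNNcloud software; the function names, the event list and the unit counts are all hypothetical. It simply contrasts processing work only when an event arrives against updating every unit on every clock tick.

```python
import heapq

def run_event_driven(events, num_units):
    """Process (time, unit_id) events in time order.

    A unit is touched only when an event targets it, so idle units
    cost nothing in this toy model. Returns the number of activations
    per unit.
    """
    queue = list(events)
    heapq.heapify(queue)  # orders events by time
    activations = [0] * num_units
    while queue:
        _time, unit = heapq.heappop(queue)
        activations[unit] += 1
    return activations

def clocked_updates(num_units, num_ticks):
    """Work done if every unit is updated on every tick, event or not."""
    return num_units * num_ticks

# Hypothetical workload: three events spread over ten ticks on four units.
events = [(3, 0), (1, 2), (7, 2)]
activations = run_event_driven(events, num_units=4)
event_work = sum(activations)
polled_work = clocked_updates(num_units=4, num_ticks=10)

print(activations)              # per-unit activation counts
print(event_work, polled_work)  # work in event-driven vs. clocked mode
```

With sparse activity, the event-driven count grows with the number of events rather than with the number of units multiplied by elapsed time, which is the property the mesh is designed to exploit.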
SpiNNaker2 cores, based on Arm Cortex-M cores plus additional acceleration, are connected in a mesh. Cores can be switched off when not in use to save power. (Source: SpiNNcloud)
A custom crossbar gives the Cortex-M cores and their neighbors access to memory in each of the nodes. SpiNNcloud has designed partitioning strategies to split workloads across this mesh of cores.
The true random number generator, SpiNNcloud’s patented design, samples thermal noise from the PLLs. This is exploited to produce randomness that can be used for neuromorphic applications (e.g. stochastic synapses) and in quantum emulation.
The chip uses an adaptive body biasing (ABB) scheme called reverse body bias, based on IP developed by Racyics, which allows SpiNNcloud to operate transistors at voltages as low as 0.4 V (close to sub-threshold operation) to reduce power consumption while maintaining performance.
The company also uses a patented dynamic voltage frequency scaling (DVFS) scheme at the core level to save power. Cores can be entirely switched off if not needed, inspired by the brain’s energy-proportional properties.
“Brains are very efficient because they are energy proportional—they only consume energy when it’s required,” he said. “This isn’t just about spiking networks—we can do spiking networks, but this is about taking that brain inspiration to different levels of how the system operates its resources efficiently.”
The SpiNNcloud board has 48 SpiNNaker2 chips. Ninety of these boards fit into a rack, with a full 16-rack system comprising 69,120 chips. (Source: SpiNNcloud)
SpiNNcloud’s board has 48 SpiNNaker2 chips, with 90 boards to a rack. The full Dresden system will be 16 racks (69,120 chips total) for a total of 10.5 billion neurons. Half of that, five billion neurons, has been deployed so far; it can achieve 1.5 PFLOPS (32-bit, using Arm cores), and 179 PFLOPS (8-bit, using MAC accelerators). Theoretical peak performance per chip is 5.4 TOPS, but realistic utilization would mean around 5.0 TOPS, Gonzalez said.
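The system-size figures can be checked with simple arithmetic. The short sketch below uses only the numbers quoted in the article. The per-chip and per-core neuron counts it derives are averages that assume neurons are spread evenly across the machine, which is an illustrative assumption rather than a figure from SpiNNcloud.

```python
# Figures quoted in the article.
CHIPS_PER_BOARD = 48
BOARDS_PER_RACK = 90
RACKS = 16
CORES_PER_CHIP = 152
TOTAL_NEURONS = 10.5e9  # full 16-rack system

def total_chips(chips_per_board=CHIPS_PER_BOARD,
                boards_per_rack=BOARDS_PER_RACK,
                racks=RACKS):
    """Number of chips in a system built from identical boards and racks."""
    return chips_per_board * boards_per_rack * racks

chips = total_chips()
cores = chips * CORES_PER_CHIP

# Averages only: assumes an even spread of neurons (an assumption,
# not a statement about how workloads are actually partitioned).
neurons_per_chip = TOTAL_NEURONS / chips
neurons_per_core = TOTAL_NEURONS / cores

print(f"chips: {chips:,}")
print(f"cores: {cores:,}")
print(f"average neurons per chip: {neurons_per_chip:,.0f}")
print(f"average neurons per core: {neurons_per_core:,.0f}")
```

The chip count reproduces the 69,120 quoted above, and the averages work out to roughly 152,000 neurons per chip, or about 1,000 per core.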
Chips on the board can communicate with each other on the order of a millisecond, even at large scale. The full-size system has chips connected in a toroidal mesh for the shortest possible communication paths between chips (this has been optimized based on research from the University of Manchester).
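To see why a toroidal mesh shortens communication paths, consider a simplified rectangular 2D grid. The sketch below is an illustration under that assumption; the grid size, coordinates and function names are hypothetical, and the actual SpiNNaker2 link topology may differ. It compares hop counts with and without wrap-around links.

```python
def mesh_hops(a, b):
    """Hop count between two nodes on a flat 2D grid (no wrap-around)."""
    (ax, ay), (bx, by) = a, b
    return abs(ax - bx) + abs(ay - by)

def torus_hops(a, b, width, height):
    """Hop count on a 2D torus, where each row and column wraps around.

    In each dimension a message can travel either way round the ring,
    so it takes the shorter of the two directions.
    """
    (ax, ay), (bx, by) = a, b
    dx = abs(ax - bx)
    dy = abs(ay - by)
    return min(dx, width - dx) + min(dy, height - dy)

# Hypothetical 8 x 8 grid: opposite corners are the worst case for a
# flat mesh but become near neighbors once the edges wrap around.
WIDTH, HEIGHT = 8, 8
corner_a, corner_b = (0, 0), (7, 7)

flat = mesh_hops(corner_a, corner_b)
wrapped = torus_hops(corner_a, corner_b, WIDTH, HEIGHT)

print(flat, wrapped)  # 14 2
```

In this toy grid the worst-case distance on the torus is 8 hops (4 in each dimension), compared with 14 on the flat mesh.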
SpiNNcloud’s Dresden supercomputer is available for cloud access, while the first production run for commercial customer systems will be in the first half of 2025.