The chips of tomorrow may well take inspiration from the architecture of our brains
As artificial intelligence demands more and more energy from the computers it runs on, scientists at IBM Research are taking inspiration from the world’s most efficient computer: the human brain.
Neuromorphic computing is an approach to hardware design and algorithms that seeks to mimic the brain. The concept doesn’t describe an exact replica, a robotic brain full of synthetic neurons and artificial gray matter. Rather, experts working in this area are designing all layers of a computing system to mirror the efficiency of the brain. Compared to conventional computers, the human brain barely uses any power and can effectively solve tasks even when faced with ambiguous or poorly defined data and inputs. IBM Research scientists are using this evolutionary marvel as inspiration for the next generation of hardware and software that can handle the epic amounts of data required by today’s computing tasks — especially artificial intelligence.
In some cases, these efforts are still deep in research and development, and for now they mostly exist in the lab. But in one case, prototype performance numbers suggest that a brain-inspired computer processor will soon be ready for market.
What is neuromorphic computing?
To break it down into its etymology, the term “neuromorphic” literally means “characteristic of the shape of the brain or neurons.” But whether this is the appropriate term for the field or a given processor may depend on whom you ask. It could mean circuitry that attempts to recreate the behavior of synapses and neurons in the human brain, or it could mean computing that takes conceptual inspiration from how the brain processes and stores information.
If it sounds like the field of neuromorphic — or brain-inspired — computing is somewhat undecided, that’s only because researchers have taken such vastly different approaches to building computer systems that mimic the brain. Scientists at IBM Research and beyond have been working for years to develop these machines, and the field has not yet landed on the quintessential neuromorphic architecture.
One familiar approach to brain-inspired computing involves creating very simple, abstract models of biological neurons and synapses. These are essentially static, nonlinear functions that use scalar multiplication. In this case, information propagates as floating-point numbers. When it’s scaled up, the result is deep learning. At a simplistic level, deep learning is brain inspired — all these mathematical neurons add up to something that mimics certain brain functions.
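The abstract neuron described above can be sketched in a few lines. This is a minimal illustration, not code from IBM Research: the function names `neuron` and `layer` are hypothetical, and the choice of ReLU as the static nonlinearity is an assumption, since the article does not name one.

```python
def neuron(inputs, weights, bias=0.0):
    """One abstract neuron: multiply each input by a scalar weight,
    sum the products, and pass the result through a static
    nonlinearity (ReLU is assumed here for illustration)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)


def layer(inputs, weight_rows, biases):
    """A layer is many such neurons reading the same inputs.
    Stacking layers is what 'scaling up' to deep learning means."""
    return [neuron(inputs, row, b) for row, b in zip(weight_rows, biases)]


if __name__ == "__main__":
    # Values propagate as floating-point numbers, as the text describes.
    print(layer([1.0, 2.0], [[0.5, 0.25], [0.5, -1.0]], [0.0, 0.0]))
```

Every value that moves between neurons here is a floating-point number, which is the property the spiking approaches mentioned below set out to change.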
“In the last decade or so, this has become so successful that the vast majority of people doing anything related to brain-inspired computing are essentially doing something related to this,” says IBM Research scientist Abu Sebastian. Mimicking neurons with math can be done in additional brain-inspired ways, he says, by incorporating neuronal or synaptic dynamics, or by communicating with spikes of activity, instead of floating-point numbers.
An analog approach, on the other hand, uses advanced materials that can store a continuum of conductance values between 0 and 1, and perform multiple levels of processing — multiplying using Ohm’s law and accumulating partial sums using Kirchhoff’s current summation.
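To make the analog idea concrete, here is a rough software model of what a crossbar of analog devices computes. It is a sketch under simplifying assumptions (ideal devices, no noise, conductances already normalized to the 0 to 1 range), and the name `crossbar_mvm` is hypothetical rather than taken from any IBM chip or library.

```python
def crossbar_mvm(conductances, voltages):
    """Model an ideal analog crossbar.

    conductances[i][j] is the conductance of the device joining
    input row i to output column j; voltages[i] is the voltage
    applied to row i.

    Ohm's law gives each device's current: I = G * V.
    Kirchhoff's current law sums the currents that share a column wire.
    The column currents are therefore a matrix-vector product.
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for row, v in zip(conductances, voltages):
        for j, g in enumerate(row):
            currents[j] += g * v  # multiply (Ohm), then accumulate (Kirchhoff)
    return currents


if __name__ == "__main__":
    G = [[0.5, 1.0],
         [0.25, 0.0]]
    V = [2.0, 4.0]
    print(crossbar_mvm(G, V))
```

In hardware the two nested loops do not exist: every multiplication and summation happens at once in the physics of the array, which is where the efficiency comes from.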
How on-chip memory eliminates a classic bottleneck
A common trait among brain-inspired computing architecture approaches is on-chip memory, also called in-memory computing. It's a fundamental shift in chip structure compared to conventional microprocessors.
The brain is divided into regions and circuits, which co-locate memory formation and learning — in effect, data processing and storage. Classical computers are not set up this way. With a conventional processor, your memory sits apart from the processor where computing happens, and information is ferried back and forth between the two on circuits. But in a neuromorphic architecture that includes on-chip memory, memory is closely intertwined with processing on a fine level — just like in the brain.
This architecture is a chief feature in IBM’s in-memory computing chip designs, whether analog or digital.
The rationale for putting computing and memory side by side is that machine learning tasks are computing-intensive, but the tasks themselves are not necessarily complex. In other words, they consist of a high volume of simple calculations, chiefly matrix multiplication. The limiting factor isn’t that the processor is too slow, but that moving data back and forth between memory and computing takes too long and uses too much energy, especially when dealing with heavy workloads and AI-based applications. This kink is called the von Neumann bottleneck, named for the von Neumann architecture that has been employed in nearly every chip design since the dawn of the microchip era. With in-memory computing, there are huge energy and latency savings to be found by cutting this shuffle out of data-heavy processes like AI training and inferencing.
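A simple counting model shows why the shuffle dominates. The sketch below is an illustration under strong assumptions, not a measurement: it counts only how many values cross the boundary between memory and compute for one matrix-vector multiplication, ignores caches, batching, and weight reuse, and uses hypothetical function names.

```python
def transfers_conventional(rows, cols):
    """Values moved when weights live in separate memory:
    every weight is fetched, the input vector is read,
    and the output vector is written back."""
    return rows * cols + cols + rows


def transfers_in_memory(rows, cols):
    """Values moved when weights stay where the computing happens:
    only the input vector goes in and the output vector comes out."""
    return cols + rows


if __name__ == "__main__":
    rows, cols = 1024, 1024  # an arbitrary example layer size
    print(transfers_conventional(rows, cols))  # 1050624
    print(transfers_in_memory(rows, cols))     # 2048
```

Under these assumptions the weight traffic grows with the product of the layer's dimensions, while the in-memory traffic grows only with their sum.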
In the case of AI inference, synaptic weights are stored in memory. These weights dictate the strength of connections between nodes, and in the case of a neural network, they’re values applied to the matrix multiplication operations being run through them. If synaptic weights are stored apart from where processing takes place and must be shuttled back and forth, the energy you spend per operation will always plateau at a certain point, meaning that more energy eventually stops leading to better performance. Sebastian and his colleagues who developed one of IBM’s brain-inspired chips, named Hermes, believe they must break down the barrier created by moving synaptic weights. The goal is much more performant AI accelerators with smaller footprints.
“In-memory computing minimizes or reduces to zero the physical separation between memory and compute,” says IBM Research scientist Valeria Bragaglia, who is part of the Neuromorphic Devices and System group.
In the case of IBM’s NorthPole chip, the computing structure is built around the memory. Rather than locating the memory and computing in exactly the same space, as is done in analog computing, NorthPole intertwines them, so its design may be more precisely called “near-memory.” The effect is essentially the same.
The analog Hermes chip uses phase-change memory (PCM) devices that store AI model weights in the conductance values of a type of glass that can be switched between amorphous and crystalline phases.
How brain-inspired chips mimic neurons and synapses
Carver Mead, an electrical engineering researcher at the California Institute of Technology, had a huge influence on the field of neuromorphic computing back in the 1990s, when he and his colleagues realized that it was possible to create an analog device that, at a phenomenological level, resembles the firing of neurons.
Decades later, this is essentially what chips like Hermes and IBM’s other prototype analog AI chip are doing: Analog units both perform calculations and store synaptic weights, much like neurons in the brain do. Both analog chips contain millions of nanoscale phase-change memory (PCM) devices, a sort of analog computing version of brain cells.
The PCM devices are assigned their weights by flowing an electrical current through them, changing the physical state of a piece of chalcogenide glass. When a higher voltage is applied, the glass is rearranged from a crystalline to an amorphous solid. This makes it less conductive, changing the value of matrix multiplication operations when they are run through it. After an AI model is trained in software, all synaptic weights are stored in these PCM devices, just like memories are in biological synapses.
“Synapses store information, but they also help compute,” says IBM Research scientist Ghazi Sarwat Syed, who works on designing the materials and device architectures used in PCM. “For certain computations, such as deep neural network inference, co-locating compute and memory in PCM not only overcomes the von Neumann bottleneck, but these devices also store intermediate values beyond just the ones and zeros of typical transistors.” The aim is to create devices that compute with greater precision, can be densely packed onto a chip, and can be programmed with ultra-low currents and power.
“Furthermore, we’re trying to give these devices more flavor,” he says. “Biological synapses store information in a nonvolatile way for a long time, but they also have changes that are short-lived.” So, his team is working on ways to make changes in the analog memory that better emulate biological synapses. Once you have that, you can craft new algorithms that solve problems that digital computers have difficulty doing.
One shortcoming of these analog devices, Bragaglia notes, is that they are currently limited to inferencing. “There are no devices that can be used for training because the accuracy of moving the weights isn’t there yet,” she says. The weights can be cemented into PCM cells once an AI model has been trained on digital architecture, but changing the weights directly through training isn’t yet precise enough. Plus, PCM devices are not durable enough to have their conductance changed a trillion or more times, as would happen during training, according to Syed.
IBM Research's unnamed prototype analog chip uses PCM to encode up to 35 million model weights in a single chip.
Multiple teams at IBM Research are working to address the issues created by non-ideal material properties and insufficient computational fidelity. One such approach involves new algorithms that work around the errors created during model weight updates in PCM. They’re still in development, but early results suggest that it will soon be possible to perform model training on analog devices.
Bragaglia is involved in a materials science approach to this problem: a different kind of memory device called resistive random-access memory or RRAM. RRAM works on principles similar to those of PCM, storing the values of synaptic weights in a physical device. An atomic filament sits between two electrodes, inside an insulator. During AI training, the input voltage changes the oxidation of the filament, which alters its resistance in very fine increments, and this resistance is read as a weight during inferencing. These cells are arranged on a chip in crossbar arrays, creating a network of synaptic weights. So far, this structure has shown promise for analog chips that can perform computation while remaining flexible to updates. This was made possible only after years of material and algorithm co-optimization by several teams of researchers at IBM.
Beyond the way memories are stored, the way data flows in some neuromorphic computer chips can be fundamentally different from the way it does in conventional ones. In a typical synchronous circuit — most computer processors — streams of data are clock-based, with a continuous oscillating electrical current that synchronizes the actions of the circuit. There can be different structures and multiple layers of clocks, including a clock multiplier that enables a microprocessor to run at a different rate than the rest of the circuit. But on a basic level, things are happening even when no data is being processed.
Instead of this, biology uses event-driven spikes, says Syed. “Our nerve cells are communicating sparsely, which is why we’re so efficient,” he adds. In other words, the brain only works when it must, so by adopting this asynchronous data processing stream, an artificial emulation can save significant amounts of energy.
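The saving from sparse, event-driven communication can be illustrated with a rough operation count. This is a toy comparison, not a model of any real chip: the function names are hypothetical, and an "operation" here is simply one weight being added to a running total.

```python
def clocked_accumulate(spikes, weights):
    """Clocked style: every input line is processed on every tick,
    whether or not it carries any activity."""
    total, ops = 0.0, 0
    for s, w in zip(spikes, weights):
        total += s * w
        ops += 1
    return total, ops


def event_driven_accumulate(events, weights):
    """Event-driven style: work happens only for inputs that spiked.
    `events` lists the indices of the active inputs."""
    total, ops = 0.0, 0
    for i in events:
        total += weights[i]
        ops += 1
    return total, ops


if __name__ == "__main__":
    weights = [0.5] * 8
    spikes = [0, 0, 1, 0, 0, 0, 0, 1]                        # dense representation
    events = [i for i, s in enumerate(spikes) if s]          # sparse: [2, 7]
    print(clocked_accumulate(spikes, weights))        # (1.0, 8)
    print(event_driven_accumulate(events, weights))   # (1.0, 2)
```

Both versions reach the same total, but the event-driven one does work only for the two inputs that were active, which is the sense in which the brain "only works when it must."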
All three of the brain-inspired chips at IBM Research were designed with a standard clocked process, though.
NorthPole is a brain-inspired research prototype chip that stores model weights digitally, but like the analog chips, it eliminates the von Neumann bottleneck that usually separates memory and compute.
In one of these cases, IBM Research staff say they’re making significant headway into edge and data center applications. “We want to learn from the brain,” says IBM Fellow Dharmendra Modha, “but we want to learn from the brain in a mathematical fashion while optimizing for silicon.” His lab, which developed NorthPole, doesn’t mimic the phenomena of neurons and synapses via transistor physics, but digitally captures their approximate mathematics. NorthPole is axiomatically designed and incorporates brain-inspired low precision; a distributed, modular, core array with massive compute parallelism within and among cores; memory near compute; and networks-on-chip. NorthPole has also moved from TrueNorth’s spiking neurons and asynchronous design to a synchronous design.
For TrueNorth, an experimental processor that was an early springboard for the more sophisticated and commercially ready NorthPole, Modha and his team realized that event-driven spikes use silicon-based transistors inefficiently. Neurons in the brain fire at about 10 hertz (10 times a second), whereas today’s transistors run at gigahertz rates: the transistors in IBM’s z16 run at 5 GHz, and transistors in a MacBook’s 6-core Intel Core i7 run at 2.6 GHz. If the synapses in the human brain operated at the same rate as a laptop, “our brain would explode,” says Syed. In neuromorphic computer chips such as Hermes, or brain-inspired ones like NorthPole, the goal is to combine the bio-inspiration of how data is processed with the high-bandwidth operation required by AI applications.
Because of their choice to move away from neuron-like spiking and other features that mimic the physics of the brain, Modha says his group leans more towards the term ‘brain-inspired’ computing than ‘neuromorphic.’ He envisions that NorthPole has lots of room for growth, because they can tweak the architecture in purely mathematical and application-centered ways to achieve more gains while also exploiting silicon scaling and lessons gleaned from user feedback. And the data show that their strategy worked: In new results from Modha’s team, NorthPole performed inference on a 3-billion-parameter model 46.9 times faster than the next most energy-efficient GPU, and with 72.7 times higher energy efficiency than the next lowest-latency GPU.
Thinking on the edge: neuromorphic computing applications
Researchers may still be defining what neuromorphic computing is or the best ways to build brain-inspired circuits, says Syed, but they tend to agree that it’s well suited for edge applications — phones, self-driving cars, and other applications that can take advantage of fast, efficient AI inferencing with pre-trained models. A benefit of using PCM chips on the edge, Sebastian says, is that they can be exceptionally small, performant, and inexpensive.
Robotics applications could be well suited for brain-inspired computing, says Modha, as could video analytics, for example with in-store security cameras. Putting neuromorphic computing to work in edge applications could help solve problems of data privacy, says Bragaglia, as in-device inference chips would mean data doesn’t need to be shuttled back and forth between devices, or to the cloud, to perform AI inferencing.
Whichever brain-inspired or neuromorphic processors end up coming out on top, researchers also agree that the current crop of AI models is too complicated to be run on classical CPUs or GPUs. There needs to be a new generation of circuits that can run these massive models.
“It’s a very exciting goal,” says Bragaglia. “It’s very hard, but it’s very exciting. And it’s in progress.”