Running More Efficient AI/ML Code With Neuromorphic Engines
Once a buzzword, neuromorphic engineering is gaining traction in the semiconductor industry.
MAY 16TH, 2024 - BY: KAREN HEYMAN
Neuromorphic engineering is finally getting closer to market reality, propelled by the AI/ML-driven need for low-power, high-performance solutions.
Whether current initiatives result in true neuromorphic devices, or whether devices will be inspired by neuromorphic concepts, remains to be seen. But academic and industry researchers continue to experiment in the hopes of achieving significant improvements in computational performance using less energy.
Neuromorphic engineering has been a topic of academic discussions, industry research, and start-up promises for years. Like AI, “neuromorphic” is both a scientific term and a marketing buzzword, so it’s easy to get confused over what legitimately qualifies as a neuromorphic design. The generally accepted definition is that neuromorphic refers to parallel, asynchronous designs that don’t separate memory from logic, and thus more closely mimic the structure and function of the human brain than the traditional von Neumann architecture. The goal is to create devices that can boost performance using less energy by streamlining the architecture and incorporating more parallelism. Originally, that meant any processing, but lately neuromorphic computing is being looked to as the ideal hardware environment for AI/ML applications.
“It’s brain-inspired computing, essentially, and what the brain is good at is performing computations efficiently,” said Tony Chan Carusone, CTO at Alphawave Semi. “The way it does that is to exploit parallelism more than to exploit high-speed processing. By running lots of logic at lower speed, you can actually perform more computations per joule of energy consumed. You’re essentially better off doing lots of computations in parallel at lower speed.”
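A back-of-the-envelope sketch makes that arithmetic concrete. The numbers below are illustrative assumptions, not measured silicon data: dynamic power scales roughly as C·V²·f, and a lower clock typically permits a lower supply voltage, so several slow units can match one fast unit’s throughput at a fraction of the energy per operation.

```python
# Illustrative only: why many slow, parallel units can beat one fast unit on
# energy per operation. Assumes dynamic power ~ C_eff * V^2 * f, and that a
# lower clock frequency permits a lower supply voltage (values are made up).

def dynamic_power(c_eff, v_supply, freq):
    """Classic CMOS dynamic power model: P ~ C_eff * V^2 * f."""
    return c_eff * v_supply**2 * freq

def energy_per_op(n_units, v_supply, freq_each, c_eff=1.0):
    """Energy per operation = total power / total throughput (1 op/cycle)."""
    throughput = n_units * freq_each
    power = n_units * dynamic_power(c_eff, v_supply, freq_each)
    return power / throughput          # reduces to c_eff * V^2

fast = energy_per_op(n_units=1, v_supply=1.0, freq_each=4.0)   # 1 core, 4 GHz
slow = energy_per_op(n_units=8, v_supply=0.6, freq_each=0.5)   # 8 cores, 0.5 GHz

print(f"relative energy/op, single fast core : {fast:.2f}")
print(f"relative energy/op, 8 parallel cores : {slow:.2f}")
print(f"same throughput, ~{fast / slow:.1f}x less energy per operation")
```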
Yole Group analysts Florian Domengie, senior technology and market analyst, imaging, and Adrien Sanchez, senior technology and market analyst, computing, broke down various uses of the term neuromorphic:
- Neuromorphic is a closed compound word, resulting from the combination of the Greek words neuro- (“nerve”) and -morphḗ (“form”).
- Neuromorphic System refers to all systems that mimic the neurobiological architectures of the nervous system.
- Neuromorphic engineering is a concept developed by Caltech Professor Carver Mead in the late 1980s after an encounter with Nobel-laureate biophysicist Max Delbrück [who introduced him to neurobiology]. The term describes the use of integrated circuits to mimic neuro-biological architectures in the nervous system.
- Neuromorphic sensing originates from the development of a “silicon retina” by Misha Mahowald, a former Ph.D. student in Carver Mead’s group at Caltech, in the early 1990s at the Institute of Neuroinformatics and ETH Zurich. [Mead credits her as a co-pioneer. The Mahowald Prize is named in her honor.]
- Neuromorphic computing refers to all hardware and software systems that significantly mimic the working principles of the biological brain, which consists of neurons and synapses.
Neuromorphic sensing is expected to help achieve better accuracy and less latency in edge and IoT devices. According to Yole, in its Neuromorphic Computing, Memory and Sensing 2024 report, neuromorphic sensing and computing are on the brink of substantial expansion. The firm predicts the neuromorphic computing market will rise to $412 million by 2029 and to $5.4 billion by 2034. Similarly, it projects the neuromorphic sensing market will reach $410 million by 2029, escalating to $2.9 billion by 2034.
Fig. 1: The upward trend of neuromorphic-based computing. Source: Yole Intelligence Neuromorphic Computing, Memory and Sensing 2024 report
The past of the future
Questions have been surfacing since the dawn of computing about why biological brains are so much more efficient than digital computers. A fruit fly has a brain so small that it can only be seen with a high-powered microscope. Yet, as computational neuroscientist Terry Sejnowski of the Salk Institute observed, a fly processes information more efficiently than a supercomputer. “[It] has a brain with only 100,000 neurons; it weighs a milligram and consumes a milliwatt of power,” Sejnowski wrote in The Deep Learning Revolution.
In contrast to the energy optimization of biological brains, the generally accepted estimate is that today’s data centers consume roughly 1% of the entire world’s energy, a figure that could quadruple in the next decade. There is some hope. The authors of a 2020 Science paper said the 1% prediction overlooks “strong countervailing energy efficiency trends,” one of which could be the establishment of commercial neuromorphic computing devices.
Initially, the concept of neuromorphic engineering was met with patronizing skepticism as an intriguing notion that would never see fruition. Yet, its goals and other ideas discussed at computational neuroscience conferences now underpin many of the breakthroughs in AI/ML. Today, neuromorphic engineering is considered by different practitioners to be either a subset of computational neuroscience, a parallel field, or a completely separate discipline. In any case, neuromorphic engineering has taken off as a serious academic pursuit with extensive breadth, as exemplified in a 2022 roadmap. Still, anyone planning to pursue neuromorphic research should be aware of the overlap between the two fields, which occasionally can lead to confusion when similar terms express subtly different concepts, depending on the implementation.
In 1989, Caltech professor Carver Mead and collaborators defined the field of neuromorphic engineering in Analog VLSI and Neural Systems. While much of the research has been superseded, the fundamental ideas still hold. Thirty-five years later, Gert Cauwenberghs, a student of Mead’s at Caltech and now professor of bioengineering at the University of California San Diego, still recommends the book. “Carver defined neuromorphic engineering in terms of fundamental thermodynamics,” said Cauwenberghs. “He was making comparisons between how ions move through ion channels in biology, versus how electron holes move through channels of transistors. Even though they’re totally different, because one is wet and one is solid, and they have very different timescales, the physics is actually amazingly similar. For example, you have the same kind of exponential dependence of charge density, or conductance versus voltage, which defines the gain of a transistor or the conductance in an ion channel within a membrane.”
Mead also proposed that the transistor, which until that time had been mostly used as a digital switch, also could be used as a multiplier. “As a switch, it’s zero or one, on or off. But if you have, say, at least a hundred, you can implement a multiplier, because just based on the innate physics of the device you can do computation,” Cauwenberghs explained. “But if you want to implement that logic in a computer, that takes a lot more hardware. It’s precise, but it’s very wasteful, so it makes sense to start building circuits that are more inspired by how the brain works in a physical way. It tells us to listen to the physics, listen to the electrons, because they have a lot to say.”
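The exponential law Cauwenberghs refers to is the subthreshold (weak-inversion) regime, where a transistor’s drain current grows exponentially with gate voltage, much as an ion channel’s conductance grows with membrane voltage. A minimal sketch, with illustrative parameter values:

```python
import math

def subthreshold_current(v_gs, i_0=1e-9, n=1.5, v_therm=0.026):
    """Weak-inversion MOS drain current: I_D ~ I_0 * exp(V_GS / (n * V_T)).

    i_0     -- leakage-scale prefactor in amps (illustrative value)
    n       -- subthreshold slope factor, typically between 1 and 2
    v_therm -- thermal voltage kT/q at room temperature, ~26 mV
    """
    return i_0 * math.exp(v_gs / (n * v_therm))

# With these values, roughly every ~90 mV of gate voltage changes the current
# by a factor of ten -- the same kind of exponential voltage dependence seen
# in the conductance of biological ion channels.
for v in (0.10, 0.20, 0.30):
    print(f"V_GS = {v:.2f} V  ->  I_D ~ {subthreshold_current(v):.2e} A")
```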
Of note, neuromorphic engineering implementations often involve compute-in-memory designs, such as the NeuRRAM chip, which Cauwenberghs works on in collaboration with Weier Wan and Stanford’s H.-S. Philip Wong.
Signaling and sparse coding
Ironically, while the brain analogy makes sense to physicists and engineers, it can flummox neurobiologists, who know that the brain operates by both electrical and chemical signaling. In the most basic outline, branch-like dendrites carry input signals to the neuron’s cell body (soma), where they accumulate until the membrane potential reaches threshold. The neuron then “spikes,” sending action potentials out along its axon to synapses, which rely on complex chemistry to pass those signals on to the next set of dendrites as the process repeats throughout the brain.
“It’s a bio-electrical interaction,” said Bob Bleacher, vice president of product at Untether AI. “You have these small amounts of energy that transfer between the synapses that cause chemical reactions. And the amount and how the chemicals are going on in the synapses determines whether that signal gets amplified or it gets attenuated, and then it gets transferred and goes off.”
In real life, this process is many orders of magnitude more complex, including dendrites, which themselves can compute. Nevertheless, this input/threshold/output triad underlies much of the thinking in artificial neural networks.
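In hardware and simulation, that triad is most often abstracted as a leaky integrate-and-fire neuron. The sketch below uses arbitrary leak, threshold, and input values purely for illustration; it is not tied to any particular chip.

```python
import random

def lif_neuron(input_currents, threshold=1.0, leak=0.9, v_reset=0.0):
    """Leaky integrate-and-fire: integrate input with a leak, emit a spike (1)
    and reset when the membrane potential crosses threshold."""
    v = v_reset
    spikes = []
    for i_in in input_currents:
        v = leak * v + i_in       # integrate input, with leak between steps
        if v >= threshold:        # threshold crossing -> "action potential"
            spikes.append(1)
            v = v_reset           # reset after firing
        else:
            spikes.append(0)
    return spikes

random.seed(0)
inputs = [random.uniform(0.0, 0.4) for _ in range(20)]   # arbitrary drive
print(lif_neuron(inputs))    # a train of 0s and 1s, e.g. [0, 0, 1, 0, ...]
```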
“The heavy lifting in the brain is done in the dendritic trees of neurons, and it’s a combinatorial thing,” said Mead. “It’s exponential in the size of the dendritic tree, which is in the thousands. People wonder why neural computation is so efficient. It’s efficient because it’s a combinatoric tree with 1,000 inputs.” Looking at engineering from a dendritic point of view could potentially help reduce thermal challenges, according to a paper Mead recommends by his former graduate student, Kwabena Boahen, now director of the Brains in Silicon Lab at Stanford.
Any first-year neuroscience student knows about spikes, but at the graduate level, computational neuroscientists see them as elements of neural codes that underlie sensory processing, cognition, and locomotion. Neural coding research has influenced much of what is now being put into hardware as neuromorphic computing. In fact, the phrase “spike-based computing” is becoming a popular way to describe neuromorphic computing. It’s a definition a neurobiologist can feel comfortable with, given that it acknowledges there’s no chemical signaling in computer hardware.
For years, computational neuroscientists have talked about “sparse codes,” in which only a few neurons need to be activated to represent a sensation or object in the brain. The same scheme is used in one of the best-known neuromorphic projects, Intel’s Loihi, to help lower power requirements.
“The potential for sparse coding is huge because it is energy efficient,” said Bleacher. “Because rather than using, for example, integer and floating point values and a bunch of memory and memory transfers, and multiply/accumulate functions and other types of compute, the holy grail of the neuromorphic computing is to use minuscule amounts of energy to represent this spike. Then the neuron itself – what I’m using as the filtering function – can be a minimal amount of energy, because I’m not doing floating point math. I’m doing passing through, either amplifying or attenuating a spike.”
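A rough way to see the savings Bleacher describes: if only a handful of a layer’s inputs are active, only the corresponding weight columns ever need to be touched. The sizes and sparsity level below are arbitrary, chosen only to illustrate the operation count.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_active = 1024, 256, 16     # only 16 of 1,024 inputs are active

weights = rng.standard_normal((n_out, n_in))

# Build a sparse activation vector: a few active neurons represent the input.
x = np.zeros(n_in)
active_idx = rng.choice(n_in, size=n_active, replace=False)
x[active_idx] = rng.standard_normal(n_active)

dense_out = weights @ x                                  # touches every weight
sparse_out = weights[:, active_idx] @ x[active_idx]      # touches 16 columns
assert np.allclose(dense_out, sparse_out)                # identical result

dense_macs = n_out * n_in
sparse_macs = n_out * n_active
print(f"dense:  {dense_macs} multiply-accumulates")
print(f"sparse: {sparse_macs} multiply-accumulates "
      f"({dense_macs // sparse_macs}x fewer)")
```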
Using the term “sparse distributed asynchronous communication,” Mike Davies, director of the Neuromorphic Computing Lab at Intel, explained how a neuromorphic architecture treats the problem of communicating the activations of a set of neurons to a downstream neuron. “If somehow only the most strongly activated neurons could asynchronously announce themselves in time, then the most important inputs can be processed quickly with minimal latency,” he said. “On the other hand, if the entire set of input neurons has to be processed as one dense matrix, which is the standard way in today’s mainstream architectures, then the important inputs will get congested and delayed behind the herd of less important inputs. This may seem obvious, but to truly exploit this principle, a hardware implementation needs to respond to unpredictable neuron activations immediately, which implies extremely fine-grained parallelism and neurons that asynchronously represent their activations as events in time rather than as numbers in a synchronously processed vector. This is what biological neurons do using spikes, and it’s what we aim to implement in neuromorphic chips.”
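Conceptually, the event-driven scheme Davies describes replaces a dense per-timestep matrix-vector product with per-spike updates: each spike immediately adds its fan-out weights to the downstream accumulators, and silent neurons cost nothing. The snippet below is a simplified sketch of that idea, not Loihi’s actual programming model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pre, n_post = 1000, 100
weights = rng.standard_normal((n_pre, n_post))   # fan-out weights per neuron
membrane = np.zeros(n_post)                      # downstream accumulators

# Spikes arrive as asynchronous events: (time, presynaptic neuron id).
events = [(0.1, 42), (0.3, 7), (0.7, 42), (0.9, 913)]

for t, pre_id in sorted(events):
    # Each event is handled the moment it occurs, touching only that neuron's
    # fan-out row; neurons that never spike contribute no work at all.
    membrane += weights[pre_id]

print("event-driven updates:", len(events) * n_post)
print("dense per-step cost :", n_pre * n_post)
```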
While spike timing looks at the firing of individual neurons or the synchronous firing of groups of neurons as coded signals, the brain as a whole does not have a global clock. Thus, Davies’ experience with asynchronous computing underlies the Loihi project. “It’s no surprise that the brain is using asynchronous communication,” he said. “Synchronous communication comes at a cost on some level. It has a great benefit in terms of very deterministic, predictable performance properties, and it eases the design problem in some ways. That’s why it’s come to dominate the traditional chip design space. There are some really interesting advantages that come from asynchronous signaling and processing, and the brain, through evolution, has arrived at using that method. Because of our team’s background in asynchronous communication, we brought that as one of those tools that we put on our list of neuromorphic novelties that we have in Loihi.”
Computational neuroscientists also have discovered that neurons can fire in synchronous waves, and that biological brains can use these synchronous oscillations to fine-tune neural codes, an insight that might one day influence neuromorphic engineering.
“Oscillations are pervasive in the brain. That’s not to say that neurons are synchronized. They’re not all operating lockstep on a millisecond interval,” said Davies. “But they’re asynchronously communicating, and then you get emergent synchronization at different timescales and that’s performing interesting signal processing.”
Despite the novelty of neuromorphic designs, Davies said Intel is firmly grounded in industry realities, with its Loihi R&D focused on traditional methodologies that could eventually scale to high-volume manufacturing. Recently, to demonstrate computational efficiencies on mainstream AI workloads, Intel announced Hala Point, a research system with over 1,000 Loihi chips.
SpiNNaker
A decade ago, Steve Furber, who made his name developing the Arm core, took on neuromorphic computing. As part of the Human Brain Project (HBP), he developed the large-scale neuromorphic computing platform SpiNNaker. As Furber told the HBP, “SpiNNaker has small embedded processors [which] are connected through a bespoke interconnect fabric which allows them to achieve the very high degree of connectivity found between neurons in the brain.”
SpiNNaker is now being further developed by the start-up SpiNNcloud, which recently introduced SpiNNaker2, in which each chip is a low-power mesh of 152 Arm-based cores. The goal is to reduce the power required to run AI models.
There are other projects in various phases, including IBM’s TrueNorth.
However, just working on current AI/ML implementations can be daunting, noted Chris Mueth, business development, marketing, and technical specialist at Keysight. “Most engineers do not have enough specialization in data science to understand how to set up ML training for open-ended neuromorphic use cases. To mitigate this, the problem needs to be bounded by a specific application wrapper that the engineer is familiar with.”
It also needs to be approached from a system standpoint. “Neuromorphic means that it’s brain-like, so that’s a system solution,” said Ron Lowman of Synopsys. “We sell a lot of components that optimize AI/ML systems, but when it comes to the whole solution, it’s not just a piece of hardware. It’s not just a chip. It’s actually the entire process of the inputs and the results. All of that requires a huge amount of engineering work, which requires a huge amount of optimization from the algorithm perspective. The expertise is in so many different facets, that it takes a huge engineering community that’s developed over the years to bring together even a simple solution.”
So, despite all the R&D excitement, it’s hard to overlook the bottom line. “The whole idea of neuromorphic engineering is fascinating, and has been around for 20+ years, but has never really broken out of the ‘research’ realm,” said Steve Roddy, chief marketing officer at Quadric. “At this point in the maturation of the AI/ML market, one has to wonder how a neuromorphic solution – even if it could demonstrate gargantuan energy efficiency gains – could compete with the sheer volume of networks, code, programmer familiarity with conventional neural nets (i.e., CNNs, transformers, etc.) running on sequential machines, both von Neumann as well as systolic array types. For example, look at the tens of billions of dollars being poured into LLM development, trained on GPUs in the cloud, deployed on GPUs, CPUs, GPNPUs, or DSPs in device. Are the behemoth companies going to throw all that investment away to switch to spiking neural networks – with different underlying math – to be able to run large language models on a more energy efficient neuromorphic machine?”
Still, as he enters his ninth decade, Mead remains as optimistic today as he was decades ago about how far neuromorphic research can go. “Dendritic computation is partly analog, partly digital. It’s analog in timing and digital in outcome. I tried and tried to build one like that, but wasn’t close. As far as I know, nobody has been able to make one that does what the dendritic trees of neurons do, and that’s where the big exponential comes from in neural computation. And it’s not been realized. It’s done very power efficiently in the brain. And that’s why the brain can do exponentially more than we know how to do with much better devices. But we don’t know how to do it yet. Somebody’s going to figure out how to do that. It’s not beyond the physics we know.”
In relation to this last comment about "dendritic computation", isn't that similar to what Peter has been working on with cortical columns?
This article says "Somebody’s going to figure out how to do that. It’s not beyond the physics we know.”
What if the person who figures it out is Peter Van Der Made?