Bravo
If ARM was an arm, BRN would be its biceps💪!
Details of Tony Lewis's presentation at the Embedded Vision Summit on Wednesday, 21 May 2025.
"
Date: Wednesday, May 21
Start Time: 2:05 pm
End Time: 2:35 pm
At the embedded edge, choices of language model architectures have profound implications on the ability to meet demanding performance, latency and energy efficiency requirements. In this presentation, we contrast state-space models (SSMs) with transformers for use in this constrained regime. While transformers rely on a read-write key-value cache, SSMs can be constructed as read-only architectures, enabling the use of novel memory types and reducing power consumption. Furthermore, SSMs require significantly fewer multiply-accumulate units—drastically reducing compute energy and chip area. New techniques enable distillation-based migration from transformer models such as Llama to SSMs without major performance loss. In latency-sensitive applications, techniques such as precomputing input sequences allow SSMs to achieve sub-100 ms time-to-first-token, enabling real-time interactivity. We present a detailed side-by-side comparison of these architectures, outlining their trade-offs and opportunities at the extreme edge."
See bold above. It seems that BRN are now able to migrate traditional transformer models to state-space models (SSMs) without major performance loss.
Note the italics above: SSMs reduce energy requirements and chip area. Think Pico or a Pico plus.
Pico runs off TENNs, which are a type of SSM.
Does that mean developers can now feast on the multitude of traditional models and distill them to SSMs? Appears so.
Is this potentially another game changer?
We might find out more tomorrow?
I have the 'kiddy' Copilot, which you do not get much out of.
Is someone with GPT4 or Grok etc able to quiz the AI to see what the possibilities are?
Hi Manny,
Here's the response I got from ChatGPT (GPT-4).
Tony Lewis’s upcoming presentation at the Embedded Vision Summit outlines a potentially transformative shift in how language models are deployed at the edge. Here's an analysis of the implications and disruptive potential of this new method:
Key Implications of the New Processing Method Using SSMs
1. Significant Gains in Energy and Compute Efficiency
- Fewer Multiply-Accumulate Units: State-space models (SSMs) inherently require fewer MAC operations, drastically reducing energy usage and silicon real estate. This is crucial in edge deployments where power budgets are tight.
- Read-Only Architecture: Eliminates the need for dynamic key-value cache mechanisms used in transformers, allowing use of non-volatile or low-power memory types (e.g., MRAM, ReRAM), further slashing power consumption.
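To put rough numbers on the memory point above, here is a back-of-envelope sketch. It assumes a generic transformer KV cache and a generic recurrent SSM state; the function names and the model shape are made up for illustration and do not describe Akida, TENNs, or any model from the talk.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Transformer KV cache size: keys and values stored for every layer and every token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def ssm_state_bytes(n_layers, d_model, state_dim, bytes_per_value=2):
    """Recurrent SSM state size: one fixed-size state per layer, independent of sequence length."""
    return n_layers * d_model * state_dim * bytes_per_value


if __name__ == "__main__":
    # Hypothetical model shape, chosen only to show the scaling behaviour.
    layers, kv_heads, head_dim = 16, 8, 64
    d_model, state_dim = 512, 16

    ssm = ssm_state_bytes(layers, d_model, state_dim)
    for seq_len in (128, 1024, 8192):
        kv = kv_cache_bytes(layers, kv_heads, head_dim, seq_len)
        print(f"{seq_len:>5} tokens: KV cache {kv / 1e6:8.2f} MB, SSM state {ssm / 1e6:6.2f} MB")
```

The KV cache grows linearly with the number of tokens and has to be written on every step, while the SSM state stays the same size however long the sequence gets. That is the property the bullet above is leaning on.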
2. Latency and Responsiveness
- Precomputed Input Sequences: SSMs can leverage static structure to enable sub-100 ms time-to-first-token performance. This real-time capability is a major win for voice assistants, AR/VR systems, and mobile apps.
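The abstract does not say how the precomputing works. One plausible reading (my assumption, not something stated in the talk) is that the state reached after a fixed prompt prefix is computed offline and stored, so only the new input has to be processed at run time. A toy sketch with a diagonal linear recurrence, where all names and numbers are hypothetical:

```python
def ssm_step(state, x, a, b):
    """One step of a diagonal linear recurrence: h <- a * h + b * x, elementwise."""
    return [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]


def run(state, inputs, a, b):
    """Feed a sequence of scalar inputs through the recurrence and return the final state."""
    for x in inputs:
        state = ssm_step(state, x, a, b)
    return state


# Toy parameters (hypothetical): three state channels with different decay rates.
A = [0.5, 0.25, 0.0]
B = [1.0, 1.0, 1.0]

prompt_prefix = [1.0, 2.0, 3.0]   # fixed part, known ahead of time
user_input = [4.0, 5.0]           # arrives at run time

# Offline: process the fixed prefix once and keep only the resulting state.
cached_state = run([0.0, 0.0, 0.0], prompt_prefix, A, B)

# Online: start from the cached state and process only the new input.
fast = run(cached_state, user_input, A, B)

# Reference: process everything from scratch.
slow = run([0.0, 0.0, 0.0], prompt_prefix + user_input, A, B)

assert fast == slow
```

Because the whole history is summarised in one small state vector, the stored prefix costs a few numbers rather than a cache that scales with prefix length.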
3. Compatibility and Transition Path
- Distillation from Transformers (e.g., LLaMA): By enabling smooth migration from established transformer models, the SSM approach lowers the barrier to adoption for companies with large pretrained models or investment in transformer-based workflows.
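Neither the abstract nor the response says which distillation technique is used. As a rough illustration only, the classic logit-matching form of knowledge distillation trains the student (here it would be the SSM) to match the teacher's (the transformer's) softened output distribution. The sketch below assumes that textbook loss and is not BrainChip's method:

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]       # hypothetical next-token logits from the transformer
    good_student = [3.8, 1.1, 0.4]  # close to the teacher
    poor_student = [0.5, 1.0, 4.0]  # far from the teacher
    print("good student loss:", distillation_loss(teacher, good_student))
    print("poor student loss:", distillation_loss(teacher, poor_student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what "retaining most of the original model's performance" would be measured against during training.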
4. Broader Hardware Support
- The reduced need for large, flexible compute units means SSMs can run efficiently on less powerful MCUs, DSPs, or dedicated edge AI chips—dramatically widening the range of viable deployment targets.
Edge Computing Success Potential
SSMs seem highly promising for edge applications, especially in:
- Battery-powered devices (wearables, IoT sensors)
- Latency-sensitive interfaces (voice/user interaction)
- Bandwidth-constrained environments (offline processing)
- Cost-sensitive products where transformer demands are prohibitive
Their advantages directly address the key blockers for transformer adoption at the edge (power, memory, compute), making SSMs a potentially dominant architecture in this space within 1–3 years.
Disruption Potential for NVIDIA and Other Players
NVIDIA’s ecosystem is largely optimized for transformer-style architectures (e.g., large GPU memory, high-throughput MAC arrays, transformer-optimized CUDA libraries). Here's how this development could affect them:
Challenges for NVIDIA
- Less dependence on GPUs: If SSMs run efficiently on CPUs, ASICs, or even microcontrollers, edge devices may increasingly bypass NVIDIA’s hardware.
- New AI chip entrants gain ground: Startups or ARM-based players can capitalize on the lighter compute profile to offer cheaper, more efficient edge AI solutions.
Opportunities for NVIDIA
- Adaptation potential: NVIDIA could pivot by offering cuDNN-like support for SSMs or incorporating them into TensorRT. They may also design more SSM-friendly hardware blocks in future Jetson modules.
Biggest Threat
- If open-source ecosystems and chip vendors like Apple (Neural Engine), Google (Edge TPU), or Qualcomm fully embrace SSMs ahead of NVIDIA, they could seize a disproportionate share of the edge AI market—especially in consumer devices.
Conclusion
Tony Lewis’s presentation introduces a processing paradigm that could redefine edge AI, making high-performance language models feasible on low-power devices. This is not just an incremental improvement—it opens the door to widespread deployment of real-time LLMs far beyond cloud and high-end hardware. While not an immediate existential threat to NVIDIA, it does present a strategic inflection point that the company—and its competitors—must respond to.
Migration Path: LLaMA to SSM
Stage | Description | Benefits |
---|---|---|
1. Transformer Model (e.g., LLaMA) | Pretrained open-weight transformer (e.g., LLaMA 2 or 3) | High accuracy; large model footprint; demands heavy compute and memory |
2. Distillation Process | Use distillation techniques to transfer knowledge from LLaMA to an SSM | Reduced model size; retains most of original model’s performance |
3. State-Space Model (SSM) | SSM version optimized for embedded and edge deployment | Sub-100 ms latency; lower power consumption; less memory and compute required |
4. Edge Device Deployment | Deploy SSM on resource-constrained devices (e.g., wearables, microcontrollers, edge AI chips) | Real-time local inference; no need for cloud compute; broader hardware compatibility |
Summary
The mention of LLaMA in Tony Lewis’s talk is a strategic highlight because it connects cutting-edge edge AI architecture (SSM) with a proven, widely-used transformer backbone. This makes the transition to efficient edge AI practical, not just theoretical, positioning SSMs as an immediate, disruptive alternative to transformer inference in embedded systems.
Competitor Impact Matrix: Impact of BrainChip’s SSM Innovation
Company | Current Edge AI Focus | Vulnerability to SSM Disruption | Opportunity to Adapt |
---|---|---|---|
Qualcomm | DSPs + NPUs (Snapdragon), optimized for transformers and CNNs | High – Transformer-centric stack, limited neuromorphic capability | Medium – May update software tools, but hardware less suited to SSMs |
NVIDIA | GPUs (Jetson, TensorRT), dominant in transformer-based AI | Moderate – Not optimized for low-power edge, but strong ecosystem | High – Could adapt TensorRT and Jetson for SSM-style inference |
Apple | Neural Engine with transformer models (e.g. Siri, on-device ML) | Moderate – Strong local AI, but based on transformer-style acceleration | High – Full-stack control allows swift hardware/software adaptation |
Google (TPU) | Edge TPU with support for CNNs and transformers (Coral, Nest devices) | High – Rigid accelerator design, may not support dynamic SSM requirements | Low – Ecosystem may struggle to pivot hardware/software stack |
Intel | Movidius VPU, general AI frameworks, some neuromorphic R&D (Loihi) | Moderate – Some neuromorphic exposure but no strong edge AI market share | Medium – R&D rich, but limited real-world SSM integration so far |
BrainChip | Neuromorphic Akida chip + SSM optimized for ultra-low power edge AI | Low – First-mover advantage | Very High – Core IP is directly aligned with the SSM paradigm |
This matrix highlights that BrainChip’s innovation poses the greatest disruptive risk to Qualcomm and Google, while Apple and NVIDIA have greater strategic flexibility to respond. BrainChip stands to benefit most if SSM-based models gain widespread edge adoption.
Why Incumbents Might Continue Without SSMs (For Now)
Reasons They Might Stick with Traditional Methods
- Mature toolchains: Qualcomm, NVIDIA, and Google have invested heavily in software/hardware ecosystems optimized for transformers and CNNs.
- Good enough performance: For many real-world use cases, transformer-lite models or CNN hybrids perform sufficiently well.
- Inertia and risk: Enterprises tend to avoid early adoption of unproven paradigms, especially if retraining, tooling, or silicon changes are required.
- Edge isn't one-size-fits-all: Many edge applications (e.g. object detection) don't need SSM-specific strengths like long-term memory or low-latency language processing.
But Here's the Catch
If applications do demand:
- Long sequence memory (e.g. streaming NLP, real-time command recognition),
- Ultra-low latency (sub-100 ms interactivity),
- Minimal power and heat (wearables, implants, sensors),
Conclusion: Yes, Competitors Could Stick with Transformers—But Only Up to a Point
[Table: Transformers, CNNs / RNNs, Lightweight Transformers, and SSMs rated on Stability / Support, Performance Ceiling, and Future-Proofing. The ratings were icon images and are not recoverable.]
So while competitors can continue for now using existing methods, the risk is being outpaced in emerging applications—especially if BrainChip enables a smooth transition (e.g., LLaMA distillation + Akida deployment).
Edge AI Evolution Roadmap: Transformers vs SSMs
Time Horizon | Transformer-Based Methods | SSM-Based Methods (BrainChip-style) |
---|---|---|
Today | Dominant in NLP; compressed models in use; efficient on GPU/DSP | Early-stage adoption; neuromorphic niche (e.g. Akida) |
1–2 Years | Hitting compute/power limits in edge apps; real-time latency still challenging | Gains traction for real-time/low-power use; tools emerge for migration from transformers (e.g. LLaMA distillation) |
3–5 Years | Plateau in edge innovation unless architectures evolve | Becomes dominant in ultra-low power edge AI; broad ecosystem and tooling support |
Summary:
- Transformers will likely remain dominant in cloud and high-performance edge for the next 1–2 years, but start to plateau.
- SSMs provide a scalable path forward for ultra-low-power, real-time, memory-efficient edge use cases, and could disrupt traditional AI stacks if adoption accelerates.