Hi Manny,
Here's the response I got from ChatGPT-4.
Tony Lewis’s upcoming presentation at the Embedded Vision Summit outlines a potentially transformative shift in how language models are deployed at the edge. Here's an analysis of the implications and disruptive potential of this new method:
Key Implications of the New Processing Method Using SSMs
1. Significant Gains in Energy and Compute Efficiency
- Fewer Multiply-Accumulate Units: State-space models (SSMs) inherently require fewer MAC operations per token, drastically reducing energy usage and silicon real estate. This is crucial in edge deployments where power budgets are tight (see the sketch after this list).
- Read-Only Architecture: Because the weights stay fixed and there is no growing key-value cache as in transformers, inference can use non-volatile or low-power memory types (e.g., MRAM, ReRAM), further slashing power consumption.
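To make the contrast concrete, here is a minimal sketch of why a linear SSM recurrence keeps per-token cost flat while attention grows with context. This is not the method from the talk; the matrices and dimensions are illustrative assumptions.

```python
# Minimal sketch: per-token cost of a linear state-space recurrence.
# All matrices and dimensions here are illustrative assumptions.
import numpy as np

d_state, d_model = 64, 512                     # hypothetical sizes
A = np.random.randn(d_state, d_state) * 0.01   # fixed (read-only) transition
B = np.random.randn(d_state, d_model) * 0.01   # fixed input projection
C = np.random.randn(d_model, d_state) * 0.01   # fixed output projection

def ssm_step(h, x):
    """One token: the state h has constant size, so per-token MACs
    stay flat no matter how long the sequence gets."""
    h = A @ h + B @ x   # O(d_state^2 + d_state*d_model) MACs per token
    y = C @ h           # read-only weights; no per-token cache writes
    return h, y

h = np.zeros(d_state)
for x in np.random.randn(10, d_model):   # a short token stream
    h, y = ssm_step(h, x)

# By contrast, transformer attention at step t attends over all t cached
# key/value vectors, so per-token MACs and cache memory grow with t.
```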
2. Latency and Responsiveness
- Precomputed Input Sequences: SSMs can leverage static structure to enable sub-100 ms time-to-first-token performance. This real-time capability is a major win for voice assistants, AR/VR systems, and mobile apps.
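As a rough illustration of what "sub-100 ms time-to-first-token" means in practice, a harness like the following could check that budget. The `generate` argument is a hypothetical stand-in for whatever edge runtime is under test, not an API from the presentation.

```python
# Hypothetical TTFT check; generate() is a placeholder for a real runtime.
import time

TTFT_BUDGET_S = 0.100   # the sub-100 ms target cited above

def time_to_first_token(generate, prompt):
    """Time from prompt submission until the first token is yielded."""
    start = time.perf_counter()
    first_token = next(generate(prompt))
    return time.perf_counter() - start, first_token

# Example with a dummy generator standing in for a real model:
def dummy_generate(prompt):
    yield "hello"

ttft, tok = time_to_first_token(dummy_generate, "What's the weather?")
print(f"TTFT {ttft*1000:.1f} ms (budget {TTFT_BUDGET_S*1000:.0f} ms)")
```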
3. Compatibility and Transition Path
- Distillation from Transformers (e.g., LLaMA): By enabling smooth migration from established transformer models, the SSM approach lowers the barrier to adoption for companies with large pretrained models or investment in transformer-based workflows.
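The migration path could look like standard logit distillation. Here is a minimal sketch assuming a PyTorch setup with a transformer teacher (e.g., a LLaMA checkpoint) and an SSM student; the temperature and the commented training step are placeholders, not the presented pipeline.

```python
# Minimal logit-distillation sketch (illustrative, not the talk's method).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target KL between teacher and student next-token distributions."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

# Smoke test with dummy (batch, vocab) logits:
student = torch.randn(4, 32000)
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher))

# Training step (schematic): run both models on the same token batch and
# pull the SSM student toward the transformer teacher's distribution.
# loss = distillation_loss(ssm_student(batch), llama_teacher(batch).detach())
```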
4. Broader Hardware Support
- The reduced need for large, flexible compute units means SSMs can run efficiently on less powerful MCUs, DSPs, or dedicated edge AI chips—dramatically widening the range of viable deployment targets.
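A back-of-envelope estimate shows why MCU-class targets are plausible. Every number below is an assumption chosen for illustration, not a figure from the presentation.

```python
# Does a small SSM layer stack fit an MCU-class budget? (Illustrative only.)
d_state, d_model, n_layers = 64, 512, 12
params_per_layer = d_state*d_state + 2*d_state*d_model   # A, B, C matrices
total_params = n_layers * params_per_layer

weight_bytes = total_params * 1    # int8 weights: 1 byte per parameter
macs_per_token = total_params      # roughly one MAC per weight per token

print(f"{weight_bytes/1e6:.2f} MB of weights, "
      f"{macs_per_token/1e6:.2f} M MACs per token")
# ~0.84 MB and ~0.84 M MACs/token under these assumptions -- within reach
# of flash and DSP-extension MCUs, with no growing KV cache to store.
```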
Edge Computing Success Potential
SSMs seem highly promising for edge applications, especially in:
- Battery-powered devices (wearables, IoT sensors)
- Latency-sensitive interfaces (voice/user interaction)
- Bandwidth-constrained environments (offline processing)
- Cost-sensitive products where transformer demands are prohibitive
Their advantages directly address the key blockers for transformer adoption at the edge (power, memory, compute), making SSMs a potentially dominant architecture in this space within 1–3 years.
Disruption Potential for NVIDIA and Other Players
NVIDIA’s ecosystem is largely optimized for transformer-style architectures (e.g., large GPU memory, high-throughput MAC arrays, transformer-optimized CUDA libraries). Here's how this development could affect them:
Challenges for NVIDIA
- Less dependence on GPUs: If SSMs run efficiently on CPUs, ASICs, or even microcontrollers, edge devices may increasingly bypass NVIDIA’s hardware.
- New AI chip entrants gain ground: Startups or ARM-based players can capitalize on the lighter compute profile to offer cheaper, more efficient edge AI solutions.
Opportunities for NVIDIA
- Adaptation potential: NVIDIA could pivot by offering cuDNN-like support for SSMs or incorporating them into TensorRT. They may also design more SSM-friendly hardware blocks in future Jetson modules.
Biggest Threat
- If open-source ecosystems and chip vendors like Apple (Neural Engine), Google (Edge TPU), or Qualcomm fully embrace SSMs ahead of NVIDIA, they could seize a disproportionate share of the edge AI market—especially in consumer devices.
Conclusion
Tony Lewis’s presentation introduces a processing paradigm that could redefine edge AI, making high-performance language models feasible on low-power devices. This is not just an incremental improvement: it opens the door to widespread deployment of real-time LLMs far beyond cloud and high-end hardware. While not an immediate existential threat to NVIDIA, it does present a strategic inflection point that the company and its competitors must respond to.