Details of Tony Lewis's presentation at the Embedded Vision Summit on Wednesday, 21 May 2025.
"
Date: Wednesday, May 21
Start Time: 2:05 pm
End Time: 2:35 pm
At the embedded edge, choices of language model architectures have profound implications on the ability to meet demanding performance, latency and energy efficiency requirements. In this presentation, we contrast state-space models (SSMs) with transformers for use in this constrained regime. While transformers rely on a read-write key-value cache, SSMs can be constructed as read-only architectures, enabling the use of novel memory types and reducing power consumption. Furthermore, SSMs require significantly fewer multiply-accumulate units—drastically reducing compute energy and chip area. New techniques enable distillation-based migration from transformer models such as Llama to SSMs without major performance loss. In latency-sensitive applications, techniques such as precomputing input sequences allow SSMs to achieve sub-100 ms time-to-first-token, enabling real-time interactivity. We present a detailed side-by-side comparison of these architectures, outlining their trade-offs and opportunities at the extreme edge."
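To see why the abstract says SSMs can run with a read-only, fixed-size memory while transformers need a growing read-write key-value cache, here is a minimal sketch of a linear state-space recurrence. All names, matrices and dimensions are illustrative only, not BrainChip's TENNs implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d_state, d_in = 8, 4  # illustrative sizes, not real TENNs parameters
A = rng.normal(scale=0.1, size=(d_state, d_state))  # state transition (read-only weights)
B = rng.normal(size=(d_state, d_in))                # input projection
C = rng.normal(size=(d_in, d_state))                # output projection

def ssm_step(h, x):
    """One recurrent step: the state h stays the same size no matter
    how long the history is, so per-token compute is constant."""
    h = A @ h + B @ x
    y = C @ h
    return h, y

def run_ssm(xs):
    h = np.zeros(d_state)
    for x in xs:
        h, y = ssm_step(h, x)
    return h

# A transformer, by contrast, appends one key-value entry per token,
# so its cache grows linearly with the sequence:
def kv_cache_size(seq_len):
    return seq_len  # one (key, value) pair stored per processed token

seq = [rng.normal(size=d_in) for _ in range(16)]
h_final = run_ssm(seq)
print(h_final.shape)     # state is still (8,) after 16 tokens
print(kv_cache_size(16)) # transformer cache would hold 16 entries
```

The same constant-size state is also what makes the "precomputing input sequences" trick in the abstract plausible: a known prompt prefix can be run through the recurrence offline and only the final state stored, so generation starts immediately.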
See the sentence on distillation above. It seems that BRN are now able to migrate traditional transformer models to state-space models (SSMs) without major performance loss.
Note too that SSMs reduce energy requirements and chip area. Think Pico, or a Pico plus.
Pico runs off TENNs, which is a type of SSM.
Does that mean developers can now feast on the multitude of traditional models and distill them to SSMs? Appears so.
Is this potentially another game changer?
We might find out more tomorrow?
I have the 'kiddy' Copilot, which you do not get much out of.
Is someone with GPT-4 or Grok etc. able to quiz the AI to see what the possibilities are?