BRN Discussion Ongoing

JDelekto

Regular
Some more on Qualcomm's Hexagon AI which is pretty promiscuous, sharing the AI workload selectively between GPU, CPU, and NPU:


6. Heterogeneous computing: leveraging all the processors for generative AI

Generative AI models suitable for on-device execution are becoming more complex and trending toward larger sizes, from one billion to 10 billion to 70 billion parameters. They are increasingly multi-modal, meaning that they can take in multiple inputs — such as text, speech, or images — and produce several outputs.

Further, many use cases concurrently run multiple models. For example, a personal assistant application uses voice for input and output. This requires running an automatic speech recognition (ASR) model for voice to text, an LLM for text to text, and a text-to-speech (TTS) model for a voice output. The complexity, concurrency, and diversity of generative AI workloads require harnessing the capabilities of all the processors in an SoC. An optimal solution entails:

1. Scaling generative AI processing across cores of a processor and across processors

2. Mapping generative AI models and use cases to one or more cores and processors
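
To make the ASR → LLM → TTS concurrency above concrete, here's a minimal sketch of such a pipeline. The run_on() helper and the stage-to-processor mapping are hypothetical stand-ins, not a real SoC runtime API:

```python
import queue
import threading

def run_on(backend: str, stage: str, payload: str) -> str:
    # Placeholder for dispatching a model to a specific processor.
    return f"{stage}({payload}) [{backend}]"

def pipeline(utterances):
    asr_out, llm_out = queue.Queue(), queue.Queue()

    def asr_worker():
        for audio in utterances:
            asr_out.put(run_on("NPU", "asr", audio))   # voice -> text
        asr_out.put(None)                              # end-of-stream marker

    def llm_worker():
        while (text := asr_out.get()) is not None:
            llm_out.put(run_on("NPU", "llm", text))    # text -> text
        llm_out.put(None)

    workers = [threading.Thread(target=asr_worker),
               threading.Thread(target=llm_worker)]
    for w in workers:
        w.start()
    # Stages overlap: ASR of the next utterance runs while the LLM
    # handles the previous one, which is the concurrency described above.
    while (reply := llm_out.get()) is not None:
        print(run_on("CPU", "tts", reply))             # text -> voice
    for w in workers:
        w.join()

pipeline(["turn on the lights", "what's the weather?"])
```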

Choosing the right processor depends on many factors, including use case, device type, device tier, development time, key performance indicators (KPIs), and developer expertise. Many tradeoffs drive decisions, and the target KPI could be power, performance, latency, or accessibility for different use cases. For example, an original equipment manufacturer (OEM) making an app for multiple devices across categories and tiers will need to choose the best processor to run an AI model based on SoC specs, end-product capabilities, ease of development, cost, and graceful degradation of the app across device tiers.

As previously mentioned, most generative AI use cases can be categorized into on-demand, sustained, or pervasive. For on-demand applications, latency is the KPI since users do not want to wait. When these applications use small models, the CPU is usually the right choice. When models get bigger (e.g., billions of parameters), the GPU and NPU tend to be more appropriate. For sustained and pervasive use cases, in which battery life is vital and power efficiency is the critical factor, the NPU is the best option.
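
As a rough illustration of that decision logic, here is my own sketch of the white paper's heuristic (not a Qualcomm API; the one-billion-parameter threshold is taken straight from the prose above):

```python
def pick_processor(category: str, params: int) -> str:
    if category == "on-demand":                 # latency is the KPI
        return "CPU" if params < 1_000_000_000 else "GPU/NPU"
    if category in ("sustained", "pervasive"):  # power efficiency is the KPI
        return "NPU"
    raise ValueError(f"unknown category: {category}")

assert pick_processor("on-demand", 100_000_000) == "CPU"       # small model
assert pick_processor("on-demand", 7_000_000_000) == "GPU/NPU"
assert pick_processor("pervasive", 1_000_000_000) == "NPU"     # always-on
```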

Another key distinction is identifying whether the AI model is memory bound — performance is limited by memory bandwidth — or compute bound — performance is limited by the speed of the processor. Today’s LLMs are memory bound for the text generation, so focusing on memory efficiency on the CPU, GPU, or NPU is appropriate. For LVMs, which could be compute or memory bound, the GPU or NPU could be used, but the NPU provides the best performance per watt.
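
A quick back-of-envelope calculation shows why text generation is memory bound: every weight must be streamed from DRAM for each generated token. The bandwidth and compute figures below are illustrative assumptions, not specs for any particular chip:

```python
# Why LLM decode is memory bound, in rough numbers.
params = 7e9                          # 7B-parameter model
bytes_per_param = 2                   # fp16 weights
flops_per_token = 2 * params          # ~2 FLOPs per weight per token
bytes_per_token = bytes_per_param * params  # every weight read once

mem_bw = 50e9                         # assumed 50 GB/s effective bandwidth
compute = 10e12                       # assumed 10 TFLOPS usable compute

t_mem = bytes_per_token / mem_bw      # time just to stream the weights
t_cmp = flops_per_token / compute     # time just to do the math

print(f"memory: {t_mem*1e3:.0f} ms/token, compute: {t_cmp*1e3:.1f} ms/token")
# memory: 280 ms/token, compute: 1.4 ms/token -> memory bound
```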

A personal assistant that offers a natural voice user interface (UI) to improve productivity and enhance user experiences is expected to be a popular generative AI application. The speech recognition, LLM, and speech models must all run with some concurrency, so it is desirable to split the models between the NPU, GPU, CPU, and the sensor processor. For PCs, agents are expected to run pervasively (always-on), so as much of it as possible should run on the NPU for performance and power efficiency.

As we know, Akida can multi-task, running different models on the one SoC.
As I understand it, the NPU (which is easily confused with a neuromorphic processor because of the 'N') is another dedicated processor that runs parallelized computations alongside the CPU and GPU at lower power.

Qualcomm's Hexagon NPU is designed to offload those computations from the other two and is optimized for vector, matrix, and tensor processing (basically a lot of matrix math). I ran across an interesting thread on Hacker News, where someone benchmarked the NPU and found it was not quite as fast as the CPU itself. Again, the NPU is intended to achieve performance through parallelization at lower power. They have the code for their benchmarks here on GitHub.
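
For anyone curious what such a benchmark boils down to, here's a minimal matmul timing sketch. This is my own illustration on the CPU via NumPy, not the code from that GitHub repo:

```python
import time
import numpy as np

def matmul_gflops(n: int, repeats: int = 10) -> float:
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b                                 # warm-up run
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    elapsed = (time.perf_counter() - start) / repeats
    return 2 * n**3 / elapsed / 1e9       # ~2*n^3 FLOPs per n x n matmul

print(f"{matmul_gflops(1024):.1f} GFLOP/s on CPU")
```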

I believe Akida could still be a strong competitor to Qualcomm's AI offerings, or even potentially a replacement for Hexagon for better real-time processing on power-constrained devices.
 
  • Like
  • Fire
Reactions: 13 users

TECH

Regular



Like I say, I personally don't see Intel derailing our company anytime soon, do you, Donald Duck?

We're targeting the Edge, Intel is targeting the Edge of a Cliff...:ROFLMAO::ROFLMAO::ROFLMAO:

 
  • Like
  • Haha
  • Fire
Reactions: 5 users

Diogenese

Top 20
Some more on Qualcomm's Hexagon AI which is pretty promiscuous, sharing the AI workload selectively between GPU, CPU, and NPU: [...]

As we know, Akida can multi-task, running different models on the one SoC.

Running AI on CPU or GPU necessarily entails the use of software.
When you find yourself in a hole ... keep digging:

So Qualcomm's Hexagon NPU evolved from a DSP:

Building our NPU from a DSP architecture was the right choice for improved programmability and the ability to tightly control scalar, vector, and tensor operations that are inherent to AI processing. Our design approach of optimized scalar, vector, and tensor acceleration combined with large local shared memory, dedicated power delivery systems, and other hardware acceleration differentiates our solution. Our NPU mimics the neural network layers and operations of the most popular models, such as convolutions, fully-connected layers, transformers, and popular activation functions, to deliver sustained high performance at low power.
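
To unpack the jargon: a scalar op touches one value, a vector op a row of values, and a tensor op a whole matrix. A fully-connected layer with a ReLU activation uses all three; NumPy on the CPU stands in here for what the NPU accelerates in hardware:

```python
import numpy as np

x = np.random.rand(128)            # input activations
W = np.random.rand(256, 128)       # fully-connected weights
b = np.random.rand(256)            # biases

y = W @ x                          # tensor op: matrix-vector product
y = y + b                          # vector op: elementwise add
y = np.maximum(y, 0.0)             # scalar op per element (ReLU activation)
print(y.shape)                     # (256,)
```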

So naturally there's a side tunnel to DSPs:


Such performance improvements have led to the introduction of digital signal processing in commercial communications satellites where hundreds or even thousands of analog filters, switches, frequency converters and so on are required to receive and process the uplinked signals and ready them for downlinking, and can be replaced with specialised DSPs with significant benefits to the satellites' weight, power consumption, complexity/cost of construction, reliability and flexibility of operation. For example, the SES-12 and SES-14 satellites from operator SES launched in 2018, were both built by Airbus Defence and Space with 25% of capacity using DSP.[6]
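
For the curious, the two jobs the quote mentions, frequency conversion and filtering, look something like this when done digitally. A toy NumPy sketch with arbitrary signal parameters, nothing Airbus-specific:

```python
import numpy as np

fs = 1e6                                   # sample rate: 1 MHz
t = np.arange(4096) / fs
signal = np.cos(2 * np.pi * 200e3 * t)     # carrier at 200 kHz

# Frequency conversion: mix down by 200 kHz (replaces an analog mixer).
baseband = signal * np.exp(-2j * np.pi * 200e3 * t)

# Filtering: 32-tap moving-average FIR low-pass (replaces an analog filter).
taps = np.ones(32) / 32
filtered = np.convolve(baseband, taps, mode="same")
print(abs(filtered).mean())                # ~0.5: the recovered baseband term
```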


I wonder if Airbus intends to swap its DSPs for SNNs?
 
  • Like
  • Love
  • Fire
Reactions: 7 users

Diogenese

Top 20
As I understand it, the NPU (which is easily confused with a neuromorphic processor because of the 'N') is another dedicated processor that runs parallelized computations alongside the CPU and GPU at lower power. [...]

I believe Akida could still be a strong competitor to Qualcomm's AI offerings, or even potentially a replacement for Hexagon for better real-time processing on power-constrained devices.
Yes. The Qualcomm white paper above states that their NPU evolved from the DSP, and it looks like they are sticking with it.

"You know, I reckon if we put a supercharger on this Model T, it'll be as good as the rest."
 
  • Like
  • Haha
  • Fire
Reactions: 9 users

Getupthere

Regular

While Apple is late to generative AI, it continues to innovate with other techniques that contribute to the iPhone experience. You’ll find Apple’s Neural Engine built into the Axx chipsets, which allows for on-device language processing, image recognition, and data processing through machine learning.


When there is so much talk of moving personal data into the cloud for processing (as well as the intense energy demands of generative AI), many will look at Apple’s efforts to keep data on the device and find it a more palatable decision.
 
  • Fire
  • Like
Reactions: 2 users

Frangipani

Regular

[image attachment: F8DF4389-80D8-45BE-B567-48AE924827A0.jpeg]
 
  • Like
Reactions: 2 users