Real-Time AI At The Edge May Require A New Network Solution
Jim McGregor, Contributor
Tirias Research Contributor Group
Aug 2, 2024, 11:10am EDT
The complexity of edge networks (Image credit: BrainChip)
AI has the power to change every electronic platform, but what works in the data center may not work in an industrial edge platform like a security camera, robotic arm, or even a vehicle. There is no one-size-fits-all solution for edge AI because space, power, data security, and performance and latency requirements vary widely across applications. Transitioning to edge AI requires new solutions, especially for on-device training.
The Current AI Model
Most AI models start in the data center with the training of a neural network model on public and/or private data. As we have argued in the past, the expense of running AI in the cloud can be prohibitive from a cost, latency, and security standpoint. As a result, neural network models are often optimized by shrinking the model size so they can run on the device, at the edge of the network, commonly referred to as “edge computing.” This optimization is a delicate balance of reducing the size of the model while maintaining acceptable accuracy.
While neural network models and optimization techniques continue to improve, a scaled-down data center solution may not be optimal for many edge applications. Edge applications are often focused on processing sensor data and require even smaller models with a high degree of accuracy and real-time, or near-real-time, execution for mission-critical or even life-threatening situations. These edge applications may include healthcare, automotive/transportation, manufacturing, and security.
Event-driven AI
As an alternative, BrainChip has developed its Akida neuromorphic IP (intellectual property) solutions to support Temporal Event-based Neural Networks (TENNs), an event-based neural processing architecture, in addition to traditional Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer-based neural networks. In practice, a TENN computes only when a trigger event or new input occurs; the rest of the time it is not computing and therefore consumes little power. This can translate to higher performance, adaptability, and lower real-time latency per event, input, or request, at a fraction of the power consumption of other AI solutions.
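The distinction between conventional and event-driven processing can be sketched in a few lines. The following is an illustrative toy example, not BrainChip's implementation: a frame-driven pipeline runs inference on every input, while an event-driven one spends compute only when the input actually changes.

```python
calls = 0  # global counter of forward passes, for illustration only

def run_inference(frame):
    """Stand-in for a real neural-network forward pass."""
    global calls
    calls += 1
    return sum(frame)

def frame_driven(frames):
    """Conventional pipeline: run inference on every frame."""
    return [run_inference(f) for f in frames]

def event_driven(frames, threshold=0):
    """Event-based pipeline: run inference only when the input changes."""
    results, prev, last_out = [], None, None
    for f in frames:
        if prev is None or sum(abs(a - b) for a, b in zip(f, prev)) > threshold:
            last_out = run_inference(f)  # trigger event: spend compute
        results.append(last_out)         # static input: reuse the last result
        prev = f
    return results

# A mostly static scene: only the last of five frames changes.
frames = [[1, 2, 3]] * 4 + [[9, 9, 9]]
frame_driven(frames)   # 5 forward passes
n_frame = calls
calls = 0
event_driven(frames)   # 2 forward passes (first frame plus the change)
n_event = calls
```

On this toy input the event-driven path does 2 forward passes instead of 5; for largely static sensor feeds, the savings compound.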
According to BrainChip, TENNs are ideal for processing various types of data, such as one-dimensional time series and spatial-temporal data. TENNs are showing positive results in a number of common applications, such as audio denoising, eye-tracking for AR/VR, health data monitoring (heart rate, SpO2), keyword spotting, Small Language Models (SLMs), and video object detection. TENNs' ability to adapt to new inputs and events overcomes some of the limitations of traditional neural networks while supporting future neural network models at a fraction of the die area and operating cost.
The Akida technology is available as IP cores that can be integrated into anything from a low-power microcontroller to a high-performance applications processor, system-on-chip (SoC), or discrete accelerator chip. While using an IP solution requires a new design, it provides more efficient processing of TENNs models by taking advantage of sparsity, a key feature that makes the human brain so efficient. Additionally, BrainChip asserts that no other off-the-shelf solution provides comparable low-power, high-accuracy, real-time AI processing.
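Why sparsity matters is easy to see by counting multiply-accumulate (MAC) operations. The sketch below uses made-up numbers and is not Akida's hardware logic: a dense layer performs a MAC for every weight, while a sparsity-aware layer skips all work for inputs that are zero, which in event-style activations is most of them.

```python
def dense_macs(weights, activations):
    """MAC count for a dense matrix-vector product: every weight is used."""
    return len(weights) * len(activations)

def sparse_macs(weights, activations):
    """MAC count when zero activations are skipped entirely."""
    nonzero = sum(1 for a in activations if a != 0)
    return len(weights) * nonzero

# An event-style activation vector is mostly zeros: 2 of 8 inputs active.
acts = [0, 0, 3, 0, 0, 0, 1, 0]
W = [[1] * 8 for _ in range(4)]  # 4 output neurons, 8 inputs each

print(dense_macs(W, acts))   # 32 MACs
print(sparse_macs(W, acts))  # 8 MACs, a 4x reduction on this toy input
```

The reduction scales with how sparse the activations are, which is why event-based inputs pair naturally with this kind of hardware.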
Edge-Specific AI
While TOPS (trillions of operations per second) gets a lot of attention as a means of measuring the AI performance of a chip, the performance of edge AI is better analyzed using metrics specific to the application. Examples of these application-specific metrics include frames per second (FPS) for image processing, time to first token (TTFT) for response time, mean average precision (mAP) for accuracy, and watts or kilowatts for power consumption. BrainChip has provided some data along these lines to demonstrate the value of TENNs in various applications.
In another demonstration, the company showed how TENNs can drastically reduce training time and power consumption, by more than an order of magnitude, relative to larger language models like GPT-2, with equivalent accuracy, making them more appropriate for embedded applications than the newer, larger models.
Edge Specific Solutions
While there is a rush to push AI to every platform and device, scaling down from the data center may not be the best solution for many applications. As we have seen in the past, the unique requirements of devices often drive innovation in new directions, and Tirias Research believes the same will hold true for AI as it moves from the data center to the edge. But, as with any new technology, success often depends on the benefit over existing solutions. According to BrainChip, the numbers can be very significant, with demonstrations showing up to a 50x reduction in the number of model parameters, up to a 30x reduction in training time, and a 5000x reduction in multiply-accumulate (MAC) operations with the same or better accuracy. Improvements in performance and power efficiency scale with model efficiency.