FF
Just in case you think the above is just an academic oddity, both Mercedes Benz and Nissan have been working on this exact same idea for quite some time, and in fact it features as part of Mercedes Benz's future vehicle:
https://www.bitbrain.com/blog/nissan-brain-to-vehicle-technology
Now perhaps it is a big leap, but Mercedes Benz is working with Brainchip, and this paper confirms the benefits of using Brainchip's technology for this purpose. Not to mention the Onsor Nexa glasses as further proof of AKIDA's claim to fame in this area. It would be a brave person who would discount that this is one of the many areas Mercedes Benz is looking at with Brainchip, particularly now that AKIDA 2.0 with TENNs is fully in play, and particularly as we know, thanks to
@FullMoonFever finding and sharing the presentation by the neuromorphic research arm at Mercedes Benz, that it is aware of not just AKD1000 but also AKIDA 2.0.
Hi SS,
Most of the following is speculation.
When we had the big MB reveal (2022?), they would have been using the COTS Akida 1 chip. Its neurons were designed using an amazing SNN architecture which was far superior to any other chip at classification. At the same time, the Akida 2 architecture was being designed and patented. As well as TENNs, this offered ViT in the top-level configuration with lots of neurons/NPUs.
Meanwhile, back at the ranch, Rudy, Olivier and co. were developing TENNs. At first, TENNs proved its value by analysing the temporal element of input signals (video, voice, ...).
The TENNs team proceeded to develop TENNs models. Then, perhaps by a stroke of serendipity, they found that TENNs had an affinity for MACs. Akida 1 had adopted a few MACs per node when it was upgraded to 4-bit, so maybe these were used to implement some TENNs functionality - I don't know. The outcome was that TENNs seems to have met every challenge, and (this is my personal heresy, but much to my technical fanboy disappointment) TENNs seems to have now totally displaced the brilliant original Akida 1 neurons/NPUs; Akida now runs on 128-MAC nodes using TENNs models.
In any case, the developers tested TENNs on a largish bundle of MACs (possibly on an FPGA?). The result = duck/water.
Now it appears that all the Akida nodes include 128 MACs, but ViT is no longer offered as an option. So I assume that TENNs produces the result that ViT produced, only more efficiently.
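For anyone who wants to see what a MAC actually is, here's a rough Python sketch of the multiply-accumulate primitive that sits under convolutions and fully-connected layers. This is my illustration only (the names and the 4-bit ranges are my assumptions), not BrainChip's implementation:

```python
import numpy as np

def mac_dot(weights, activations):
    """One multiply-accumulate (MAC) pass: acc += w * x for each pair.

    A hardware MAC unit performs one such w * x + acc step per cycle;
    128 MACs per node would handle 128 pairs in parallel. Illustrative only.
    """
    acc = 0
    for w, x in zip(weights, activations):
        acc += w * x          # one MAC operation
    return acc

# Example: a 128-wide dot product, i.e. one pass through a 128-MAC node.
rng = np.random.default_rng(0)
w = rng.integers(-8, 8, size=128)    # 4-bit-ish weights (assumed range)
x = rng.integers(0, 16, size=128)    # 4-bit-ish activations (assumed range)
print(mac_dot(w, x), np.dot(w, x))   # same result
```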
This would leave us with a load of legacy Akida 1000/1500 chips built using the original NPUs which are supported by the original models. It is still possible to build new models for the original Akida architecture, presumably on a "by request" or do-it-yourself basis, but the model development now will be directed to TENNs. At present, some basic TENNs models are available for download, but the more sophisticated models are provided only on request.
Another thing that Tony Lewis (?) said was that they have used look-up tables (LUTs) to implement activation functions, a task which our competitors compute in software.
So the Akida architecture now includes 128-MAC nodes with LUTs.
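To show the LUT idea in miniature (a minimal sketch under my own assumptions: 8-bit activations, a sigmoid, and an arbitrary quantisation scale - not BrainChip's actual tables): the activation function is pre-computed once over all 256 possible input codes, so at run time it is a single table index instead of exponentials and divides.

```python
import numpy as np

# Pre-compute a sigmoid over all 256 possible 8-bit input codes, once.
# (Assumed, for illustration: inputs dequantise to roughly [-8, 8).)
codes = np.arange(256, dtype=np.uint8)
real_vals = (codes.astype(np.float32) - 128.0) / 16.0
sigmoid_lut = (1.0 / (1.0 + np.exp(-real_vals)) * 255).astype(np.uint8)

def activate_lut(x_uint8):
    """Apply the activation by table lookup - no exp/divide at run time."""
    return sigmoid_lut[x_uint8]

def activate_software(x_uint8):
    """The 'compute it in software' approach: evaluate the function each time."""
    real = (x_uint8.astype(np.float32) - 128.0) / 16.0
    return (1.0 / (1.0 + np.exp(-real)) * 255).astype(np.uint8)

x = np.random.randint(0, 256, size=1024, dtype=np.uint8)
assert np.array_equal(activate_lut(x), activate_software(x))  # same answer, cheaper path
```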
Since MB is an early adopter and we have confirmation that they are still interested in Akida 2, MB will be fully familiar with TENNs.
My guess is that MB and other select EAPs have been playing with Akida 2 in FPGA format (6 nodes like the online version?) for some months now.
Tony also mentioned that we use a lot fewer MACs than the competition (ARM ETHOS, maybe Qualcomm Hexagon?).
We know that MB is using Qualcomm, and Hexagon uses MACs:
https://www.qualcomm.com/products/technology/processors/hexagon
The Hexagon NPU mimics the neural network layers and operations of popular models, such as activation functions, convolutions, fully-connected layers, and transformers, to deliver peak performance, power efficiency, and area efficiency crucial for executing the numerous multiplications, additions, and other operations in machine learning.
Distinguished by its system approach, custom design, and fast innovation, the Hexagon NPU stands out. The Hexagon NPU fuses together the scalar, vector, and tensor accelerators for better performance and power efficiency. A large, dedicated, shared memory allows these accelerators to share and move data efficiently. Our cutting-edge micro tile inferencing technology delivers ultra-low power consumption and sets a new benchmark in AI processing speed and efficiency.
US2020073636A1 MULTIPLY-ACCUMULATE (MAC) OPERATIONS FOR CONVOLUTIONAL NEURAL NETWORKS Priority: 20180831
[0056] FIG. 4 is a block diagram illustrating an exemplary software architecture 400 that may modularize artificial intelligence (AI) functions. Using the architecture, applications may be designed that may cause various processing blocks of an SOC 420 (for example a CPU 422, a DSP 424, a GPU 426 and/or an NPU 428) to support fast multiply-accumulate (MAC) computations during run-time operation of an AI application 402, according to aspects of the present disclosure.
[0058] A run-time engine 408, which may be compiled code of a runtime framework, may be further accessible to the AI application 402. The AI application 402 may cause the run-time engine, for example, to request an inference at a particular time interval or triggered by an event detected by the user interface of the application. When caused to provide an inference response, the run-time engine may in turn send a signal to an operating system in an operating system (OS) space 410, such as a Linux Kernel 412, running on the SOC 420. The operating system, in turn, may cause a fast MAC computation to be performed on the CPU 422, the DSP 424, the GPU 426, the NPU 428, or some combination thereof. The CPU 422 may be accessed directly by the operating system, and other processing blocks may be accessed through a driver, such as a driver 414, 416, or 418 for, respectively, the DSP 424, the GPU 426, or the NPU 428. In the exemplary example, the deep neural network may be configured to run on a combination of processing blocks, such as the CPU 422, the DSP 424, and the GPU 426, or may be run on the NPU 428.
Qualcomm suggest that the MAC computation may be performed on the CPU, GPU, DSP or NPU. With Akida, the NPU would be so far ahead there would be no choice.
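Paraphrasing the patent's FIG. 4 flow in code (purely my sketch - the names are hypothetical and the NPU-first preference order is just my way of echoing the point above, not anything the patent specifies): the application asks the runtime for an inference and the OS routes the MAC-heavy work to whichever processing block is available.

```python
from enum import Enum, auto

class Block(Enum):
    CPU = auto()
    DSP = auto()
    GPU = auto()
    NPU = auto()

def run_inference(request, available_blocks):
    """Hypothetical sketch of the FIG. 4 flow: the AI application asks the
    run-time engine for an inference, and the OS routes the MAC-heavy work
    to a processing block (directly for the CPU, via a driver for the
    DSP/GPU/NPU)."""
    # Assumed preference order: take the most MAC-efficient block on offer.
    for block in (Block.NPU, Block.GPU, Block.DSP, Block.CPU):
        if block in available_blocks:
            return f"{request}: MAC computation dispatched to {block.name}"
    raise RuntimeError("no processing block available")

print(run_inference("classify frame", {Block.CPU, Block.NPU}))
```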