Hi Bravo,
The Arm Ethos-U85 uses MACs (multiply-accumulate units):
https://www.bing.com/images/search?view=detailV2&ccid=SKxR9R6A&id=FDE75A522354D643A048FDBF7BD709D6B16C62E3&thid=OIP.SKxR9R6AvaWns2DkvNwBzQHaFe&mediaurl=https://lh7-us.googleusercontent.com/docsz/AD_4nXdCJtxPcYCTi6bGU47CnzGMdXJ4kW5j5u1EkQcbitxexcMcuzHV3dYpoAoeBa4ITge7ZLR5CMVZV3Po3TZIG-e1Tnp_GIEbjzMjfcOoyz1lp01fxeqKqgomdCzj_PdIARM4JdK80VX56Ea9PNOYaYtdAFM?key=25AlXtfNVs_jGBLuOXoleg&exph=671&expw=907&q=arm+ethos+u85+block+diagram&form=IRPRST&ck=AEB858B96E0340034576C1DFDA68FBAC&selectedindex=1&itb=0&ajaxhist=0&ajaxserp=0&vt=0&sim=11
Arm® Ethos™-U85 NPU Technical Overview
The weight and fast weight channels transfer compressed weights from external memory to the weight decoder. The DMA controller uses a read buffer to hide bus latency from the weight decoder and to enable the DMA to handle data arriving out of order. The traversal unit triggers these channels for blocks that require the transfer of weights.
The weight stream must be quantized to eight bits or less by an offline tool. When passed through the offline compiler, weights are compressed losslessly and reordered into an NPU specific weight stream. This process is effective if the quantizer uses less than eight bits or if it uses clustering and pruning techniques. The quantizer can also employ all three methods. Using lossless compression on high sparsity weights, containing greater than 75% zeros, can lead to compression below 3 bits per weight in the final weight stream.
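As a rough sanity check on that "below 3 bits per weight" figure, here is a back-of-envelope sketch in Python. It is not Arm's compression scheme, which the overview does not detail. It only computes the Shannon entropy of a made-up int8 weight stream, which is the floor an ideal lossless coder could reach if weights were coded independently. The helper names and the toy weight distribution (every nonzero int8 value appearing equally often) are my own assumptions.

```python
import math
from collections import Counter


def entropy_bits_per_weight(weights):
    """Shannon entropy of the value histogram, in bits per weight."""
    counts = Counter(weights)
    total = len(weights)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def toy_int8_weights(zeros_per_nonzero):
    """Hypothetical stream: every nonzero int8 value once, plus padding zeros."""
    nonzero = [v for v in range(-128, 128) if v != 0]  # 255 distinct values
    zeros = [0] * (zeros_per_nonzero * len(nonzero))
    return zeros + nonzero


dense = toy_int8_weights(0)      # no zeros at all
sparse_75 = toy_int8_weights(3)  # 3 zeros per nonzero weight = 75% zeros
sparse_80 = toy_int8_weights(4)  # 4 zeros per nonzero weight = 80% zeros

for name, w in [("dense", dense), ("75% zeros", sparse_75), ("80% zeros", sparse_80)]:
    print(f"{name}: {entropy_bits_per_weight(w):.2f} bits per weight")
```

Under these toy assumptions the dense stream needs nearly the full 8 bits, while the 75% sparse stream already drops under 3 bits, which is at least consistent with the quoted claim. Real weights are not uniformly distributed, so actual results will differ.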
Given its int16/FP32 capabilities, Akida 3 should be usable in more high-precision applications than the Ethos U85, whose weights must be quantized to eight bits or less.
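To put a rough number on the precision gap, here is a small sketch comparing the worst-case rounding error of 8-bit and 16-bit weights. It uses plain symmetric uniform quantization over [-1, 1], which is my assumption for illustration only. It does not model how either Akida or Ethos actually quantizes, and the function names are hypothetical.

```python
def quantize(x, bits, max_abs=1.0):
    """Round x to the nearest level of a symmetric signed integer grid."""
    levels = 2 ** (bits - 1) - 1          # 127 for 8 bits, 32767 for 16 bits
    scale = max_abs / levels              # real-valued size of one integer step
    q = max(-levels, min(levels, round(x / scale)))
    return q * scale                      # back to the real-valued domain


def max_abs_error(values, bits):
    """Largest rounding error over a set of sample values."""
    return max(abs(v - quantize(v, bits)) for v in values)


# Evenly spaced sample weights covering [-1, 1].
values = [i / 1000 - 1.0 for i in range(2001)]

for bits in (8, 16):
    print(f"{bits}-bit worst-case error: {max_abs_error(values, bits):.6f}")
```

The error bound is half a step, so going from 8 to 16 bits shrinks it by a factor of roughly 258 (32767/127) in this simple model.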
I don't see any advantage in running Akida 3 with Ethos U85.
Anyone is free to license the Arm Cortex-M85 CPU IP and combine it with Akida 2 or 3 IP in an SoC. Both Akidae include the TENNs model and would have significant power and latency advantages over the Ethos U85.