When comparing chalk and cheese, are we talking camembert or parmesan?
Perceive's figures are for the YOLOv5 database.
In 2019, the Akida simulator did 30 fps @ 157 mW on MobileNet V1.
View attachment 27636
I don't have the fps/W figures for the Akida 1 SoC performance, so if anyone has those to hand, it would be much appreciated.
The chip simulator seems to have topped out at 80 fps. We know the Akida 1 SoC can top 1600 fps (I don't recall on which database), which is 20 times faster than the simulator, so it wouldn't even get into second gear doing 30 fps.
Back in 2018, Bob Beachler said they expected 1400 frames per second per Watt.
https://www.eejournal.com/article/brainchip-debuts-neuromorphic-chip/
So 30 fps would be about 21 mW if there is a linear dependence. And, of course, we were told that the SoC performed better than expected.
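The back-of-envelope sum, assuming the 1400 fps/W figure and strictly linear scaling (both assumptions, not measured SoC data):

```python
FPS_PER_WATT = 1400  # Beachler's 2018 expectation for Akida

def power_mw(fps, fps_per_watt=FPS_PER_WATT):
    """Estimated power draw in milliwatts, assuming fps scales linearly with watts."""
    return fps / fps_per_watt * 1000

print(round(power_mw(30)))  # ~21 mW at 30 fps
```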
But the main thing is that the comparison databases need to be the same as performance varies depending on database.
Perceive relies on compression to achieve extreme sparsity, ignoring the zero weights in the multiplier.
US11003736B2 Reduced dot product computation circuit
View attachment 27638
[0003] Some embodiments provide an integrated circuit (IC) for implementing a machine-trained network (e.g., a neural network) that computes dot products of input values and corresponding weight values (among other operations). The IC of some embodiments includes a neural network computation fabric with numerous dot product computation circuits in order to process partial dot products in parallel (e.g., for computing the output of a node of the machine-trained network). In some embodiments, the weight values for each layer of the network are ternary values (e.g., each weight is either zero, a positive value, or the negation of the positive value), with at least a fixed percentage (e.g., 75%) of the weight values being zero. As such, some embodiments reduce the size of the dot product computation circuits by mapping each of a first number (e.g., 144) input values to a second number (e.g., 36) of dot product inputs, such that each dot product input only receives at most one input value with a non-zero corresponding weight value.
1. A method for implementing a machine-trained network that comprises a plurality of processing nodes, the method comprising:
at a particular dot product circuit performing a dot product computation for a particular node of the machine-trained network:
receiving (i) a first plurality of input values that are output values of a set of previous nodes of the machine-trained network and (ii) a set of machine-trained weight values associated with a set of the input values;
selecting, from the first plurality, a second plurality of input values that is a smaller subset of the first plurality of input values, said selecting comprising (i) selecting the input values from the first plurality of input values that are associated with non-zero weight values, and (ii) not selecting a group of input values from the first plurality of input values that are associated with weight values that are equal to zero; and
computing a dot product based on (i) the second plurality of input values and (ii) weight values associated with the second plurality of input values.
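To make the claim concrete, here's a toy sketch of the selection step in Python. This is not Perceive's circuit, just the logic the claim describes: with ternary weights and (in the patent's example) at least 75% of them zero, you only feed the non-zero-weighted inputs into the multiply-accumulate.

```python
def reduced_dot_product(inputs, weights):
    """Dot product that first selects only inputs whose weight is non-zero,
    mirroring the claim: skip every input paired with a zero weight."""
    selected = [(x, w) for x, w in zip(inputs, weights) if w != 0]
    return sum(x * w for x, w in selected)

inputs  = [3, 1, 4, 1, 5, 9, 2, 6]
weights = [0, 1, 0, -1, 0, 0, 1, 0]  # ternary (0, +1, -1), 75% zeros

print(reduced_dot_product(inputs, weights))  # 1 - 1 + 2 = 2
```

In hardware the win is that only two of the eight multiplier inputs are ever wired up per cycle, which is how they shrink the dot product circuits (144 inputs mapped to 36 in the patent's example).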
ASIDE: This isn't relevant to the discussion, but I stumbled across it just now and it shows that we developed a dataset for MagikEye in 2020, which I had not observed before.
View attachment 27637