Hi Wags,
I had thought that the special sauce was the N-of-M coding, which greatly enhances sparsity without affecting accuracy. This was developed by Simon Thorpe's group, but was not patented. It was licensed and then sold to Brainchip. It was also developed independently by Steve Furber (SpiNNaker, Man Uni). If I recall correctly, Applied Brain Research (ABR, Chris Eliasmith) appears to use this (as well as state space models).
https://www.appliedbrainresearch.com/technology
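For anyone wondering why N-of-M helps sparsity, here is a toy sketch of the general idea: out of M units, only the N strongest are allowed to "fire" and everything else is zeroed. It assumes that picking the N largest activations is a fair stand-in for "the first N to spike" in rank-order coding. The function name `n_of_m` is my own, and this is not Thorpe's, SpiNNaker's or Brainchip's actual implementation.

```python
from typing import List


def n_of_m(activations: List[float], n: int) -> List[float]:
    """Toy N-of-M coding: keep the n largest of m activations, zero the rest.

    Assumption: the largest activations stand in for the earliest spikes,
    so only the "first n spikes" are propagated to the next layer.
    """
    m = len(activations)
    if not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m")
    # Rank units by activation (ties broken by lower index) and keep the top n.
    winners = set(sorted(range(m), key=lambda i: (-activations[i], i))[:n])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]


if __name__ == "__main__":
    acts = [0.1, 0.9, 0.3, 0.7, 0.0, 0.5]
    print(n_of_m(acts, 2))  # [0.0, 0.9, 0.0, 0.7, 0.0, 0.0]
```

With N fixed, at most N of the M outputs are non-zero regardless of the input, which is where the guaranteed sparsity comes from. Whether accuracy holds up depends on choosing N sensibly for the layer.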
I haven't seen a one-to-one comparison with ABR, but a quick scan of ABR's most recently published NN processor patent application suggests to me that its design is very complex and would probably be slower than TENNs.
WO2024197396A1 EVENT-BASED NEURAL NETWORK PROCESSING SYSTEM 20230326
[0108]
Returning to the address generator 335 in the processing element 120, the generator 335 comprises two major components: the loop parameter generator, and the loop iterator. The loop iterator is relatively simple to implement: in operation, it receives loop parameters through a stream interface, validates them, and implements two nested for loops using a simple state machine. The loop parameter generator, on the other hand, is more complex and temporally multiplexed. An arithmetic logic unit (ALU) is used to perform the individual operations described above and is depicted in FIG. 15. The flow of data is controlled using a micro-program, as mentioned above. A number of possible input sources are provided at the top of the diagram, including the event source x- and y-location, the address generator configuration registers (abbreviated as "config"), and the fed-back loop-parameter output. The computation performed by the ALU is controlled through seven multiplexers/demultiplexers, as well as a write-enable control signal and a logic-unit operation signal (write-enable and logic-unit signals are not shown in FIG. 15). The input multiplexer "mux_in" selects either a configuration word or the source x- and y-location. The output read multiplexer "mux_out_rd" selects one of the loop-parameter values as an input; the output write demultiplexer "mux_out_wr" determines which loop-parameter should be updated. Lastly, the multiplexers mux_a, mux_b, mux_c and mux_d determine what input should be routed to the arithmetic and logic units, as well as the output. The logic unit (LU) performs operations such as logical "AND" as well as right-shift operations with a latency of one cycle. The multiply-accumulate-unit (MAC) performs the fixed computation a × b + c with a latency of three cycles.
The short horizontal bars following the multiplexers, LU, and MAC represent register boundaries (i.e., one clock cycle passes between the top and bottom of the black bar).
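To get a feel for the "relatively simple" part of the excerpt (a loop iterator that validates its parameters and implements two nested for loops with a small state machine), here is a rough software sketch. The parameter names (`base`, `outer_count`, `outer_stride`, `inner_count`, `inner_stride`), the IDLE/RUN/DONE states and the `load`/`step`/`run` helpers are all hypothetical; the patent text doesn't spell out the parameter set, and `load()` merely stands in for the stream interface.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


@dataclass
class LoopParams:
    # Hypothetical parameter set; not taken from the patent.
    base: int          # first address emitted
    outer_count: int   # iterations of the outer loop
    outer_stride: int  # address step between outer iterations
    inner_count: int   # iterations of the inner loop
    inner_stride: int  # address step between inner iterations


class State(Enum):
    IDLE = auto()
    RUN = auto()
    DONE = auto()


class LoopIterator:
    """Two nested for loops flattened into a state machine, one address per step."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.params: Optional[LoopParams] = None
        self.i = self.j = 0
        self.row_start = self.addr = 0

    def load(self, p: LoopParams) -> None:
        # Stand-in for receiving parameters over the stream interface:
        # validate first, then arm the state machine.
        if p.outer_count <= 0 or p.inner_count <= 0:
            raise ValueError("loop counts must be positive")
        self.params = p
        self.i = self.j = 0
        self.row_start = self.addr = p.base
        self.state = State.RUN

    def step(self) -> Optional[int]:
        """One 'clock tick': return the current address, or None if not running."""
        if self.state is not State.RUN or self.params is None:
            return None
        p = self.params
        out = self.addr
        self.j += 1
        if self.j < p.inner_count:
            # Still inside the inner loop: add the inner stride (no multiply needed).
            self.addr += p.inner_stride
        else:
            # Inner loop finished: reset it and advance the outer loop.
            self.j = 0
            self.i += 1
            self.row_start += p.outer_stride
            self.addr = self.row_start
            if self.i == p.outer_count:
                self.state = State.DONE
        return out


def run(p: LoopParams) -> List[int]:
    """Drive the iterator until it stops and collect every emitted address."""
    it = LoopIterator()
    it.load(p)
    addrs: List[int] = []
    while (a := it.step()) is not None:
        addrs.append(a)
    return addrs


if __name__ == "__main__":
    print(run(LoopParams(base=100, outer_count=2, outer_stride=10,
                         inner_count=3, inner_stride=1)))
    # [100, 101, 102, 110, 111, 112]
```

The state machine only ever adds strides, which is why it is cheap in hardware. In the excerpt, the multiply-accumulate work (a × b + c on the MAC) and the AND/shift work (on the LU) fall to the loop parameter generator, which prepares the parameters before they reach the iterator.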
So, while TENNs may trump (pardon the expression) N-of-M, we need to maintain constant vigilance.