Hi Doz,
This is an interesting PCT application which includes the N-of-M coding. It resulted in the grant of this US patent:
US11227210B2 Event-based classification of features in a reconfigurable and temporally coded convolutional spiking neural network
It has a priority date of 20190725, which is too early for transformers.
But there was a subsequent "continuation-in-part" [patent of addition] application filed in the US:
US2022147797A1 EVENT-BASED EXTRACTION OF FEATURES IN A CONVOLUTIONAL SPIKING NEURAL NETWORK
which has a priority of 20220114, and is awaiting examination by USPTO.
This application does refer to a transformation module, and the description of that module was added to the parent description on 20220114.
[006] ... However, up to now temporal spiking neural networks have not been able to meet the accuracy demands of image classification. Spiking neural networks comprise a network of threshold units, and spike inputs connected to weights that are additively integrated to create a value that is compared to one or more thresholds. No multiplication functions are used. Previous attempts to use spiking neural networks in classification tasks have failed because of erroneous assumptions and subsequent inefficient spike rate approximation of conventional convolutional neural networks and architectures. In spike rate coding methods, the values that are transmitted between neurons in a conventional convolutional neural network are instead approximated as spike trains, whereby the number of spikes represent a floating-point or integer value which means that no accuracy gains or sparsity benefits may be expected. Such rate-coded systems are also significantly slower than temporal-coded systems, since it takes time to process sufficient spikes to transmit a number in a rate-coded system. The present invention avoids those mistakes and returns excellent results on complex data sets and frame-based images.
A system, comprising:
a memory for storing data representative of at least one kernel;
a plurality of spiking neuron circuits;
an input module for receiving spikes related to digital data, wherein each spike is relevant to a spiking neuron circuit and each spike has an associated spatial coordinate corresponding to a location in an input spike array;
a transformation module configured to:
transform a kernel to produce a transformed kernel having an increased resolution relative to the kernel; and/or
transform the input spike array to produce a transformed input spike array having an increased resolution relative to the input spike array;
a packet collection module configured to collect spikes until a predetermined number of spikes relevant to the input spike array have been collected in a packet in memory, and to organize the collected relevant spikes in the packet based on the spatial coordinates of the spikes; and
a convolutional neural processor configured to perform event-based convolution using memory and at least one of the transformed input spike array and the transformed kernel.
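To make the claim language concrete, here is a minimal sketch of what packet collection plus event-based convolution could look like. This is purely my reading of the claim, not the patented implementation: the function name, the packet size, and the shapes are all invented for illustration. Spikes are collected until a fixed packet size is reached, organized by spatial coordinate, and then the kernel is accumulated additively at each spike location, so no multiplications are needed.

```python
import numpy as np

def event_based_conv(spikes, kernel, out_shape, packet_size=4):
    """Hypothetical sketch of the claimed pipeline: collect spikes
    into a packet, sort them by spatial coordinate, then 'stamp' the
    kernel additively at each spike location (no multiplications)."""
    out = np.zeros(out_shape)
    kh, kw = kernel.shape
    packet = []
    for (y, x) in spikes:          # each spike has a spatial coordinate
        packet.append((y, x))
        if len(packet) == packet_size:
            packet.sort()          # organize by spatial coordinates
            for (py, px) in packet:
                # additive integration of the kernel around the spike
                out[py:py + kh, px:px + kw] += kernel
            packet = []
    return out

spikes = [(0, 0), (2, 1), (1, 3), (3, 3)]
kernel = np.ones((2, 2))
result = event_based_conv(spikes, kernel, out_shape=(6, 6))
```

Because spike inputs are binary, "convolving" reduces to adding kernel weights at spike coordinates, which is what lets the claim avoid multiplication entirely.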
However, this use of the term "transformation" is not the same as the "transformer" architecture, which is supplanting LSTMs with its "attention" mechanism.
https://blogs.nvidia.com/blog/2022/...
Transformers use positional encoders to tag data elements coming in and out of the network. Attention units follow these tags, calculating a kind of algebraic map of how each element relates to the others.
Attention queries are typically executed in parallel by calculating a matrix of equations in what’s called multi-headed attention.
With these tools, computers can see the same patterns humans see.
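For contrast with the patent's "transformation module", here is a minimal sketch of what the blog is describing: sinusoidal positional encoding tags each element with its position, and scaled dot-product attention computes the "algebraic map" of how each element relates to the others. The dimensions and random input are arbitrary, and this shows a single attention head rather than full multi-headed attention.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: tags each sequence element
    with its position so attention can use order information."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def attention(Q, K, V):
    """Scaled dot-product attention: every query is scored against
    every key; the softmax weights mix the values accordingly."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

x = np.random.randn(5, 8) + positional_encoding(5, 8)
out = attention(x, x, x)  # self-attention over 5 positions
```

Note the contrast with the patent's additive spiking model: attention is built on matrix multiplication, exactly the operation the claimed spiking network avoids.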