Both Sally and the other presenter admitted they do not understand Akida. It looks like we need a media pack which explains Akida's spiking neuromorphic processing unit.
Akida originally used a single binary bit (1-bit) "spike".
This was field tested (FPGA?) with EAPs and was changed to an upper limit of 4 bits, with 1-bit and 2-bit options for lower power. This change meant that a spike could assume any value from 1 to 15, greatly increasing accuracy/sensitivity. These multi-bit options would have required increasing the number of conductors in the SoC copper bus to four so the bits could be transmitted in parallel. Likewise, the processing would have become more complex, requiring the capability to handle 4-bit values.
Akida 2 now has the 8-bit option. From the little I've been able to glean, I believe Akida 2 only uses the 8-bit option on the input stage, reverting to 4-bits for the internal layers.
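A minimal sketch of the arithmetic behind those bit widths (illustrative only, not BrainChip code): because a zero-valued event is simply not transmitted, an n-bit spike can carry any value from 1 up to 2^n - 1.

```python
# Illustrative only -- not BrainChip code. A zero-valued event is
# simply not transmitted, so an n-bit spike carries values 1..2**n - 1.
def spike_range(bits):
    """Usable (min, max) value range of an n-bit event."""
    return (1, 2**bits - 1)

for bits in (1, 2, 4, 8):
    lo, hi = spike_range(bits)
    print(f"{bits}-bit spike: values {lo}..{hi}")
```

For 4 bits this gives the 1-to-15 range mentioned above; for 8 bits, 1 to 255.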
It would be an hour's work for an Akida engineer to knock together a few simplified functional block diagrams illustrating the 1-bit, 4-bit, and 8-bit versions of Akida.
Just to add:
neuromorphic computing is a step beyond CPU/GPU.
SNN is a step beyond neuromorphic.
CPU/GPU are based on mathematical precision, hence floating point numbers (usually 32 bits or more, with an exponent field), which is what MACs (Multiply ACcumulate circuits) are very good at. A MAC multiplies one 32-bit number by another 32-bit number - on the order of a thousand single-bit operations - and then adds (accumulates) the result with previous results. For an image, this process is carried out for each pixel in a series of comparisons. This amounts to billions of operations (transistors switching on and off, which takes time and draws electrical power).
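The multiply-accumulate step above can be sketched in a few lines (a toy illustration of the principle, not any vendor's implementation); every element costs one multiply plus one add, and for megapixel images with many filters that multiplies out to billions of operations:

```python
# Toy sketch of a multiply-accumulate (MAC) pass over an image patch,
# the way a conventional CPU/GPU does it with full-precision arithmetic.
def mac(pixels, weights):
    """Multiply each pixel by its weight and accumulate the results."""
    acc = 0.0
    for p, w in zip(pixels, weights):
        acc += p * w  # one multiply-accumulate per element
    return acc

patch = [0.1, 0.5, 0.9, 0.3]     # four pixel values
kernel = [1.0, -1.0, 0.5, 0.25]  # four filter weights
print(mac(patch, kernel))
```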
The brain does not use mathematical precision to interpret the signals from the 5 senses. Instead it compares the signals from its sensors with memories of previous experiences/previously experienced sensor signals.
But this comparison does not use mathematical precision. Instead, it uses the nearest approximation - "this smells more like toast than cake".
Using images as an example, an SNN takes small samples of the input image and compares them with small samples from the model library of images. From PvdM's 4 Bits Are Enough paper, we can infer that the output of each pixel can be represented with sufficient accuracy by 4 bits.
An SNN does not produce mathematical certainty. It produces the highest probability.
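The "more like toast than cake" idea can be sketched as follows. This is a hedged illustration of nearest-approximation matching on 4-bit values; the function names, the distance metric, and the tiny two-entry library are all mine, not Akida's actual algorithm:

```python
# Illustrative nearest-match sketch, not Akida's actual algorithm:
# quantize patches to 4 bits and return the closest library entry --
# the highest probability, not mathematical certainty.
def quantize4(values):
    """Map values in [0, 1] onto a 4-bit scale (0..15)."""
    return [min(15, max(0, round(v * 15))) for v in values]

def closest_match(patch, library):
    """Label of the library entry whose 4-bit pattern is nearest the patch."""
    q = quantize4(patch)
    def dist(entry):
        return sum(abs(a - b) for a, b in zip(q, quantize4(entry[1])))
    return min(library, key=dist)[0]

# Toy "model library" of two remembered patterns.
library = [("toast", [0.9, 0.8, 0.2, 0.1]), ("cake", [0.2, 0.3, 0.9, 0.8])]
print(closest_match([0.8, 0.7, 0.3, 0.2], library))  # nearer to "toast"
```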
Hi @Diogenese,
while I don’t have the background to spar with you on technical matters, I noticed that you posted links to three different patents, all filed back in 2018 (!) by a couple of researchers based at the imec headquarters in Leuven, Belgium, whereas the team currently working on the RISC-V-based SENeCA digital neuromorphic processor that @Bravo referred to is actually from imec The Netherlands in Eindhoven, which also happens to be the birthplace of Philips (hence it is a natural fit for them to collaborate, I’d say - European localism should not be underestimated).
Ella and I both reckon that just because the researchers named in those older patents you linked and those working on SENeCA as I write are all imec employees, it doesn’t follow that those independent teams must share a similar approach.
Or put differently: Unlike Ella and Louis, they may not necessarily be dancing Cheek to Cheek… 🎙
Cheers,
Frangipani
View attachment 50742
View attachment 50744
View attachment 50746
View attachment 50747
Hi Frangipani,
Once again you've tripped me up with your geographic prestidigitations.
And then you throw this ensemble at me:
https://en.wikipedia.org/wiki/Ella_and_Louis#The_album - just look at that band! Oscar Peterson, Herb Ellis, Ray Brown, Buddy Rich!
Peterson's Easy Listening Blues is the epitome of the smoky 3 AM jazz dive (leaving aside Hawkins' Smoke Gets in Your Eyes).
Herb Ellis is still on the town, but Ray Brown is reportedly behavin' himself, and Buddy Rich beat it.
Yes, those pesky Belgian IMECers do have a digital CNN accelerator patent:
EP3674982A1 HARDWARE ACCELERATOR ARCHITECTURE FOR CONVOLUTIONAL NEURAL NETWORK 20181227
A hardware accelerator architecture (10) for a convolutional neural network comprises a first memory (11) for storing NxM activation inputs of an input tensor; a plurality of processor units (12) each comprising a plurality of Multiply ACcumulate (MAC) arrays (13) and a filter weights memory (14) associated with and common to the plurality of MAC arrays of one processor unit (12). Each MAC array is adapted for receiving a predetermined fraction (FxF) of the NxM activation inputs from the first memory, and filter weights from the associated filter weights memory (14). Each MAC array is adapted for subsequently, during different cycles, computing and storing different partial sums, while reusing the received filter weights, such that every MAC array computes multiple parts of columns of an output tensor, multiplexed in time. Each MAC array further comprises a plurality of accumulators (18) for making a plurality of full sums from the partial sums made at subsequent cycles.
... but it has the Scottish disease.
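The partial-sum scheme the abstract describes can be sketched roughly like this. It mirrors the wording of EP3674982A1 (one set of filter weights reused across cycles, partial sums combined into full sums), not its actual RTL; the names are mine:

```python
# Rough sketch of the patent's partial-sum scheme, not its hardware:
# each cycle multiplies one FxF slice of activations by the (reused)
# filter weights, and an accumulator combines the partial sums.
def accumulate_partial_sums(activation_slices, weights):
    """Accumulate per-cycle partial sums into one full sum."""
    full_sum = 0
    for slice_ in activation_slices:          # one slice per cycle
        partial = sum(a * w for a, w in zip(slice_, weights))
        full_sum += partial                   # accumulator stage
    return full_sum

slices = [[1, 2], [3, 4], [5, 6]]   # fractions of the input tensor
weights = [2, 1]                    # reused every cycle
print(accumulate_partial_sums(slices, weights))
```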
SENECA unfortunately suffers from hysteron proteron (ACENES) - it uses the processor system to run the NPEs:
3. ... In general, for a normal incoming spike, RISC-V performs a pre-processing phase to retrieve the relevant local information required to process the spikes (for example, the address of the corresponding parameters in the Data Memory) and packs that information in the form of a micro-task. Then this micro-task is pushed to the Task FIFO. The loop controller executes the tasks one by one based on the micro-code instructions stored in the loop buffer. The loop controller is a small dedicated controller programmed to execute a sequence of instructions in parallel through the NPEs (Neural Processing Elements). Some neural operations in NPEs may result in output spikes which will be converted to packets of data inside the event generator. The event generator unit interrupts the RISC-V to perform postprocessing on the generated events. RISC-V can feed the generated events back into the Task FIFO or send them out through the NoC. Following, we will explain each element of SENECA in more detail.
I don't know how Akida will interact with RISC-V, but it will not tolerate interruptions from the system.
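For readers who think in code, the quoted event flow can be sketched as a simple queue-driven loop. The class and function names are mine, not the paper's, and the "processing" is a stand-in:

```python
# Simplified sketch of the SENECA event flow quoted above: RISC-V
# pre-processes each spike into a micro-task, the task enters a FIFO,
# and the loop controller drains the FIFO, dispatching work to the
# NPEs. Names and the toy processing step are illustrative only.
from collections import deque

def preprocess(spike):
    """RISC-V stage: look up local info and pack it as a micro-task."""
    return {"address": spike * 4, "spike": spike}  # toy address lookup

def run_core(spikes):
    task_fifo = deque(preprocess(s) for s in spikes)  # Task FIFO
    outputs = []
    while task_fifo:                       # loop controller drains FIFO
        task = task_fifo.popleft()
        outputs.append(task["spike"] * 2)  # stand-in for NPE processing
    return outputs

print(run_core([1, 2, 3]))  # -> [2, 4, 6]
```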
3.2. Neuron processing elements (NPEs)
The SENECA core includes an array of neuron processing elements (NPEs) that act as physical neurons in Figure 1C. Each NPE contains a small register-based memory and executes a category of instructions. An array of NPEs is forming a SIMD (Single Instruction Multiple Data) type architecture (Flynn, 1972). Instructions to be executed in NPEs are coming from the Loop Buffer. NPEs can get their data from Data Memory (through a wide Data Memory port), RISC-V (by directly writing into their register file), and Loop controller (broadcasting).
The register file inside the NPEs allows for reusing data as much as possible before reading/writing it into the Data Memory. Table 2 shows that accessing the data in NPEs' register file is about 20 × more energy efficient than accessing the Data in the Data Memory (SRAM). For example, in an extreme case where the number of neurons is low, keeping the neuron states inside the NPEs and only reading the weights from Data Memory (avoiding the neuron state read/write) reduces the energy consumption of a synaptic operation from 2.8 to 1.8 pJ.
In neuromorphic applications, the optimized resolution of neuron states and synaptic weights depends on several variables (Khoram and Li, 2018). Therefore, to optimize the memory footprint and access energy, it is crucial that our NPEs support various data types and precision. Currently, NPEs are designed to support 4, 8, and 16 bit data precisions, both for linear and logarithmic quantization (floating point). They also support shared scale factors (Köster et al., 2017; Moons et al., 2017; Jacob et al., 2018; Kalamkar et al., 2019; Coelho et al., 2021). This flexibility allows for the memory-efficient deployment of mixed precision neural networks for inference and on-device adaptation. Each NPE consumes 1.3% of the total area of the core.
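The memory-footprint argument in that paragraph is simple arithmetic, sketched below for the 4-, 8-, and 16-bit precisions the NPEs support. The one-million-weight figure is an arbitrary example of mine, not a SENECA measurement:

```python
# Back-of-envelope sketch of why precision choice matters for memory
# footprint: storing 1M weights at the 4/8/16-bit precisions SENECA's
# NPEs support. The weight count is an arbitrary illustration.
def footprint_bytes(num_values, bits):
    """Memory needed to store num_values at the given bit width."""
    return num_values * bits // 8

for bits in (4, 8, 16):
    kib = footprint_bytes(1_000_000, bits) / 1024
    print(f"{bits}-bit weights: {kib:.0f} KiB")
```

Halving the precision halves the footprint, and (per Table 2 of the paper) every avoided SRAM access also saves energy.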
3.3 ...
As mentioned, NPEs do not implement a specific neuron model. They only execute special operations, which are common among many neuron models. A neuron/synapse/learning model can be built by sequential execution of a few instructions, called microcode. The loop controller sends the microcode to the NPEs in a “for-loop” style to process events. Therefore, the Loop controller is optimized to execute nested loops. Executing loops using the loop controller is 100 × more energy efficient compared to the RISC-V.
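The 100 × claim is easy to put in numbers. In the toy model below only the ratio comes from the paper; the absolute per-operation energies (100 vs. 1 arbitrary units) are made up for illustration:

```python
# Toy arithmetic for the quoted 100x claim: running the same
# N-instruction loop on the RISC-V vs. via the loop controller.
# Only the ratio is from the paper; the unit values are invented.
RISCV_ENERGY_PER_OP = 100      # arbitrary units
LOOP_CTRL_ENERGY_PER_OP = 1    # 100x cheaper per operation

def loop_energy(num_ops, per_op_energy):
    """Total energy to execute num_ops instructions."""
    return num_ops * per_op_energy

n = 10_000
print(loop_energy(n, RISCV_ENERGY_PER_OP))      # 1000000
print(loop_energy(n, LOOP_CTRL_ENERGY_PER_OP))  # 10000
```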
SENECA is not a digital SNN SoC. It uses special programmable processors to perform SNN functions under software control.
It's not really surprising that they did not compare SENECA with Akida.