Hi SC,
MegaChips has a tie-up with Quadric, and Quadric is persisting with its tech, which uses MACs for motion estimation, image detection and manipulation.
MACs are conventionally used for processing mathematical calculations with high precision. They dot every i and cross every t.
In contrast, Akida is the "big picture" player, taking an overview and making decisions based on probability. This makes Akida's SNN far more energy-efficient and reduces latency.
Now we are told that TeNNs is more efficient than Akida. I have not come to grips with the ins and outs of TeNNs, but, for those wishing to understand the impetus behind its development, the "BACKGROUND" section of the TeNNs patent provides a readily comprehensible explanation of the motivation for the invention.
WO2023250093A1 METHOD AND SYSTEM FOR IMPLEMENTING TEMPORAL CONVOLUTION IN SPATIOTEMPORAL NEURAL NETWORKS 20220622
[0003] In general, ANNs were initially developed to replicate the behavior of neurons which communicate with each other via electrical signals known as "spikes". The information conveyed by the neurons was initially believed to be mainly encoded in the rate at which the neurons emit these spikes. Initially, nonlinearities in ANNs, such as sigmoid functions, were inspired by the saturating behavior of neurons. Neurons' firing activity reaches saturation as the neurons approach their maximum firing rate, and nonlinear functions, such as sigmoid functions, were used to replicate this behavior in ANNs. These nonlinear functions became activation functions and allowed ANNs to model complex nonlinear relationships between neuron inputs and outputs.
[0005] Currently, most of the accessible data is available in spatiotemporal formats. To use the spatiotemporal forms of data effectively in machine learning applications, it is essential to design a lightweight network that can efficiently learn spatial and temporal features and correlations from data. At present, the convolutional neural network (CNN) is considered the prevailing standard for spatial networks, while the recurrent neural network (RNN) equipped with nonlinear gating mechanisms, such as long short-term memory (LSTM) and gated recurrent unit (GRU), is being preferred for temporal networks.
[0006] The CNNs are capable of learning crucial spatial correlations or features in spatial data, such as images or video frames, and gradually abstracting the learned spatial correlations or features into more complex features as the spatial data is processed layer by layer. These CNNs have become the predominant choice for image classification and related tasks over the past decade. This is primarily due to their efficiency in extracting spatial correlations from static input images and mapping them into their appropriate classifications, with the fundamental engines of deep learning, gradient descent and backpropagation, pairing up together. This results in state-of-the-art accuracy for the CNNs. However, many modern Machine Learning (ML) workflows increasingly utilize data that come in spatiotemporal forms, such as natural language processing (NLP) and object detection from video streams. The CNN models used for image classification lack the power to effectively use the temporal data present in these application inputs. Importantly, CNNs fail to provide flexibility to encode and process temporal data efficiently. Thus, there is a need to provide flexibility to artificial neurons to encode and process temporal data efficiently.
[0007] Recently, different methods to incorporate temporal or sequential data, including temporal convolution and internal state approaches, have been explored. When temporal processing is a requirement, for example in NLP or sequence prediction problems, RNNs such as long short-term memory (LSTM) and gated recurrent unit (GRU) models are utilized. Further, according to another conventional method, a 2D spatial convolution combined with state-based RNNs such as LSTMs or GRUs has been used to process temporal information components, in models such as ConvLSTM. However, each of these conventional approaches comes with significant drawbacks. For example, combining 2D spatial convolutions with 1D temporal convolutions requires a large number of parameters due to the temporal dimension and is thus not appropriate for efficient low-power inference.
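As a rough illustration of the parameter blow-up described in [0007]: extending a 2D spatial kernel with a temporal dimension multiplies the weight count by the temporal window length. All channel counts and kernel sizes below are made-up example values, not figures from the patent.

```python
# Illustrative only: parameter counts for a 2D spatial convolution versus
# the same kernel extended over a temporal window. Sizes are arbitrary
# assumptions chosen for the sake of the arithmetic.

def conv2d_params(in_ch, out_ch, k):
    """Weights in a 2D convolution with a k x k spatial kernel."""
    return in_ch * out_ch * k * k

def conv3d_params(in_ch, out_ch, k, t):
    """Weights when the same kernel also spans t time steps."""
    return in_ch * out_ch * k * k * t

spatial = conv2d_params(64, 128, 3)            # 73,728 weights
spatiotemporal = conv3d_params(64, 128, 3, 9)  # 9x more: 663,552 weights
print(spatial, spatiotemporal, spatiotemporal // spatial)
```

So even a modest 9-step temporal window multiplies the weight budget ninefold for a single layer, which is why this route is unattractive for low-power edge inference.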
[0008] One of the main challenges with the RNNs is the involvement of excessive nonlinear operations at each time step, which leads to two significant drawbacks. Firstly, these nonlinearities force the network to be sequential in time, i.e., they make it difficult for the RNNs to leverage parallel processing efficiently during training. Secondly, since the applied nonlinearities are ad hoc in nature and lack a theoretical guarantee of stability, it is challenging to train the RNNs or perform inference over long sequences of time series data. These limitations also apply to models, for example ConvLSTM models as discussed in the above paragraphs, that combine 2D spatial convolution with RNNs to process sequential and temporal data.
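The sequential bottleneck in [0008] can be seen in a minimal scalar RNN sketch: because each hidden state passes through a nonlinearity (tanh here) before feeding the next step, step t cannot begin until step t-1 has finished. The weights and inputs are arbitrary example values, not anything from the patent.

```python
import math

# Minimal scalar RNN (illustrative only). The tanh at every step makes the
# recurrence strictly sequential: h[t] is a nonlinear function of h[t-1],
# so the loop cannot be parallelized across time steps.

def rnn_forward(xs, w_in=0.5, w_rec=0.9):
    h = 0.0
    states = []
    for x in xs:  # must run in order: each h depends on the previous h
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.0, -1.0, 0.0])
```

Note that if the tanh were removed, the recurrence would become linear and could be evaluated in parallel as a prefix scan, which is one reason per-step nonlinearities are singled out as the obstacle here.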
[0009] In addition, for each of the above discussed NN models, including ANN, CNN, and RNN, the computation process is very often performed in the cloud. However, in order to have a better user experience, privacy, and for various commercial reasons, implementation of the computation process has started moving from the cloud to edge devices. Various applications like video surveillance, self-driving vehicles, medical vital signs monitoring, and speech/audio processing are implemented on edge devices. Further, with the increasing complexity of the NN models, there is a corresponding increase in the computational requirements needed to execute them. Thus, huge computational processing power and a large memory are required for executing highly complex NN models like CNNs and RNNs on edge devices. Further, the edge devices are often required to focus on receiving a continuous stream of the same data from a particular application, as discussed above. This necessitates a large memory buffer (time window) of past inputs to perform temporal convolutions at every time step. However, maintaining such a large memory buffer can be very expensive and power-consuming.
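The buffering cost at the end of [0009] can be sketched as follows: a temporal convolution over a window of T past frames must keep all T frames resident at every time step of a continuous stream. Frame size and window length below are illustrative assumptions, not numbers from the patent.

```python
from collections import deque

# Sketch of the sliding time-window buffer a temporal convolution needs on
# an edge device. All sizes are made-up examples for illustration.

FRAME_BYTES = 64 * 64 * 2  # one 64x64 feature map at 16-bit precision
WINDOW = 32                # temporal kernel length (frames of history kept)

buffer = deque(maxlen=WINDOW)     # oldest frame is dropped automatically
for t in range(100):              # continuous input stream
    buffer.append(bytes(FRAME_BYTES))  # placeholder "frame"

buffered_bytes = len(buffer) * FRAME_BYTES
print(buffered_bytes)  # bytes held just to apply one temporal kernel
```

Even for this small single feature map, the device must hold 256 KiB of history per layer at all times, which is the memory and power cost the patent identifies as the motivation for a different approach.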