I have it on good authority from someone who lives in a barrel that for the following patent to be actioned they would need to already have access to a convolutional spiking neural network processor.
Now some might completely discount the following facts as proving, or even pointing to, Brainchip providing this chip to Tata, but I am not in that camp. Arijit Mukherjee, from the above article, is one of the inventors of the following patent. He was also a member of the Brainchip Tata team that presented a joint demonstration on 14.12.19 of AKIDA technology performing live gesture recognition. And Brainchip has the only commercially available, patent-protected convolutional spiking neural network chip in the world, 3 years ahead of anyone else.
This is one huge statement for TATA to make in my opinion: "Neuromorphic Computing Brings AI to the Edge: How conventional processor architecture is becoming a thing of the past".
My opinion only DYOR
FF
AKIDA BALLISTA
patents.justia.com
System and method of gesture recognition using a reservoir based convolutional spiking neural network
Dec 17, 2020
This disclosure relates to method of identifying a gesture from a plurality of gestures using a reservoir based convolutional spiking neural network. A two-dimensional spike streams is received from neuromorphic event camera as an input. The two-dimensional spike streams associated with at least one gestures from a plurality of gestures is preprocessed to obtain plurality of spike frames. The plurality of spike frames is processed by a multi layered convolutional spiking neural network to learn plurality of spatial features from the at least one gesture. A filter block is deactivated from the plurality of filter blocks corresponds to at least one gesture which are not currently being learnt. A spatio-temporal features is obtained by allowing the spike activations from CSNN layer to flow through the reservoir. The spatial feature is classified by classifier from the CSNN layer and the spatio-temporal features from the reservoir to obtain set of prioritized gestures.
Description
PRIORITY CLAIM
This U.S. patent application claims priority under 35 U.S.C. § 119 to: India Application No. 202021025784, filed on Jun. 18, 2020. The entire contents of the aforementioned application are incorporated herein by reference.
TECHNICAL FIELD
This disclosure relates generally to gesture recognition, and, more particularly, to system and method of gesture recognition using a reservoir based convolutional spiking neural network.
BACKGROUND
In an age of artificial intelligence, robots and drones are key enablers of task automation and they are being used in various domains such as manufacturing, healthcare, warehouses, disaster management etc. As a consequence, they often need to share work-space with and interact with human workers and thus evolving the area of research named Human Robot Interaction (HRI). Problems in this domain are mainly centered around learning and identifying of gestures/speech/intention of human coworkers along with classical problems of learning and identification of surrounding environment (and obstacles, objects etc. therein). All these essentially are needed to be done in a dynamic and noisy practical work environment. As of current state of the art vision based solutions using artificial neural networks (including deep neural networks) have high accuracy, however the models are not the most efficient solutions as learning methods and inference frameworks of the conventional deep neural networks require huge amount of training data and are typically compute and energy intensive. They are also bounded by one or more conventional architectures that leads to data transfer bottleneck between memory and processing units and related power consumption issues. Hence, this genre of solutions does not really help robots and drones to do their jobs as they are classically constrained by their battery life.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, a processor implemented method of identifying a gesture from a plurality of gestures using a reservoir based convolutional spiking neural network is provided. The processor implemented method includes at least one of: receiving, from a neuromorphic event camera, two-dimensional spike streams as an input; preprocessing, via one or more hardware processors, the address event representation (AER) record associated with at least one gestures from a plurality of gestures to obtain a plurality of spike frames; processing, by a multi layered convolutional spiking neural network, the plurality of spike frames to learn a plurality of spatial features from the at least one gesture; deactivating, via the one or more hardware processors, at least one filter block from the plurality of filter blocks corresponds to at least one gesture which are not currently being learnt; obtaining, via the one or more hardware processors, spatio-temporal features by allowing the spike activations from a CSNN layer to flow through the reservoir; and classifying, by a classifier, the at least one of spatial feature from the CSNN layer and the spatio-temporal features from the reservoir to obtain a set of prioritized gestures. In an embodiment, the two-dimensional spike streams are represented as an address event representation (AER) record. In an embodiment, each sliding convolutional window in the plurality of spike frames are connected to a neuron corresponding to a filter among plurality of filters corresponding to a filter block among plurality of filter blocks in each convolutional layer from plurality of convolutional layers. 
In an embodiment, the plurality of filter blocks are configured to concentrate a plurality of class-wise spatial features to the filter block for learning associated patterns based on a long-term lateral inhibition mechanism. In an embodiment, the CSNN layer is stacked to provide at least one of: (i) a low-level spatial features, (ii) a high-level spatial features, or combination thereof.
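The preprocessing step above (AER record in, spike frames out) can be pictured with a minimal Python sketch. The event layout `(timestamp_us, x, y, polarity)`, the fixed frame duration `frame_us`, and the function name `aer_to_spike_frames` are assumptions made for illustration only; the quoted text does not say how the patent bins events.

```python
from typing import List, Tuple

# Hypothetical AER event layout: (timestamp_us, x, y, polarity).
Event = Tuple[int, int, int, int]


def aer_to_spike_frames(events: List[Event], width: int, height: int,
                        frame_us: int) -> List[List[List[int]]]:
    """Bin AER events into binary spike frames of fixed duration."""
    if not events:
        return []
    t0 = min(e[0] for e in events)
    t_end = max(e[0] for e in events)
    n_frames = (t_end - t0) // frame_us + 1
    # frames[f][y][x] is 1 if any event hit pixel (x, y) during frame f.
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for t, x, y, _polarity in events:
        frames[(t - t0) // frame_us][y][x] = 1
    return frames


events = [(0, 0, 0, 1), (400, 1, 0, 1), (1200, 1, 1, 0)]
frames = aer_to_spike_frames(events, width=2, height=2, frame_us=1000)
print(len(frames))  # 2
print(frames[0])    # [[1, 1], [0, 0]]
print(frames[1])    # [[0, 0], [0, 1]]
```

Polarity is ignored here to keep the sketch short; a real pipeline might keep ON and OFF events in separate channels.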
In an embodiment, the spike streams may be compressed per neuronal level by accumulating spikes at a sliding window of time, to obtain a plurality of output frames with reduced time granularity. In an embodiment, plurality of learned different spatially co-located features may be distributed on the plurality of filters from the plurality of filter blocks. In an embodiment, a special node between filters of the filter block may be configured to switch between different filters based on an associated decay constant to distribute learning of different spatially co-located features on the different filters. In an embodiment, a plurality of weights of a synapse between input and the CSNN layer may be learned using an unsupervised two trace STDP learning rule upon at least one spiking activity of the input layer. In an embodiment, the reservoir may include a sparse random cyclic connectivity which acts as a random projection of the input spikes to an expanded spatio-temporal embedding.
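The spike compression described above (accumulating spikes over a sliding window of time to reduce time granularity) might look roughly like the following for a single neuron. The `window` and `stride` parameters and the choice to output spike counts rather than re-thresholded binary spikes are assumptions, not details taken from the patent text.

```python
from typing import List


def compress_spike_train(spikes: List[int], window: int, stride: int) -> List[int]:
    """Accumulate one neuron's spikes over a sliding time window."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not spikes:
        return []
    last_start = max(len(spikes) - window, 0)
    return [sum(spikes[start:start + window])
            for start in range(0, last_start + 1, stride)]


train = [1, 0, 1, 1, 0, 0, 1, 0]
print(compress_spike_train(train, window=4, stride=2))  # [3, 2, 1]
```

Eight time steps become three output values, which is the "reduced time granularity" idea in miniature.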
In another aspect, there is provided a system to identify a gesture from a plurality of gestures using a reservoir based convolutional spiking neural network. The system comprises a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces. The one or more hardware processors are configured by the instructions to: receive, from a neuromorphic event camera, two-dimensional spike streams as an input; preprocess, the address event representation (AER) record associated with at least one gestures from a plurality of gestures to obtain a plurality of spike frames; process, by a multi layered convolutional spiking neural network, the plurality of spike frames to learn a plurality of spatial features from the at least one gesture; deactivate, at least one filter block from the plurality of filter blocks corresponds to at least one gesture which are not currently being learnt; obtain, spatiotemporal features by allowing the spike activations from a CSNN layer to flow through the reservoir; and classify, by a classifier, the at least one of spatial feature from the CSNN layer and the spatiotemporal features from the reservoir to obtain a set of prioritized gestures. In an embodiment, the two-dimensional spike streams is represented as an address event representation (AER) record. In an embodiment, each sliding convolutional window in the plurality of spike frames are connected to a neuron corresponding to a filter among plurality of filters corresponding to a filter block among plurality of filter blocks in each convolutional layer from plurality of convolutional layers. In an embodiment, the plurality of filter blocks are configured to concentrate a plurality of class-wise spatial features to the filter block for learning associated patterns based on a long-term lateral inhibition mechanism. 
In an embodiment, the CSNN layer is stacked to provide at least one of: (i) a low-level spatial features, (ii) a high-level spatial features, or combination thereof.
In an embodiment, the spike streams may be compressed per neuronal level by accumulating spikes at a sliding window of time, to obtain a plurality of output frames with reduced time granularity. In an embodiment, plurality of learned different spatially co-located features may be distributed on the plurality of filters from the plurality of filter blocks. In an embodiment, a special node between filters of the filter block may be configured to switch between different filters based on an associated decay constant to distribute learning of different spatially co-located features on the different filters. In an embodiment, a plurality of weights of a synapse between input and the CSNN layer may be learned using an unsupervised two trace STDP learning rule upon at least one spiking activity of the input layer. In an embodiment, the reservoir may include a sparse random cyclic connectivity which acts as a random projection of the input spikes to an expanded spatio-temporal embedding.
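The quoted text names an "unsupervised two trace STDP learning rule" without giving its equations. As a rough illustration only, here is the common textbook pair-based STDP with one pre-synaptic and one post-synaptic trace; the constants (`a_plus`, `a_minus`, `tau_pre`, `tau_post`), the weight clipping to [0, 1], and the function name are assumptions and may differ from what the patent actually uses.

```python
import math
from typing import List


def stdp_two_trace(pre: List[int], post: List[int], w: float = 0.5,
                   a_plus: float = 0.05, a_minus: float = 0.05,
                   tau_pre: float = 20.0, tau_post: float = 20.0,
                   dt: float = 1.0) -> float:
    """Update one synaptic weight from pre/post spike trains using two traces."""
    if len(pre) != len(post):
        raise ValueError("spike trains must have the same length")
    x_pre = 0.0   # decaying trace of recent pre-synaptic spikes
    x_post = 0.0  # decaying trace of recent post-synaptic spikes
    decay_pre = math.exp(-dt / tau_pre)
    decay_post = math.exp(-dt / tau_post)
    for s_pre, s_post in zip(pre, post):
        x_pre *= decay_pre
        x_post *= decay_post
        if s_pre:
            w -= a_minus * x_post  # pre fires after post: depression
            x_pre += 1.0
        if s_post:
            w += a_plus * x_pre    # post fires after pre: potentiation
            x_post += 1.0
        w = min(max(w, 0.0), 1.0)  # keep the weight bounded (assumed range)
    return w


# Pre leads post: the weight rises above its starting value of 0.5.
print(stdp_two_trace([1, 0, 0, 0], [0, 0, 1, 0]))
# Post leads pre: the weight falls below 0.5.
print(stdp_two_trace([0, 0, 1, 0], [1, 0, 0, 0]))
```

In this sketch, when both neurons spike in the same time step the pre-synaptic spike is handled first, so the pair counts as potentiating; that ordering is a design choice of the example.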
In yet another aspect, there are provided one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors causes at least one of: receiving, from a neuromorphic event camera, two-dimensional spike streams as an input; preprocessing, the address event representation (AER) record associated with at least one gestures from a plurality of gestures to obtain a plurality of spike frames; processing, by a multi layered convolutional spiking neural network, the plurality of spike frames to learn a plurality of spatial features from the at least one gesture; deactivating, at least one filter block from the plurality of filter blocks corresponds to at least one gesture which are not currently being learnt; obtaining, spatio-temporal features by allowing the spike activations from a CSNN layer to flow through the reservoir; and classifying, by a classifier, the at least one of spatial feature from the CSNN layer and the spatio-temporal features from the reservoir to obtain a set of prioritized gestures. In an embodiment, the two-dimensional spike streams are represented as an address event representation (AER) record. In an embodiment, each sliding convolutional window in the plurality of spike frames are connected to a neuron corresponding to a filter among plurality of filters corresponding to a filter block among plurality of filter blocks in each convolutional layer from plurality of convolutional layers. In an embodiment, the plurality of filter blocks are configured to concentrate a plurality of class-wise spatial features to the filter block for learning associated patterns based on a long-term lateral inhibition mechanism. In an embodiment, the CSNN layer is stacked to provide at least one of: (i) a low-level spatial features, (ii) a high-level spatial features, or combination thereof.
In an embodiment, the spike streams may be compressed per neuronal level by accumulating spikes at a sliding window of time, to obtain a plurality of output frames with reduced time granularity. In an embodiment, plurality of learned different spatially co-located features may be distributed on the plurality of filters from the plurality of filter blocks. In an embodiment, a special node between filters of the filter block may be configured to switch between different filters based on an associated decay constant to distribute learning of different spatially co-located features on the different filters. In an embodiment, a plurality of weights of a synapse between input and the CSNN layer may be learned using an unsupervised two trace STDP learning rule upon at least one spiking activity of the input layer. In an embodiment, the reservoir may include a sparse random cyclic connectivity which acts as a random projection of the input spikes to an expanded spatio-temporal embedding.
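The reservoir with "sparse random cyclic connectivity" can be sketched as a small recurrent pool of leaky integrate-and-fire neurons. Everything specific here is assumed for illustration: the neuron model, the `density`, `leak`, and `threshold` values, the weight ranges, and the use of per-neuron spike counts as the spatio-temporal embedding. The patent excerpt does not specify any of these.

```python
import random
from typing import List


def make_sparse_reservoir(n: int, density: float, seed: int = 0) -> List[List[float]]:
    """Random recurrent weight matrix; most entries are zero, no self-loops."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) if i != j and rng.random() < density else 0.0
             for j in range(n)] for i in range(n)]


def run_reservoir(inputs: List[List[int]], w_in: List[List[float]],
                  w_rec: List[List[float]], leak: float = 0.9,
                  threshold: float = 1.0) -> List[int]:
    """Drive LIF neurons with input spike frames; return per-neuron spike counts."""
    n = len(w_rec)
    v = [0.0] * n      # membrane potentials
    prev = [0] * n     # reservoir spikes from the previous time step
    counts = [0] * n
    for frame in inputs:
        spikes = [0] * n
        for i in range(n):
            drive = sum(w_in[i][k] * s for k, s in enumerate(frame))
            recur = sum(w_rec[i][j] * prev[j] for j in range(n))
            v[i] = leak * v[i] + drive + recur
            if v[i] >= threshold:
                spikes[i] = 1
                v[i] = 0.0  # reset after firing
        prev = spikes
        counts = [c + s for c, s in zip(counts, spikes)]
    return counts


rng = random.Random(1)
n, n_in = 20, 4
w_rec = make_sparse_reservoir(n, density=0.1)
w_in = [[rng.uniform(0.0, 1.0) for _ in range(n_in)] for _ in range(n)]
inputs = [[rng.randint(0, 1) for _ in range(n_in)] for _ in range(30)]
state = run_reservoir(inputs, w_in, w_rec)
print(len(state))  # 20
```

Four input channels are projected onto twenty reservoir neurons, which is the "expanded embedding" idea: the recurrent loops let earlier spikes influence later activity, so the counts depend on timing as well as on which inputs fired.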
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed....
Claims
1. A processor implemented method of identifying a gesture from a plurality of gestures using a reservoir based convolutional spiking neural network, comprising:
receiving, from a neuromorphic event camera, two-dimensional spike streams as an input, wherein the two-dimensional spike streams are represented as an address event representation (AER) record;
preprocessing, via one or more hardware processors, the address event representation (AER) record associated with at least one gestures from a plurality of gestures to obtain a plurality of spike frames;
processing, by a multi layered convolutional spiking neural network, the plurality of spike frames to learn a plurality of spatial features from the at least one gesture, wherein each sliding convolutional window in the plurality of spike frames are connected to a neuron corresponding to a filter among plurality of filters corresponding to a filter block among plurality of filter blocks in each convolutional layer from plurality of convolutional layers;
deactivating, via the one or more hardware processors, at least one filter block from the plurality of filter blocks corresponds to at least one gesture which are not currently being learnt, wherein the plurality of filter blocks are configured to concentrate a plurality of class-wise spatial features to the filter block for learning associated patterns based on a long-term lateral inhibition mechanism;
obtaining, via the one or more hardware processors, spatio-temporal features by allowing the spike activations from a CSNN layer to flow through the reservoir, wherein the CSNN layer is stacked to provide at least one of: (i) a low-level spatial features, (ii) a high-level spatial features, or combination thereof; and
classifying, by a classifier, the at least one of spatial feature from the CSNN layer and the spatio-temporal features from the reservoir to obtain a set of prioritized gestures.
2. The processor implemented method of claim 1, wherein the spike streams are compressed per neuronal level by accumulating spikes at a sliding window of time, to obtain a plurality of output frames with reduced time granularity.
3. The processor implemented method of claim 1, wherein a plurality of learned different spatially co-located features are distributed on the plurality of filters from the plurality of filter blocks.
4. The processor implemented method of claim 1, wherein a special node between filters of the filter block is configured to switch between different filters based on an associated decay constant to distribute learning of different spatially co-located features on the different filters.
5. The processor implemented method of claim 1, wherein a plurality of weights of a synapse between input and the CSNN layer are learned using an unsupervised two trace STDP learning rule upon at least one spiking activity of the input layer.
6. The processor implemented method of claim 1, wherein the reservoir comprises a sparse random cyclic connectivity which acts as a random projection of the input spikes to an expanded spatio-temporal embedding.
7. A system (100) to identify a gesture from a plurality of gestures using a reservoir based convolutional spiking neural network, comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
receive, from a neuromorphic event camera, two-dimensional spike streams as an input, wherein the two-dimensional spike streams are represented as an address event representation (AER) record;
preprocess, the address event representation (AER) record associated with at least one gestures from a plurality of gestures to obtain a plurality of spike frames;
process, by a multi layered convolutional spiking neural network, the plurality of spike frames to learn a plurality of spatial features from the at least one gesture, wherein each sliding convolutional window in the plurality of spike frames are connected to a neuron corresponding to a filter among plurality of filters corresponding to a filter block among plurality of filter blocks in each convolutional layer from plurality of convolutional layers;
deactivate, at least one filter block from the plurality of filter blocks corresponds to at least one gesture which are not currently being learnt, wherein the plurality of filter blocks are configured to concentrate a plurality of class-wise spatial features to the filter block for learning associated patterns based on a long-term lateral inhibition mechanism;
obtain, spatiotemporal features by allowing the spike activations from a CSNN layer to flow through the reservoir, wherein the CSNN layer is stacked to provide at least one of: (i) a low-level spatial features, (ii) a high-level spatial features, or combination thereof; and
classify, by a classifier, the at least one of spatial feature from the CSNN layer and the spatiotemporal features from the reservoir to obtain a set of prioritized gestures.
8. The system (100) of claim 7, wherein the spike streams are compressed per neuronal level by accumulating spikes at a sliding window of time, to obtain a plurality of output frames with reduced time granularity.
9. The system (100) of claim 7, wherein plurality of learned different spatially co-located features are distributed on the plurality of filters from the plurality of filter blocks.
10. The system (100) of claim 7, wherein a special node between filters of the filter block is configured to switch between different filters based on an associated decay constant to distribute learning of different spatially co-located features on the different filters.
11. The system (100) of claim 7, wherein a plurality of weights of a synapse between input and the CSNN layer are learned using an unsupervised two trace STDP learning rule upon at least one spiking activity of the input layer.
12. The system (100) of claim 7, wherein the reservoir comprises a sparse random cyclic connectivity which acts as a random projection of the input spikes to an expanded spatio-temporal embedding.
13. One or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors perform actions comprising:
receiving, from a neuromorphic event camera, two-dimensional spike streams as an input, wherein the two-dimensional spike streams are represented as an address event representation (AER) record;
preprocessing, the address event representation (AER) record associated with at least one gestures from a plurality of gestures to obtain a plurality of spike frames;
processing, by a multi layered convolutional spiking neural network, the plurality of spike frames to learn a plurality of spatial features from the at least one gesture, wherein each sliding convolutional window in the plurality of spike frames are connected to a neuron corresponding to a filter among plurality of filters corresponding to a filter block among plurality of filter blocks in each convolutional layer from plurality of convolutional layers;
deactivating, at least one filter block from the plurality of filter blocks corresponds to at least one gesture which are not currently being learnt, wherein the plurality of filter blocks are configured to concentrate a plurality of class-wise spatial features to the filter block for learning associated patterns based on a long-term lateral inhibition mechanism;
obtaining, spatio-temporal features by allowing the spike activations from a CSNN layer to flow through the reservoir, wherein the CSNN layer is stacked to provide at least one of: (i) a low-level spatial features, (ii) a high-level spatial features, or combination thereof; and
classifying, by a classifier, the at least one of spatial feature from the CSNN layer and the spatio-temporal features from the reservoir to obtain a set of prioritized gestures.
14. The one or more non-transitory machine-readable information storage mediums of claim 13, wherein the spike streams are compressed per neuronal level by accumulating spikes at a sliding window of time, to obtain a plurality of output frames with reduced time granularity.
15. The one or more non-transitory machine-readable information storage mediums of claim 13, wherein a plurality of learned different spatially co-located features are distributed on the plurality of filters from the plurality of filter blocks.
16. The one or more non-transitory machine-readable information storage mediums of claim 13, wherein a special node between filters of the filter block is configured to switch between different filters based on an associated decay constant to distribute learning of different spatially co-located features on the different filters.
17. The one or more non-transitory machine-readable information storage mediums of claim 13, wherein a plurality of weights of a synapse between input and the CSNN layer are learned using an unsupervised two trace STDP learning rule upon at least one spiking activity of the input layer.
18. The one or more non-transitory machine-readable information storage mediums of claim 13, wherein the reservoir comprises a sparse random cyclic connectivity which acts as a random projection of the input spikes to an expanded spatio-temporal embedding.
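The "deactivating" step in claims 1, 7 and 13 can be pictured with a small Python sketch in which only the filter block for the gesture currently being learnt receives weight updates. The one-block-per-gesture layout, the dictionary representation, and the name `masked_update` are assumptions for illustration; the claims do not spell out how deactivation is implemented.

```python
from typing import Dict, List


def masked_update(weights: Dict[str, List[float]],
                  deltas: Dict[str, List[float]],
                  current: str) -> Dict[str, List[float]]:
    """Apply weight updates only to the filter block of the gesture being learnt."""
    if current not in weights:
        raise KeyError(f"no filter block for gesture {current!r}")
    return {
        gesture: ([w + d for w, d in zip(block, deltas[gesture])]
                  if gesture == current else list(block))  # deactivated: unchanged
        for gesture, block in weights.items()
    }


blocks = {"wave": [0.25, 0.5], "clap": [0.5, 0.25]}
deltas = {"wave": [0.5, 0.25], "clap": [0.5, 0.5]}
print(masked_update(blocks, deltas, current="wave"))
# {'wave': [0.75, 0.75], 'clap': [0.5, 0.25]}
```

The "clap" block is left untouched while "wave" is being learnt, which is one simple way to keep class-wise features concentrated in their own block.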
References Cited
U.S. Patent Documents
6028626 | February 22, 2000 | Aviv |
6236736 | May 22, 2001 | Crabtree |
6701016 | March 2, 2004 | Jojic |
7152051 | December 19, 2006 | Commons |
7280697 | October 9, 2007 | Perona |
8504361 | August 6, 2013 | Collobert |
8811726 | August 19, 2014 | Belhumeur |
8942466 | January 27, 2015 | Petre et al. |
9015093 | April 21, 2015 | Commons |
9299022 | March 29, 2016 | Buibas et al. |
Foreign Patent Documents
109144260 | January 2019 | CN |
WO2019074532 | April 2019 | WO |
Other references
- Panda, Priyadarshini et al., “Learning to Recognize Actions from Limited Training Examples Using a Recurrent Spiking Neural Model,” Frontiers in Neuroscience, Oct. 2017, Publisher: Arxiv Link: https://arxiv.org/pdf/1710.07354.pdf.
Patent History
Patent number: 11256954
Type: Grant
Filed: Dec 17, 2020
Date of Patent: Feb 22, 2022
Patent Publication Number: 20210397878
Assignee: Tata Consultancy Services Limited (Mumbai)
Inventors: Arun George (Bangalore), Dighanchal Banerjee (Kolkata), Sounak Dey (Kolkata), Arijit Mukherjee (Kolkata)
Primary Examiner: Yosef Kassa
Application Number: 17/124,584
Classifications
Current U.S. Class: Intrusion Detection (348/152)
International Classification: G06K 9/62 (20060101); G06K 9/00 (20060101); G06N 3/04 (20060101);