"Building Temporal Kernels with Orthogonal Polynomials
20 May 2024
Yan Ru Pei
Brainchip Inc.
Laguna Hills, CA 92653
ypei@brainchip.com
Olivier Coenen
Brainchip Inc.
Laguna Hills, CA 92653
ocoenen@brainchip.com
Abstract
We introduce a class of models named PLEIADES (PoLynomial Expansion In Adaptive Distributed Event-based Systems), whose temporal convolution kernels are generated from orthogonal polynomial basis functions. We focus on interfacing these networks with event-based data to perform online spatiotemporal classification and detection with low latency. By virtue of using structured temporal kernels and event-based data, we have the freedom to vary the sample rate of the data along with the discretization step size of the network without additional fine-tuning. We experimented with three event-based benchmarks and obtained state-of-the-art results on all three by large margins, with significantly smaller memory and compute costs. We achieved: 1) 99.59% accuracy with 192K parameters on the DVS128 hand gesture recognition dataset, and 100% with a small additional output filter; 2) 99.58% test accuracy with 277K parameters on the AIS 2024 eye tracking challenge; and 3) 0.556 mAP with 576K parameters on the PROPHESEE 1 Megapixel Automotive Detection Dataset.
1 Introduction
...
In Section 5, we run three event-based benchmarks:
1) the IBM DVS128 hand gesture recognition dataset,
2) the CVPR 2024 AIS event-based eye tracking challenge,
and 3) the PROPHESEE 1 Megapixel Automotive Detection Dataset (Prophesee GEN4 dataset).
We achieved SOTA results on all three benchmarks.
The code for building the structured temporal kernels, along with a pre-trained PLEIADES network for evaluation on the DVS128 dataset, is available here:
https://github.com/PeaBrane/Pleiades
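To illustrate the construction, the following is a minimal sketch under assumed design choices, not the repository's actual API: a depthwise temporal convolution whose kernels are learnable mixtures of Legendre basis functions evaluated on a uniform time grid. The basis family, polynomial degree, kernel length, and channel count here are placeholder choices.

```python
# Minimal sketch (assumed design, not the repository's API): temporal kernels
# parameterized as learnable mixtures of orthogonal polynomial basis functions.
import numpy as np
import torch
import torch.nn as nn


def legendre_basis(num_taps: int, degree: int) -> torch.Tensor:
    """Evaluate Legendre polynomials P_0..P_degree on a uniform grid over [-1, 1].

    Returns a (degree + 1, num_taps) matrix; num_taps sets the discretization
    step size and can be changed without touching the learned coefficients.
    """
    t = np.linspace(-1.0, 1.0, num_taps)
    basis = np.stack([np.polynomial.legendre.Legendre.basis(d)(t)
                      for d in range(degree + 1)])
    return torch.tensor(basis, dtype=torch.float32)


class PolyTemporalConv(nn.Module):
    """Depthwise temporal convolution with polynomial-expansion kernels."""

    def __init__(self, channels: int, degree: int = 4, num_taps: int = 9):
        super().__init__()
        # Learnable expansion coefficients, one set per channel.
        self.coeffs = nn.Parameter(0.1 * torch.randn(channels, degree + 1))
        # Fixed (non-learnable) basis; regenerate it to change the step size.
        self.register_buffer("basis", legendre_basis(num_taps, degree))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, time).
        kernels = self.coeffs @ self.basis      # (channels, num_taps)
        kernels = kernels.unsqueeze(1)          # (channels, 1, num_taps)
        return nn.functional.conv1d(x, kernels, groups=kernels.shape[0])


x = torch.randn(2, 16, 100)                     # two 16-channel sequences
y = PolyTemporalConv(channels=16)(x)            # shape (2, 16, 92)
```

The key design choice illustrated here is that each kernel is stored as expansion coefficients rather than as raw taps, which is what allows the temporal discretization to change without retraining.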
...
5 Experiments
...
Table 1: The raw 10-class test accuracy of several networks on the DVS128 dataset. With the exception of models marked with an asterisk, no output filtering is performed on the networks. PLEIADES is evaluated on output predictions where all temporal layers process nonzero valid frames, which incurs a natural warm-up latency of 0.44 seconds (see Section 5.1). Additionally, a majority filter with a 0.15-second window is applied to the raw PLEIADES predictions.
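For concreteness, a sliding-window majority filter over class predictions could look like the hedged sketch below; the conversion of the 0.15-second window to a frame count uses a placeholder frame period (the prediction rate is not stated in this excerpt), and this is not necessarily the paper's exact post-processing code.

```python
# Hedged sketch of a sliding-window majority (mode) filter over per-frame class
# predictions. The frame period below is a placeholder, not a value from the paper.
from collections import Counter


def majority_filter(preds, window_s=0.15, frame_period_s=0.01):
    """Replace each prediction with the most common label in its trailing window."""
    win = max(1, round(window_s / frame_period_s))
    smoothed = []
    for i in range(len(preds)):
        window = preds[max(0, i - win + 1): i + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


raw = [0, 0, 3, 0, 0, 5, 5, 0, 5, 5]
print(majority_filter(raw, window_s=0.05))  # the spurious '3' is suppressed;
                                            # the 0-to-5 transition is delayed by the window
```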
...
7 Conclusion
We introduced a spatiotemporal network with temporal kernels built from orthogonal polynomials. The network achieved state-of-the-art results on all the event-based benchmarks we tested, and its performance was shown to be stable under temporal resampling without additional fine-tuning. Currently, the network is configured as a standard neural network, which by itself is already ultra-light in memory and computational costs. To fully leverage the advantages of event-based processing, we can consider using intermediate loss functions to promote activation sparsity [24]. Another direction is to adapt or convert this architecture into a spiking system via Lebesgue sampling [2] of the structured temporal kernels, enabling efficient computation and prediction of future spike timings at each temporal layer for even greater edge compatibility.
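As an illustration of the resampling property mentioned above (again a sketch under the assumption of a Legendre parameterization, not the paper's exact formulation), the same learned coefficients can simply be re-evaluated on a denser or coarser time grid when the event stream is binned at a different rate; any normalization by the step size is omitted here.

```python
# Hedged sketch: re-discretizing a continuous polynomial kernel at a new step
# size. The learned expansion coefficients stay fixed; only the evaluation grid
# changes, so no fine-tuning is needed. Step-size normalization, if any, is
# omitted for brevity.
import numpy as np


def discretize_kernel(coeffs, num_taps):
    """Evaluate the expansion sum_d coeffs[d] * P_d(t) on a grid of num_taps points."""
    t = np.linspace(-1.0, 1.0, num_taps)
    basis = np.stack([np.polynomial.legendre.Legendre.basis(d)(t)
                      for d in range(len(coeffs))])
    return coeffs @ basis


coeffs = np.array([0.2, -0.5, 0.1, 0.3])               # placeholder learned coefficients
kernel_coarse = discretize_kernel(coeffs, num_taps=9)   # original binning
kernel_fine = discretize_kernel(coeffs, num_taps=18)    # 2x finer binning, same kernel shape
```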
8 Acknowledgement
We would like to acknowledge Nolan Ardolino, Kristofor Carlson, M. Anthony Lewis, and Anup Varase (listed in alphabetical order) for discussing ideas and offering insights for this project. We would also like to thank Daniel Endraws for performing quantization studies on the PLEIADES network, and Sasskia Brüers for help with producing the figures.
..."