Was having a look at the recent tinyML conferences. We know Anil was at the September one; this one is the EMEA October event, and I noticed that one of our engineers presented a demo/poster there, copied below FWIW, as I'm not sure it has been posted before.
TinyML Summit. The topic is advances in ultra-low power Machine Learning technologies and applications.
www.tinyml.org
Demos & Posters
Exploiting activation sparsity for easy model optimization in the Akida processor
Sébastien CROUZET, R&D Engineer, Brainchip
Abstract (English)
Neuromorphic processors offer high processing efficiency, but at the cost of an apparent technical barrier for the new user. BrainChip's Akida processor mitigates this obstacle by running standard convolutional neural networks, and by offering transparent and performant quantization and lossless conversion methods. Neuromorphic efficiency is retained because the event-based processor can exploit the sparsity (high number of zero values) inherently present in CNN activation maps.
Nonetheless, selection and optimization of models to target Akida (or indeed, any dedicated hardware) can seem daunting without expert knowledge. Here we show that i) by applying simple off-the-shelf optimization techniques, such as regularization and pruning, we can significantly accelerate inference at minimal reduction to accuracy; ii) applying these techniques provides a powerful lever to manipulate the speed/accuracy trade-off according to task requirements; iii) this in turn reduces the requirement to maintain an extensive library of pretrained models to cover a range of task complexities; instead, a handful of pretrained models allows a continuum of final model sizes.
[Method] We select a representative and well-known benchmark task, visual wake words (VWW), starting from a tinyML-relevant solution, MobileNet v1, alpha=0.25 (as per the tinyML Perf benchmark). We use a transfer learning-based pipeline, with ImageNet pretraining, quantization of both weights and activations to 4 bits (8-bit weights for the first layer), and 5 epochs of fine-tuning on VWW. We test 2 techniques to optimize performance on our hardware: i) activity regularization added during training (a mix of L1 and L2 loss), testing model accuracy and sparsity as a function of the amplitude of regularization; ii) structured pruning (whole filters), using a well-known metric for filter selection (batch normalization gamma). We first test the "prunability" of each layer individually by evaluating model accuracy (without re-training) as a function of the % of filters pruned. We then test several pruning strategies for the whole model (fixed % accuracy loss per layer; targeted pruning of the high-cost layers; …).
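For anyone curious how these two levers look in practice, here is a minimal plain-Keras sketch of the generic techniques the abstract names: an L1/L2 activity regularizer that encourages zero-valued activations, and a batch-normalization-gamma ranking for selecting whole filters to prune. The helper names, regularization amplitude and pruning fraction are my own illustrative assumptions, not values from the poster.

```python
# Illustrative sketch only - not BrainChip's actual training code.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# i) Activity regularization: an L1+L2 penalty on a layer's activations
#    pushes many outputs toward zero, producing the sparsity that an
#    event-based processor can skip at inference time.
def conv_block(x, filters, reg_amplitude=1e-5):
    x = layers.Conv2D(
        filters, 3, padding="same", use_bias=False,
        activity_regularizer=regularizers.L1L2(l1=reg_amplitude, l2=reg_amplitude),
    )(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

# ii) Structured pruning metric: rank whole filters by the magnitude of the
#     batch-normalization scale (gamma) following the convolution; filters
#     with small gamma contribute little and are candidates for removal.
def filters_to_prune(bn_layer, prune_fraction=0.3):
    gamma = np.abs(bn_layer.get_weights()[0])   # one gamma value per filter
    n_prune = int(prune_fraction * gamma.size)
    return np.argsort(gamma)[:n_prune]          # indices of the weakest filters
```

Sweeping `reg_amplitude` and `prune_fraction` per layer is how you would trace out the accuracy-vs-sparsity and accuracy-vs-%-pruned curves the abstract describes.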
[Results] We first introduce a minor network optimization from the original MobileNet based on known properties of Akida: for low filter numbers (<=64), the efficient implementation of standard convolutional layers in Akida means that they run as fast as an equivalently sized separable convolution, while offering a moderately increased representational capacity and thus accuracy. Specifically, the first 3 separable conv layers of MobileNet v1 are swapped for standard conv layers, forming what we will call "AkidaNet". With quantization at 4 bits, and using the alpha=0.25 network, we obtain a baseline accuracy of 83.3% on the VWW validation set, with a single-sample (batch size = 1) processing rate of 66 FPS (latency 15.15 ms). Applying a combination of regularization and pruning to optimize this model, we can accelerate processing to 133 FPS while maintaining >80% accuracy. We then test optimization from larger starting models (alpha=0.5 or even 1.0) and find that we can reduce models sufficiently to achieve processing rates similar to the baseline, but with higher accuracy (a=0.5, 84.1% acc. at 84 FPS; a=1.0, 84.8% acc. at 69 FPS).
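To make the "AkidaNet" tweak concrete, here is a rough Keras sketch of a MobileNet-v1-style stem where the first three separable convolutions are replaced by standard convolutions, as the abstract describes. The filter counts follow the usual MobileNet v1 progression and the block structure is simplified (real MobileNet blocks use a depthwise conv plus a 1x1 conv); exact AkidaNet hyperparameters are not given in the abstract, so treat this purely as an illustration.

```python
# Illustrative sketch of the separable-to-standard conv swap; assumed layout.
import tensorflow as tf
from tensorflow.keras import layers

def akidanet_like_stem(inputs, alpha=0.25):
    def std_block(x, filters, strides=1):
        x = layers.Conv2D(int(filters * alpha), 3, strides=strides,
                          padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        return layers.ReLU()(x)

    def sep_block(x, filters, strides=1):
        x = layers.SeparableConv2D(int(filters * alpha), 3, strides=strides,
                                   padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        return layers.ReLU()(x)

    x = std_block(inputs, 32, strides=2)   # stem conv, as in MobileNet v1
    # First three blocks use standard convolutions instead of separable ones:
    # at these low filter counts they run as fast on Akida but carry more capacity.
    x = std_block(x, 64)
    x = std_block(x, 128, strides=2)
    x = std_block(x, 128)
    # Remaining blocks keep MobileNet v1's separable convolutions.
    x = sep_block(x, 256, strides=2)
    x = sep_block(x, 256)
    return x
```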
Also see that TDK InvenSense is playing with minimal-bit weight quantization as well:
Towards Universal 1-bit Weight Quantization of Neural Networks with end-to-end deployment on ultra-low power sensors
Le MINH TRI, Ph.D. student in deep learning, TDK InvenSense