Looks like some friends in Japan, with a little support from MegaChips, have been playing with Akida & MetaTF.
Apols if this has already been posted; I may have missed it as I haven't done a search.
Short video at the end of the post.
Paper: HERE
License: arXiv.org perpetual non-exclusive license
arXiv:2408.13018v1 [cs.RO] 23 Aug 2024
Robust Iterative Value Conversion: Deep Reinforcement Learning for Neurochip-driven Edge Robots
Yuki Kadokawa (kadokawa.yuki@naist.ac.jp), Tomohito Kodera (kodera.tomohito.kp9@is.naist.jp), Yoshihisa Tsurumine (tsurumine.yoshihisa@is.naist.jp), Shinya Nishimura (nishimura.shinya@megachips.co.jp), Takamitsu Matsubara (takam-m@is.naist.jp)
Nara Institute of Science and Technology, 630-0192, Nara, Japan
MegaChips Corporation, 532-0003, Osaka, Japan
Abstract
A neurochip is a device that reproduces the signal-processing mechanisms of brain neurons and computes Spiking Neural Networks (SNNs) with low power consumption and at high speed. Neurochips are therefore attracting attention for edge-robot applications, which suffer from limited battery capacity. This paper aims to achieve deep reinforcement learning (DRL) that acquires SNN policies suitable for neurochip implementation. Since DRL requires complex function approximation, we focus on conversion from a Floating-Point NN (FPNN), one of the most practical ways to obtain an SNN. However, the DRL learning cycle, which updates the FPNN policy and collects samples with the SNN policy, requires a conversion to an SNN at every policy update, and the accumulated conversion errors can significantly degrade the performance of the SNN policies. We propose Robust Iterative Value Conversion (RIVC), a DRL framework that both reduces conversion errors and is robust to them. To reduce conversion errors, the FPNN is optimized with the same number of quantization bits as the SNN, so that its output is not significantly changed by quantization. To be robust to the remaining errors, the quantized FPNN policy is updated to widen the gap between the probability of selecting the optimal action and that of the other actions, which prevents conversion from unexpectedly replacing the policy's optimal actions.
We verified RIVC's effectiveness on a neurochip-driven robot. The results showed that RIVC consumed roughly 1/15 of the power and computed actions roughly five times faster than an edge CPU (quad-core ARM Cortex-A72). A previous framework with no countermeasures against conversion errors failed to train the policies. Videos from our experiments are available:
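To get a feel for the two tricks described in the abstract, here's a minimal, hypothetical sketch in Python (my own names and simplifications, not the authors' code): uniform weight quantisation at the SNN's bit width, plus a toy gap-widening term on the Q-values standing in for the paper's gap-increasing update.

```python
# Hypothetical sketch (my own names, not from the paper) of the two ideas RIVC
# combines, per the abstract: (1) quantise the FP policy with the same bit width
# the SNN will use, and (2) push the best action's value away from the runner-up
# so small conversion errors cannot flip the greedy action.
import numpy as np

def quantize_weights(w, n_bits=4):
    """Uniform symmetric quantisation to n_bits, mirroring the SNN's weight precision."""
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

def gap_regularized_loss(q_values, q_targets, margin=0.1):
    """TD loss plus a term that widens the gap between the best and second-best
    Q-value; an illustrative stand-in for the paper's gap-increasing update (GIO)."""
    td_loss = np.mean((q_values - q_targets) ** 2)
    sorted_q = np.sort(q_values, axis=-1)
    gap = sorted_q[..., -1] - sorted_q[..., -2]           # best minus runner-up
    gap_penalty = np.mean(np.maximum(0.0, margin - gap))  # penalise small gaps
    return td_loss + gap_penalty
```

The point of the gap term is simply that if the optimal action's value sits well clear of the runner-up, a small quantisation/conversion error can't change which action the converted SNN policy picks.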
Excerpts:
5.1 Construction of Learning System for Experiments
5.1.1 Entire Experiment Settings
This section describes the construction of the proposed framework shown in Fig. 2.
We utilized a desktop PC equipped with a GPU (Nvidia RTX 3090) for updating the policies and an Akida Neural Processor SoC as the neurochip [9, 12]. The robot was controlled by the policies implemented on the neurochip. SNNs were implemented on the neurochip via a conversion executed by Akida's MetaTF software [9, 12]. Since the target task is neurochip-driven robot control, samples were collected by the SNN policies in both the simulation tasks and the real-robot tasks. For learning, the GPU updates the policies based on the samples collected in the real-robot environment. Concerning the SNN structure, the quantization of the weights w_s described in Eq. (16) and the calculation accuracy of the activation functions described in Eq. (19) are verified over a range of 2 to 8 bits, which are the implementation constraints of the neurochip [9].
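For anyone wanting to poke at the same pipeline, here's roughly what the quantise-convert-deploy step looks like with MetaTF, based on my reading of BrainChip's public cnn2snn/akida docs. The toy network, bit widths and input shape are my own placeholders, and exact function names/arguments can differ between MetaTF versions, so treat it as a sketch rather than the paper's implementation.

```python
# Hedged sketch of a quantise -> convert -> deploy flow with MetaTF (cnn2snn / akida).
# Names and arguments follow the public docs as I understand them; versions may differ.
import numpy as np
from tensorflow import keras
from cnn2snn import quantize, convert
import akida

# A stand-in policy network (the paper's actual architecture is not reproduced here).
fpnn_policy = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(8,)),
    keras.layers.Dense(4),   # one Q-value per action
])

# 1. Quantise weights/activations to the neurochip's bit width (2-8 bits per the excerpt).
qnn_policy = quantize(fpnn_policy, weight_quantization=4, activ_quantization=4)

# 2. Convert the quantised Keras model to an Akida (SNN) model.
snn_policy = convert(qnn_policy)

# 3. Map onto an attached Akida device, if one is present, and run inference.
devices = akida.devices()
if devices:
    snn_policy.map(devices[0])
observation = np.zeros((1, 8), dtype=np.uint8)   # Akida expects integer inputs
q_values = snn_policy.predict(observation)
action = int(np.argmax(q_values))
```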
Table 3: Hardware performance of the policies. The FPNN was evaluated on an edge CPU (Raspberry Pi 4: quad-core ARM Cortex-A72); the SNN was evaluated on the neurochip (Akida 1000 [9]). Power consumption and calculation speed are per single action obtained from the NN policies on each piece of hardware. Power consumption was measured with a voltage checker (TAP-TST8N).
| Hardware | Edge-CPU | Neurochip |
|---|---|---|
| Network | FPNN | SNN |
| Power consumption [mW] | 61 | 4 |
| Calculation speed [ms] | 205 | 40 |
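As a quick sanity check, those figures line up with the abstract's headline claims: 61 mW / 4 mW ≈ 15× lower power, and 205 ms / 40 ms ≈ 5× faster per action.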
7 Conclusion
We proposed RIVC, a novel DRL framework for training SNN policies with a neurochip in real-robot environments. RIVC offers two prominent features: 1) it trains QNN policies that are robust to conversion into SNN policies, and 2) it updates the values with GIO, which is robust to optimal-action replacement caused by the conversion to SNN policies.
We also implemented RIVC for object-tracking tasks with a neurochip in real-robot environments. Our experiments show that RIVC can train SNN policies by DRL in real-robot environments.
Acknowledgments
This work was supported by the MegaChips Corporation. We thank Alonso Ramos Fernandez for his experimental assistance.