Interesting.
ST to launch its first neural microcontroller with NPU
Business news | May 12, 2022
By Nick Flaherty
STMicroelectronics is set to launch its first microcontroller with a full neural processing unit (NPU).
The STM32N6 includes a proprietary NPU alongside an ARM Cortex core. This gives the same AI performance as a quad-core processor with an AI accelerator but at one tenth the cost and one twelfth the power consumption, says Remi El-Ouazzane, President, Microcontrollers & Digital ICs Group at ST.
The chip will sample at the end of 2022, he said. While there is no mention of the ARM core that will be used, the performance and power figures point to ARM’s M55 or even its latest core, the recently announced M85. The M85 is ARM’s highest performance M-class core, issuing up to three instructions per cycle, and has internal accelerators for AI that help to boost performance.
ST is a key lead developer for ARM’s microcontroller cores and uses the ARM M7 core in the dual core M4/M7 STM32H7 and in the STM32F7 family alongside an ART AI accelerator. A new family, N6, points to the use of a new core.
Supporting AI in industrial and embedded designs was a key driver for the recent acquisition of French software tool developer Cartesiam and is part of the strategy to achieve $20bn of revenue between 2025 and 2027.
“The new STM32N6 Neural MCU is dramatically lowering the AI technology implementation price point. This breakthrough supports our roadmap of new generation intelligent sensors allowing rapidly growing adoption in Smart Cities,” said Vincent SABOT, Executive Managing Director of developer Lacroix.
The choice of architecture is key for adding security support and integration with cloud services, and the M55 and M85 support the ARMv8.1-M architecture. Yesterday ST announced deals with Microsoft and Amazon to connect the microcontrollers to the cloud.
ST is integrating its STM32U5 microcontrollers (MCUs), based on the 160MHz ARM Cortex-M33 core, with Microsoft Azure real time operating system (RTOS) and IoT Middleware and a certified secure implementation of ARM Trusted Firmware -M (TF-M) secure services for embedded systems.
The integration uses the hardened security features of the STM32U5 complemented with the hardened key store of an STSAFE-A110 secure element.
The integration with Amazon Web Services (AWS) also uses the STM32U5, this time with Amazon’s FreeRTOS real time operating system and ARM Trusted Firmware-M (TF-M) for embedded systems.
The reference implementation is built on ST’s B-U585I-IOT02A discovery kit for IoT nodes with support for USB, WiFi, and Bluetooth Low Energy connectivity, as well as multiple sensors. The STSAFE-A110 secure element is pre-loaded with IoT object credentials to help secure and simplify attachment between the connected objects and the AWS cloud.
FreeRTOS comprises a kernel optimized for resource-constrained embedded systems and software libraries for connecting various types of IoT endpoints to the AWS cloud or other edge devices. AWS maintains long-term support (LTS) FreeRTOS releases for two years, which provides developers with a stable platform for deploying and maintaining their IoT devices.
Hardware cryptographic accelerators, secure firmware installation and update, and enhanced resistance to physical attacks underpin PSA Certified Level 3 and SESIP3 certification.
ST will release STM32Cube-based reference implementations for both the Azure and AWS integrations in Q3 2022, which will further simplify IoT-device design by leveraging tight integration with the wider STM32 ecosystem.
I haven't researched ReRAM NNs in depth, but I think Marco Cassis, president of ST’s Analog, MEMS and Sensors Group, was not talking about Akida when he ruled out spiking neural network chips, also called neuromorphic, as not mature, saying current convolutional neural networks can tap into reduced precision and semiconductor scaling to get more performance: "However these CNN devices struggle with power consumption and memory bandwidth challenges that get in the way of scalability."
As we have discussed repeatedly, ReRAMs have their own problems. It is true that, in theory, they provide a much closer synaptic analogy with wetware, but the lack of precision of IC manufacturing at a micro-scale means that they lack accuracy due to resistance variations between individual ReRAMs. The currents from a few hundred (or more) cells need to be added together to reach a synaptic threshold voltage, so while some errors may cancel out, there is the possibility of cumulative errors.
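A quick toy simulation makes the cancellation argument concrete. This is my own illustrative model, not anything from ST or Weebit: each cell's conductance is drawn with a (hypothetical) 15% Gaussian spread, and we watch what happens to the summed current as the number of cells grows.

```python
import random

def summed_current(n_cells, g_nominal=1.0, sigma=0.15, v_read=0.2):
    """Sum the read currents of n_cells ReRAM cells whose conductance
    varies cell-to-cell (Gaussian, relative spread sigma). Units are
    arbitrary; the numbers are illustrative only."""
    return sum(v_read * random.gauss(g_nominal, sigma * g_nominal)
               for _ in range(n_cells))

random.seed(0)
for n in (16, 256, 4096):
    ideal = n * 1.0 * 0.2  # current if every cell were exactly nominal
    trials = [summed_current(n) for _ in range(200)]
    worst = max(abs(t - ideal) / ideal for t in trials)
    print(f"{n:5d} cells: worst-case relative error {worst:.2%}")
```

The relative error of the sum shrinks roughly as 1/√N (errors partly cancel), but the absolute spread of the summed current still grows with N, which is the cumulative-error worry above.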
There are techniques to compensate for the inherent variability, but they immediately reduce a major advantage of ReRAM, the small footprint of each ReRAM cell on the silicon wafer ... this from Weebit:
https://www.weebit-nano.com/technology/reram-memory-module-technology/
"An efficient ReRAM module must be designed and developed in close relation with the memory bitcell so it can optimize the functionality of the memory array. Due to the inherent variability of ReRAM (RRAM) cells, specially developed algorithms are key to the process of programming and erasing cells. These algorithms must be delicately balanced between programming time (the quicker, the better), current (the lower, the better), and cell endurance (allowing each individual cell to operate for as many program/erase [P/E] cycles as possible). Voltage levels, P/E pulse widths and the number of such pulses must be optimized to work with a given bitcell technology.
When reading any given bit, the data must be verified against other assistive information to make sure there are no read errors that could impair overall system performance.
Voltage and current levels must be carefully examined throughout the memory module for any operation – including read, program and erase – to keep power consumption to a minimum and ensure the robustness and reliability of the memory array."
In addition, they need larger operating voltages than digital CMOS because the voltage must be divided into a number of steps corresponding to the number of synaptic inputs which are added to reach the synaptic threshold. The size of the operating voltage limits how far the manufacturing process can shrink, eg 22 nm, before the voltage can jump between conductors.
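The arithmetic behind that claim is simple. Assuming (hypothetically) that each distinguishable level needs some minimum voltage separation to survive noise, the required rail grows linearly with the number of synaptic inputs:

```python
def required_supply_voltage(n_inputs, v_step_min=0.01):
    """If the analog sum must resolve n_inputs distinct levels and each
    level needs at least v_step_min volts of separation, the supply rail
    must span n_inputs * v_step_min volts. v_step_min is an assumed,
    illustrative figure, not a real process parameter."""
    return n_inputs * v_step_min

for n in (64, 256, 1024):
    print(f"{n:4d} synaptic inputs -> at least {required_supply_voltage(n):.2f} V rail")
```

With these made-up numbers, 1024 inputs would already demand a rail of over 10 V, far above what an advanced-node digital CMOS process tolerates, which is the scaling tension being described.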
Our friend Weebit has planted their flag at 12 nm, but I don't know whether this is achievable or aspirational.
https://www.weebit-nano.com/technology/overview/
https://www.weebit-nano.com/technology/reram-bitcell/
Weebit scaling down its ReRAM technology to 22nm
Weebit is scaling its embedded ReRAM technology down to 22nm – one of the industry’s most common process nodes.
To be useful in a CPU or GPU, ReRAM output must be converted to digital by an ADC (analog-to-digital converter).
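That conversion step can be sketched in a few lines. This is a generic idealised ADC model of my own, not any particular converter: the analog sum is clamped to the reference range and snapped to the nearest of 2^bits codes.

```python
def adc_quantize(v_analog, v_ref=1.0, bits=8):
    """Idealised ADC: clamp the analog voltage to [0, v_ref] and map it
    to the nearest unsigned code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    clamped = max(0.0, min(v_analog, v_ref))
    return round(clamped / v_ref * levels)

# e.g. an analog MAC result of 0.5 V against a 1 V reference:
code = adc_quantize(0.5)
print(code, code / 255)  # digital code and the fraction it reconstructs to
```

Note the reconstructed fraction is not exactly 0.5: quantisation error is one more noise source stacked on top of the ReRAM cell variability discussed above.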
It sounds like a lot of fluster, but the Weebit ReRAM hybrid analog/digital neuromorphic circuit (something I have previously dubbed Frankenstein) is well received in the market, even though it is spruiked not as having the capabilities of Akida, but for its memory.
How many more bells and whistles are needed to develop a ReRAM NN?
The NPU of the STM32N6 is internally developed by STM.

ST joins MLCommons, why 1 benchmark can help teams adopt machine learning at the edge
Today, ST is officially becoming a member of MLCommons™ (https://mlcommons.org/en/), the consortium responsible for benchmarks quantifying machine learning performance in mobile platforms, data centers, and embedded systems...

Remi El-Ouazzane 10mo
Earlier today during STMicroelectronics Capital Markets Day (https://cmd.st.com), I gave a presentation on MDG's contribution to our ambition of reaching $20B revenue by 2025-27.
During the event, I was proud to pre-announce the #STM32N6: a high performance #STM32 #MCU with our new internally developed Neural Processing Unit (#NPU) providing an order of magnitude benefit in both inference/W and inference/$ against alternative MPU solutions.
The #STM32N6 will deliver #MPU #AI workloads at the cost and the power consumption of an #MCU. This is a complete game changer that will open new ranges of applications for our customers and allow them to democratise #AI at the edge.
I am excited to say we are on track to deliver first samples of the #STM32N6 by the end of 2022.
I am even more excited to announce that LACROIX will leverage this technology in their next generation smart city products.
Stay tuned for more news on the #STM32N6 in the coming months :=)
########################################################
Bells and whistles:
This is how STM does in-memory compute (not a pretty sight):
EP3761236A2 ELEMENTS FOR IN-MEMORY COMPUTE
A memory array arranged in multiple columns and rows. Computation circuits that each calculate a computation value from cell values in a corresponding column. A column multiplexer cycles through multiple data lines that each corresponds to a computation circuit. Cluster cycle management circuitry determines a number of multiplexer cycles based on a number of columns storing data of a compute cluster. A sensing circuit obtains the computation values from the computation circuits via the column multiplexer as the column multiplexer cycles through the data lines. The sensing circuit combines the obtained computation values over the determined number of multiplexer cycles. A first clock may initiate the multiplexer to cycle through its data lines for the determined number of multiplexer cycles, and a second clock may initiate each individual cycle. The multiplexer or additional circuitry may be utilized to modify the order in which data is written to the columns.
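As I read that abstract (and this is only my toy interpretation, not ST's actual circuit), each column has a computation circuit that reduces its cells to one value, a multiplexer cycles over the columns belonging to a compute cluster, and the sensing circuit accumulates across those cycles:

```python
def in_memory_compute(array, cluster_cols):
    """Toy model of the scheme in EP3761236A2:
    - each column's 'computation circuit' sums the cell values in that column,
    - the column multiplexer cycles through cluster_cols data lines
      (one per column of the compute cluster),
    - the 'sensing circuit' combines the obtained values over that
      number of multiplexer cycles."""
    n_cols = len(array[0])
    # per-column computation circuits
    col_values = [sum(row[c] for row in array) for c in range(n_cols)]
    # cluster cycle management: run cluster_cols multiplexer cycles
    total = 0
    for cycle in range(cluster_cols):
        total += col_values[cycle]  # multiplexer selects column `cycle`
    return total

array = [
    [1, 2, 0, 3],
    [0, 1, 1, 1],
    [2, 0, 0, 1],
]
print(in_memory_compute(array, cluster_cols=3))  # combines the first 3 columns
```

Even in this stripped-down form you can see the bookkeeping overhead: clocks, cycle counts, and multiplexing around the array, which is what I mean by "not a pretty sight" compared with Akida's approach.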
So add that to Weebit's hybrid analog/digital ReRAM ...
Well, you did ask!