"The next generation of Arm’s Ethos micro-NPU, Ethos-U85, is designed to support transformer operations, bringing generative AI models to IoT devices. The IP giant is seeing demand for transformer workloads at the edge, though in much smaller forms than large language models (LLMs), according to Paul Williamson, senior VP and general manager for Arm’s IoT line of business."
"The up to 4 TOPS offered by Ethos-U85 can make IoT transformers fast enough for practical human use, he added. Now that all of TinyLlama’s operators can be mapped to the NPU without falling back to the CPU, a throughput of 8-10 tokens per second, fast enough for a person to read along, is achievable (depending on the exact NPU configuration)."
"“We have some people running ahead, saying ‘I’m going to put it in a consumer device next week,’ but in other areas people are prototyping production-line fault inspection models with a Raspberry Pi. They are not worried about optimization; they just want to prove that it works,” Williamson said."
"Customers for Arm’s first- and second-generation Ethos NPUs so far include Renesas, Infineon, Himax and Alif Semiconductor. Customers can experiment with generative AI models on Arm’s virtual hardware simulations today; Ethos-U85 silicon is expected on the market in 2025."