Arm announces Cortex-X4 among latest CPU and GPU designs
Coming to silicon near you ... next year
Dan Robinson
Mon 29 May 2023 // 00:30 UTC
Arm is set to announce more CPU and GPU designs today, with a promise of performance and power-efficiency gains for laptop and smartphone systems-on-chip.
The blueprints will be marketed under the Total Compute Solutions umbrella, an Arm-curated collection of chip technologies designed, tested, and optimized to work together. The idea is that Arm's customers can license a package of cores and controllers selected and arranged by Arm, drop them into an SoC, and get to market faster.
Arm has been touting such packages since 2021, and the latest incarnation is dubbed TCS23. This offers compute clusters that can use a mix of three CPU core types, and uses the latest Armv9.2 architecture.
It also announced GPUs based on a new fifth-generation architecture, plus a redesigned DynamIQ Shared Unit (DSU) that serves as the glue logic for core clusters.
For those of us buying devices rather than designing them, today's announcements indicate the direction that future higher-end Arm-powered personal computing hardware is likely to take: the number of CPU cores it will use, the types of core, and so on. That in turn will dictate how much oomph there is for applications and how battery friendly it will all be.
In the TCS23 approach, a CPU cluster can comprise up to 14 cores, made up of a mix of three types: performance cores, mid-level cores, and power-efficient cores.
Arm lays out its compute cluster plan
The performance core in the group is the new Cortex-X4, while the mid core role is filled by the Cortex-A720, and the power efficient design is the Cortex-A520. In TCS23 these are all 64-bit, Arm said. The last two on that list would traditionally have filled the “big” and “little” roles in Arm’s big.LITTLE architecture. Now it's looking more like bigger.big.LITTLE with the X4 in the mix.
Arm told us that with the power-efficiency improvements seen in the Cortex-A720, licensees should be able to use a few more of those capable mid-level cores and fewer little efficiency cores than before. In other words, the cluster mix can lean more toward a bunch of A720s as the main workhorses that sustain performance, a big relatively power hungry X4 for the demanding tasks, and a sprinkling of small A520s to do light, battery-friendly work.
Thus while a typical configuration would be one X4, three A720s and four A520s, some customers may choose 1+5+2 instead, depending on the anticipated workloads and power envelope.
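To make the core-count arithmetic concrete, here is a minimal Python sketch that models a cluster mix and checks it against the 14-core ceiling mentioned above. The `ClusterMix` class and its method names are hypothetical and purely illustrative; they are not part of any Arm tool or API, and the only rule encoded is the per-cluster limit reported in this article.

```python
from dataclasses import dataclass

# Upper limit on cores per cluster, as reported in the article.
MAX_CLUSTER_CORES = 14


@dataclass(frozen=True)
class ClusterMix:
    """Hypothetical container for a TCS23 core mix (not an Arm API)."""

    x4: int    # Cortex-X4 performance cores
    a720: int  # Cortex-A720 mid-level cores
    a520: int  # Cortex-A520 efficiency cores

    def total(self) -> int:
        """Total number of CPU cores in the cluster."""
        return self.x4 + self.a720 + self.a520

    def is_valid(self) -> bool:
        """True if no count is negative and the total is between 1 and 14."""
        counts = (self.x4, self.a720, self.a520)
        return all(c >= 0 for c in counts) and 1 <= self.total() <= MAX_CLUSTER_CORES

    def label(self) -> str:
        """Shorthand in the article's big+mid+little notation."""
        return f"{self.x4}+{self.a720}+{self.a520}"


typical = ClusterMix(x4=1, a720=3, a520=4)    # the "typical" configuration
mid_heavy = ClusterMix(x4=1, a720=5, a520=2)  # the 1+5+2 alternative

for mix in (typical, mid_heavy):
    print(mix.label(), "cores:", mix.total(), "valid:", mix.is_valid())
```

Both configurations add up to eight cores, so the choice between them is about how the same core budget is split, not about cluster size.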
The Cortex-X4 is “the fastest Arm CPU ever built,” according to Arm director of CPU product management Stefan Rosinger. It boasts an increase in performance of 15 percent over the previous generation while consuming 40 percent less power, the SoftBank-owned biz claimed.
Cortex-X4 now supports the option of a larger 2MB L2 cache. The core's performance boost is largely through tweaks to make instruction fetch processes more efficient, Arm said. The larger cache reduces memory traffic for larger footprint workloads.
Arm said the Cortex-X4 can be fabbed using, say, TSMC’s N3E 3nm production process, giving us an idea of how high end this CPU core is set.
Strong Arm? Chip designer's overview and benefits of its TCS23 offering
The Cortex-A720 mid-cores deliver a 20 percent increase in power efficiency, or an increase in performance at the same power level as last year’s Cortex-A715, Arm said, along with faster branch misprediction recovery and lower latency for L2 cache hits.
The Cortex-A520, meanwhile, offers eight percent higher performance and 22 percent lower power, compared with last year’s Cortex-A510. It has the lowest power and area of the Armv9.2 cores, and builds on the merged-core microarchitecture introduced last year, where two cores share an L2 cache. It removes or scales down some features to reduce power, including removing a third ALU pipeline, Arm said.
Clock speeds are likely to be in the 4GHz range for the Cortex-X4, with the Cortex-A720 at 2.5GHz to 3GHz, and the Cortex-A520 at 2GHz down to 1.5GHz, Arm told us.
The DSU-120 that ties together the core cluster now has support for up to 32MB of shared L3 cache, as well as new power modes to help reduce leakage power. This includes the ability to put memory into a low power state when CPU cores are idle.
It is also the DSU-120 that enables more flexible core configurations of any combination of Cortex-X4, Cortex-A720 or Cortex-A520, including one with 10 Cortex-X4 and 4 Cortex-A720 that would typically feature in laptops.
On the GPU side, TCS23 sees the introduction of Arm’s fifth-generation architecture, which focuses on graphics performance at a system level with more advanced rendering pipelines to drive power efficiency, according to Dan Wilson, director of product management for Arm’s Client business.
Building on last year’s introduction of a flagship GPU with the Immortalis branding, the 5th Gen comprises the Immortalis-G720, Mali-G720 and Mali-G620. These are effectively the same design, with the difference between them being the number of shader cores that licensees opt for.
Pleading the fifth ... Arm's overview of its latest graphics processing units
Thus the Mali-G620 has five cores or fewer, the Mali-G720 has six to nine cores, and Immortalis-G720 has 10 cores or more, with an upper limit of 16. Arm also specifies that the Immortalis must include a hardware ray-tracing unit.
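The naming scheme amounts to a lookup on shader core count, which the short Python sketch below illustrates. The function `gpu_product_name` is hypothetical and only encodes the thresholds reported in this article. How a design with 10 or more cores but no ray-tracing unit would be branded is not stated, so the sketch assumes that case is simply rejected.

```python
def gpu_product_name(shader_cores: int, has_ray_tracing: bool) -> str:
    """Map a shader core count to the 5th Gen product name.

    Hypothetical helper based only on the thresholds in the article.
    """
    if not 1 <= shader_cores <= 16:
        raise ValueError("shader core count must be between 1 and 16")
    if shader_cores <= 5:
        return "Mali-G620"
    if shader_cores <= 9:
        return "Mali-G720"
    # 10 to 16 cores: Immortalis must include a hardware ray-tracing unit.
    if not has_ray_tracing:
        # Assumption: the article does not name this combination, so reject it.
        raise ValueError("Immortalis-G720 requires a hardware ray-tracing unit")
    return "Immortalis-G720"


for cores in (4, 7, 12):
    print(cores, "cores ->", gpu_product_name(cores, has_ray_tracing=True))
```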
The major feature update in this generation is Deferred Vertex Shading (DVS). This appears to involve postponing most of the heavy rendering work until after the geometry processing is done, at which point any hidden surfaces can be discarded rather than being rendered.
Arm said that this was implemented to cope with the growing scene complexity of games, keeping up the frame rate and enabling the next generation of software and real-time 3D applications on mobile devices.
But another effect of DVS is that it requires 40 percent less memory bandwidth, and this leads to more energy savings, with the new GPU claimed to be 15 percent more energy efficient on average than the previous generation.
At the same time, the systems are touted as offering 15 percent more peak performance over the previous generation. Arm shied away from comparing its 5th Gen GPUs against rivals, but claimed that the previous generation outperformed rival SoCs in Android handsets on ray tracing and variable rate shading tasks.
Blueprint ... The TCS23 summary
TCS23 has been designed to support the Android Virtualization Framework (AVF), introduced in Android 13, which effectively isolates Android applications from each other in separate sandboxes, Arm said.
It should be remembered that Arm does not make its own chips, so TCS23 will eventually appear in silicon from Arm licensees at some point in the future. Rosinger said that Arm expected to see some products come to market early next year. ®