Regarding inefficiency: you're absolutely right, and even more so if we only consider energy-restricted devices or applications.
But the more I read about current trends in the semiconductor space (especially chiplets, packaging memory on top of logic, etc.), and the growing number of industry interviews describing the breakneck pace of AI/ML development and how fast new approaches and algorithms get established, the more I wonder whether we might see a transition period (on the inference side), at least in areas where energy consumption is not the primary concern (as long as we're not talking GPU levels) but flexibility (future-proofing) is. Think of devices that are plugged in but might need updates for 10 years or so in a rapidly changing field like AI/ML: mobile network base stations, industrial wireless communication systems, etc.
I might be totally off, but I could imagine that custom-silicon customers might even get an FPGA chiplet option in the future, e.g. one integrated into an AMD/Intel CPU, perhaps just as a safety net alongside (or as an alternative to) additional tensor cores or whatever highly specialized accelerator flavor gets integrated down the road. Basically a trade-off: accepting higher energy consumption and cost in exchange for flexibility.
Edit: examples added.
FPGA as a chiplet:
Speedcore™ eFPGA IP can be deployed in various form factors including being integrated into a customer-defined chiplet that can be deployed in a system-in-package (SiP) solution via 2.5D interconnect technology. The following figures show three options for the SiP integration. The first is based...
www.achronix.com
Ultra low power FPGA (starting at 25 µW):