Isn't this similar to what we're trying to achieve to slash power consumption, @Diogenese? SNNs eliminating the need for matrix multiplication without affecting performance?
Researchers claim new approach to LLMs could slash AI power consumption
The work comes amid rising concerns about the power demands of AI
Graeme Burton
27 June 2024
ChatGPT uses 10 times the electricity of a Google search – but cutting out ‘matrix multiplication’ could slash that figure without affecting performance
New research suggests that eliminating the ‘matrix multiplication' stage of large-language models (LLMs) used in AI could slash power consumption without affecting performance.
The research was published amid growing concern over the power demands AI will make over the course of the decade.
Matrix multiplication – abbreviated to MatMul in the research paper – performs large numbers of multiplication operations in parallel, and has become the dominant operation in the neural networks that drive AI primarily because GPUs are optimised for it: Nvidia's CUDA [Compute Unified Device Architecture] technology and highly optimised linear algebra libraries allow MatMul to be parallelised and accelerated very efficiently. The researchers argue that eliminating MatMul from the model altogether can cut power consumption without penalising performance.
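To make that concrete, here is a minimal NumPy sketch – illustrative only, not the authors' implementation – of how a layer can avoid multiplication entirely. Approaches in this vein constrain weights to ternary values {-1, 0, +1}, so the usual W @ x matrix multiplication collapses into additions and subtractions; all names and sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                 # input activations (hypothetical size)
W = rng.choice([-1, 0, 1], size=(128, 64))  # ternary weights: -1, 0 or +1

# Conventional path: a true matrix multiplication.
y_matmul = W @ x

# MatMul-free path: because every weight is -1, 0 or +1, each output
# is just a sum of some inputs minus a sum of others, with no
# multiplications at all.
y_accum = np.zeros(128)
for i in range(128):
    y_accum[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()

assert np.allclose(y_matmul, y_accum)       # identical result, zero multiplies
```

In hardware, those adds and subtracts are far cheaper than multiply-accumulate units, which is broadly where the claimed power savings would come from.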
In the process, the researchers claim to have "demonstrated the feasibility and effectiveness of the first scalable MatMul-free language model", in a paper entitled Scalable MatMul-free Language Modeling.
The researchers created a custom 2.7 billion parameter model without using matrix multiplication, and found that its performance was on a par with state-of-the-art deep learning models (called ‘transformers', a fundamental element of natural language processing). The performance gap between their model and conventional approaches narrows as the MatMul-free model scales, they claim.
They add: "We also provide a GPU-efficient implementation of this model, which reduces memory usage by up to 61% over an unoptimised baseline during training. By utilising an optimised kernel during inference, our model's memory consumption can be reduced by more than 10 times compared to unoptimised models."
Less hardware-heavy AI models could also enable more pervasive AI, freeing the technology from its dependence on the data centre and the cloud. In addition, both OpenAI and Meta are set to unveil new models that, they claim, will be capable of reasoning and planning.
However, they cautioned, "one limitation of our work is that the MatMul-free language model has not been tested on extremely large-scale models (for example, 100+ billion parameters) due to computational constraints".
They called for well-resourced institutions and organisations to build LLMs utilising lightweight models, "prioritising the development and deployment of matrix multiplication-free architectures".
Moreover, the researchers' work has not yet been subject to peer review. Indeed, power consumption on its own matters less than energy usage per unit of ‘output' – information that is not provided in the research paper.
www.computing.co.uk