Thanks for sharing that link, @Drewski!
I suspect that not that many forum members have actually clicked on it, though, which is a shame, as that BrainChip X post links to an excellent blog post written by our Chief Development Officer Jonathan Tapson titled “How to Think About Large Language Models on the Edge”.
BrainChip also referred to that blog post on LinkedIn today:
ChatGPT changed the AI landscape overnight. But nearly 3 years later, the Tech industry is still figuring out how to effectively use LLMs, particularly in real-world edge computing applications. In our latest blog, Chief Development Officer Jonathan Tapson breaks down: · Why foundational...
I’d especially like to recommend this very articulate article to forum members enamoured with GenAI responses, which I personally tend to take with a bucket of salt (if I read them at all).
How to Think About Large Language Models on the Edge
Jonathan Tapson, BrainChip Inc.
ChatGPT was released to the public on November 30th, 2022, and the world – at least, the connected world – has not been the same since. Surprisingly, almost three years later, despite massive adoption, we do not seem much closer to understanding how to use Large Language Models effectively in our personal lives or, just as importantly, in professional and business applications.
What LLMs Really Are
A large part of this uncertainty stems from misunderstandings about what an LLM is and how it really works. In this article I’ll unpack some of that and hopefully give a clear picture of LLMs that enables good decision-making.
The key to understanding LLMs is that they all start as what are called Foundational LLMs. These are actually really simple mechanisms, despite being composed of billions of neural elements. The simplicity arises from the way they are trained.
The training consists of taking some text from the internet – e.g., the whole of Wikipedia in all its languages – then feeding it to the LLM one word at a time. The LLM is then trained to predict the next word most likely to appear in that context.
The entirety of the apparent intelligence of an LLM is based on its ability to predict what comes next in a sentence.
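To make that concrete, here is a minimal, purely illustrative sketch of that next-word (next-token) prediction objective – a toy PyTorch model and a made-up twelve-word corpus, not anything from the blog or from any real LLM’s training pipeline. The only training signal is how well the model guesses the word that follows each position.

```python
# Illustrative sketch only: the next-word prediction objective that
# foundational LLMs are trained on, reduced to a toy model and corpus.
import torch
import torch.nn as nn

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
tokens = torch.tensor([idx[w] for w in corpus])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # stand-in for a transformer
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)  # a score for every vocabulary word, at every position

model = TinyLM(len(vocab))
optim = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

inputs, targets = tokens[:-1].unsqueeze(0), tokens[1:].unsqueeze(0)
for _ in range(200):
    logits = model(inputs)
    # The entire training signal: how well did we predict the next word?
    loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    optim.zero_grad()
    loss.backward()
    optim.step()
```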
This simple process can be carried out until the LLM has been trained on pretty much any text ever digitized in any language, which builds a model that has an incredible ability to construct sentences and paragraphs. LLMs are amazing artifacts, containing a model of all of language, on a scale no human could conceive or visualize. What they do not do, though, is apply any value to the information, or to the truthfulness of the sentences and paragraphs they have learned to produce.
An Illusion of Intelligence
I think of LLMs as being the equivalent of that one person we often have in our social circles – that person who can’t bear conversational silence and fills it with an endless stream-of-consciousness babble. What you are hearing is a grammatical flow of words, more or less connected in context, but there’s no information or usefulness to be derived from most of it.
LLMs are powerful pattern-matching machines but lack human-like understanding, common sense, or ethical reasoning. They can generate content that appears clearly inappropriate to humans but is merely a statistically probable sequence of words based on their training. For example, if you train an LLM on racist or deviant content, it will successfully reproduce this in any context, without any understanding of its meaning.
This lack of factualness notwithstanding, LLMs are amazingly convincing to talk to because they are trained that way. They know, way better than a human, precisely what to say, but they don’t in any real sense know any facts; they know what a fact is supposed to sound like, so they can convincingly produce “facts” on cue.
The Risks of Misusing LLMs
The tech industry being what it is, multiple products based on foundational LLMs have been launched without much thought about how they should be used – essentially, just to see how people will use them. LLMs are very good at summarizing, and this use case works pretty well, but the inappropriate use of LLMs as search engines has produced lots of unhappy results.
A great way to think of an LLM is that it produces a surface of language, like a giant lumpy golf putting green, in the form of interconnected words. Any input sentence, or “prompt”, is like placing a ball down and putting it. The ball rolls along, connecting words into sentences according to its direction and velocity, until it comes to rest. A different ball, hit from the same point but in a different direction, produces different sentences. An LLM simply takes a bunch of input sentences and extends them along the surface of the language. Just as a golf ball rolls downhill and along the path of least resistance, the LLM output follows the path of the most likely words and assembles them into sentences.
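To see that “path of the most likely words” directly, here is an illustrative greedy-decoding loop, assuming the Hugging Face transformers library and the small public GPT-2 model purely as stand-ins (nothing BrainChip-specific): at every step the single most probable next token is appended – the ball rolling straight downhill.

```python
# Illustrative sketch: generation as repeatedly following the path of the
# most likely next word, here with greedy decoding and the small GPT-2 model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Edge AI matters because"
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits            # a score for every word in the vocabulary
        next_id = logits[0, -1].argmax()      # the single most likely next token ("downhill")
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))
```

Sampling from the probability distribution instead of taking the argmax is the “different ball, hit in a different direction” case: the same prompt then rolls out along a different, but still locally likely, path.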
As long as we think of an LLM as a machine for producing the next most likely sentences and paragraphs, we can make great use of it. As soon as we try and use a raw Foundational LLM as a search engine or a source of information, it’s like talking to a pathological liar. We’re going to get a response that sounds great but has only a coincidental relationship with the truth, and the algorithm is only guessing the next words based on the previous words from the text it was trained on.
So, how should we use LLMs? The answers depend on applications, but they are incredibly good at turning pre-existing information into words. Don’t let them find (or make up) the facts, but give them facts and let them explain or impart them.
Enter RAG: Retrieval-Augmented Generation
One way to use LLMs that offers a simple approach to this problem is the RAG-LLM, where RAG stands for Retrieval Augmented Generation. RAG-LLMs are usually designed for answering queries in a specific subject, for example, how to operate a particular appliance, tool, or type of machinery. The system works by taking as much of the textual information about the subject as possible – user manuals and so forth – then pre-processing it into small chunks, each containing a few specific facts. When the user asks a question, the software system identifies the chunk of text which is most likely to contain the answer. The question and that retrieved text are then fed to an LLM, which generates a human-language answer to the query.
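A hedged sketch of that pipeline – toy manual chunks, TF-IDF retrieval via scikit-learn, and a placeholder where the LLM call would go; this is a generic illustration, not BrainChip’s implementation:

```python
# Illustrative RAG sketch: chunk the manuals, retrieve the chunk most likely
# to contain the answer, then hand question + chunk to an LLM so it only has
# to phrase facts it was given.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "To descale the coffee machine, run the cleaning cycle with descaling fluid.",
    "The water tank holds 1.2 litres and must be seated firmly before brewing.",
    "Error E3 means the grinder is jammed; unplug the machine and clear the chute.",
]

question = "What does error E3 mean?"

# Retrieval happens *before* the LLM is involved.
vec = TfidfVectorizer().fit(chunks + [question])
scores = cosine_similarity(vec.transform([question]), vec.transform(chunks))[0]
best_chunk = chunks[scores.argmax()]

# The LLM's only job is to turn the supplied fact into a fluent answer.
prompt = (
    "Answer the question using only the context below.\n"
    f"Context: {best_chunk}\n"
    f"Question: {question}\n"
)
# answer = some_llm.generate(prompt)   # hypothetical call; any instruction-tuned LLM fits here
print(prompt)
```

The important part is the division of labour: retrieval finds the fact, and the LLM only has to phrase it.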
When one first builds RAG-LLMs, it seems like a completely counter-intuitive way to use LLMs. All the action of finding the answer happens before LLM involvement; why bother with that? Once you understand the issues with LLMs, it becomes obvious that RAG plays to the strengths of LLMs while mostly addressing their problems. There are many more sophisticated ways to enforce factualness on LLMs, but by and large they follow the RAG pattern in some way.
BrainChip’s Approach to LLMs at the Edge
At BrainChip, we build edge hardware systems that can execute LLMs to provide domain-specific intelligent assistance at the Edge. We also build models using an extremely compact LLM topology, Temporal Event Neural Networks (TENNs), based on state-space models, combined with pre-processing of information in a RAG system. Using this technology platform of optimized hardware and LLM models, BrainChip is able to demonstrate a stand-alone, battery-powered AI assistant that covers a huge amount of information. Like many companies working in this space, we believe we’re learning how to deploy LLMs in a way that starts to deliver on their massive promise in the Edge AI space.
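For readers wondering what “based on state-space models” means in practice, here is the generic discrete state-space recurrence in NumPy – a textbook illustration of the underlying idea only, not TENN or any BrainChip code. A fixed-size state is updated once per input step, so memory and compute per step stay constant however long the sequence gets, which is one reason this family of models suits constrained edge hardware.

```python
# Generic discrete state-space recurrence (illustration only, not TENN):
# x_{t+1} = A x_t + B u_t,   y_t = C x_t
import numpy as np

dim_state, dim_in, dim_out = 8, 4, 2
A = np.eye(dim_state) * 0.9                 # state transition (slowly decaying memory)
B = np.random.randn(dim_state, dim_in) * 0.1
C = np.random.randn(dim_out, dim_state)

def run(inputs):
    x = np.zeros(dim_state)                 # fixed-size state, independent of sequence length
    outputs = []
    for u in inputs:                        # one update per input step
        x = A @ x + B @ u
        outputs.append(C @ x)
    return np.array(outputs)

ys = run(np.random.randn(16, dim_in))       # 16 timesteps in, 16 outputs out
```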
Dr. Jonathan Tapson, Chief Development Officer at BrainChip, was a tenured professor at multiple universities before becoming the Executive Director of the MARCS Institute of Brain, Behavior and Development in Western Sydney, Australia. He founded three successful technology companies as spin-outs from his research, and then became the first CSO of GrAI Matter Labs, later acquired by Snap, Inc. He has a PhD in Engineering and Bachelor’s degrees in Theoretical Physics and Electrical Engineering from the University of Cape Town.