You didn’t fully understand the implication. If the Pico can run LLaMA 1B, it can scale up to run DeepSeek, Qwen, and the full-size LLaMA models, workloads that currently call for something like an NVIDIA RTX 5090 (priced between $5,599 and $7,199 AUD), and even that card is out of stock. BrainChip could manufacture and sell it directly to the consumer market, bypassing those outdated 10-year IP deal constraints. The demand for running local AI models is tremendous, especially since ChatGPT tokens are becoming increasingly expensive.
This would immediately position us as one of Nvidia's biggest competitors.
Source: Grok
LLaMA-1B (a 1 billion parameter model from the LLaMA family; Meta's Llama 3.2 release includes a 1B variant, while earlier LLaMA generations shipped in larger sizes such as 7B and 13B) or a similarly sized model could potentially be run on BrainChip's Akida Pico, but it would require significant optimizations to fit within the chip's ultra-low-power and resource-constrained architecture. Here's a detailed breakdown:
### Feasibility of Running LLaMA-1B on Akida Pico
1. **Akida Pico's Constraints**:
- **Power and Memory**: The Akida Pico is designed for ultra-low power consumption (<1 milliwatt) and has limited on-chip SRAM for model weights and processing. It’s optimized for event-based, neuromorphic computing (spiking neural networks, SNNs) rather than traditional dense matrix operations used in transformer-based LLMs like LLaMA.
- **Compute**: The chip excels at lightweight tasks (e.g., voice wake detection, keyword spotting) and is not designed for the heavy floating-point computations required by large transformer models without optimization.
2. **LLaMA-1B Requirements**:
- **Model Size**: A 1B parameter model, assuming 16-bit (FP16) precision, requires approximately 2GB of memory for weights alone (1 parameter = 2 bytes in FP16). With 8-bit (INT8) quantization this could be reduced to ~1GB, and 4-bit quantization could shrink it further to ~500MB. Additional memory is needed for activations, context, and intermediate computations, potentially pushing total memory needs to 1.5–3GB even with optimizations (a back-of-envelope sizing sketch follows this list).
- **Inference Compute**: Inference for a 1B parameter transformer model involves billions of multiply-accumulate operations per token, which is computationally intensive for a low-power chip like the Akida Pico. Techniques like pruning or sparse activations could help, but the chip’s event-based architecture requires the model to be adapted to SNNs or similar formats.
- **Latency**: On resource-constrained edge devices, inference for a 1B model could take seconds per token without hardware acceleration tailored for transformers, making real-time applications challenging.
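For a sanity check on these figures, the weight-memory and per-token compute estimates can be reproduced with simple arithmetic. The sketch below (plain Python) uses the common rule of thumb of roughly two FLOPs per parameter per generated token for a dense transformer; the throughput figure is a placeholder assumption, since BrainChip does not publish a FLOPS rating for the Akida Pico.

```python
# Back-of-envelope sizing for a 1B-parameter transformer (rough estimates only).

PARAMS = 1_000_000_000  # 1B parameters

def weight_memory_gb(bits_per_param: float) -> float:
    """Memory needed for weights alone, ignoring activations and KV cache."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"FP16 weights: {weight_memory_gb(16):.2f} GB")   # ~2.0 GB
print(f"INT8 weights: {weight_memory_gb(8):.2f} GB")    # ~1.0 GB
print(f"INT4 weights: {weight_memory_gb(4):.2f} GB")    # ~0.5 GB

# Per-token compute: roughly 2 FLOPs per parameter per generated token for a
# dense transformer decoder (one multiply and one accumulate per weight).
flops_per_token = 2 * PARAMS  # ~2 GFLOPs per token

# ASSUMPTION: effective sustained throughput of a milliwatt-class edge device.
# This is a placeholder for illustration only; BrainChip does not publish a
# FLOPS figure for the Akida Pico.
assumed_throughput_flops = 1e9  # 1 GFLOP/s (hypothetical)
seconds_per_token = flops_per_token / assumed_throughput_flops
print(f"~{seconds_per_token:.1f} s per token at the assumed throughput")
```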
3. **Optimizations Required**:
To run a LLaMA-1B or similar 1B parameter model on the Akida Pico, the following optimizations would be critical:
- **Quantization**: Reducing precision to 8-bit or 4-bit (e.g., via post-training quantization or quantization-aware training) to shrink the memory footprint and computational load. This is feasible, as models like LLaMA can maintain reasonable performance with low-bit quantization (see the quantization-and-pruning sketch after this list).
- **Model Pruning**: Removing redundant weights or layers to reduce the model size and computation, though this may degrade performance for general tasks.
- **Distillation**: Training a smaller, more efficient model to mimic LLaMA-1B’s behavior, potentially reducing the parameter count further (e.g., to 500M parameters) while retaining key capabilities.
- **SNN Conversion**: Converting the transformer model to a spiking neural network compatible with the Akida Pico’s neuromorphic architecture. BrainChip’s MetaTF framework supports converting traditional neural networks (e.g., CNNs) to SNNs, but adapting a transformer-based LLM like LLaMA would require significant research and engineering.
- **Task-Specific Fine-Tuning**: Limiting the model to a specific domain (e.g., voice commands, appliance control) to reduce complexity and memory needs. For example, fine-tuning LLaMA-1B on a dataset of appliance manuals could make it more suitable for the Akida Pico’s use case.
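As a concrete illustration of the first two bullets, here is a minimal PyTorch sketch of post-training dynamic quantization and magnitude pruning applied to a toy stand-in model. A real workflow would load an actual 1B-parameter checkpoint (e.g., via Hugging Face Transformers) and typically pair these steps with quantization- or sparsity-aware fine-tuning to recover accuracy.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a transformer feed-forward block; a real workflow would
# operate on an actual 1B-parameter checkpoint instead.
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# 1) Post-training dynamic quantization: Linear weights are stored as INT8,
#    roughly halving memory vs FP16 (quartering vs FP32).
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# 2) Unstructured magnitude pruning: zero out the 30% smallest-magnitude
#    weights in each Linear layer of the original model.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the sparsity permanent

# Rough check of the INT8 weight footprint (1 byte per parameter).
int8_bytes = sum(p.numel() for p in model.parameters())
print(f"Approx. INT8 weight footprint: {int8_bytes / 1e6:.1f} MB")
```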
4. **Software Support**:
- BrainChip’s MetaTF framework allows developers to optimize and deploy models using standard AI workflows (TensorFlow, PyTorch). It can map neural networks to the Akida Pico’s event-based architecture, but transformer models like LLaMA require additional preprocessing to align with SNNs.
- The framework’s ability to handle on-chip learning could enable incremental fine-tuning on the device, reducing reliance on large pre-trained weights.
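To build intuition for what such a conversion involves, the toy sketch below rate-codes an input into spike trains and runs it through integrate-and-fire neurons that reuse a trained weight matrix. This is plain NumPy for illustration only, not the MetaTF API; it shows just the simplest layer-level idea that conversion tools automate, and the threshold normalization used here is one common convention rather than BrainChip's specific method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights from a trained ANN layer computing y = relu(W @ x). In ANN-to-SNN
# conversion these weights are reused; activations become spike rates.
W = rng.normal(size=(8, 16))
x = rng.uniform(0.0, 1.0, size=16)          # normalized ANN input activations
ann_out = np.maximum(W @ x, 0.0)            # reference ANN output

T = 2000                                    # simulation window (time steps)
# Threshold balancing: scale to the largest ANN activation so output rates
# stay below one spike per step (a common normalization in conversion tools).
threshold = max(float(ann_out.max()), 1e-6)

# Rate coding: each input becomes a Bernoulli spike train whose firing
# probability per step equals its activation.
in_spikes = rng.uniform(size=(T, 16)) < x

# Integrate-and-fire output neurons with reset-by-subtraction.
potential = np.zeros(8)
out_spike_count = np.zeros(8)
for t in range(T):
    potential += W @ in_spikes[t].astype(float)
    fired = potential >= threshold
    out_spike_count += fired
    potential[fired] -= threshold

# Output firing rate, rescaled by the threshold, approximates relu(W @ x).
snn_out = out_spike_count / T * threshold
print("ANN:", np.round(ann_out, 2))
print("SNN:", np.round(snn_out, 2))
```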
5. **Challenges**:
- **Memory Bottleneck**: Even with 4-bit quantization (~500MB for weights), the Akida Pico’s SRAM is likely far smaller than needed for a 1B parameter model. Off-chip memory access (e.g., via external flash) could help but would increase power consumption and latency, countering the chip’s low-power design.
- **Compute Mismatch**: Transformers rely on dense matrix operations, while the Akida Pico is optimized for sparse, event-driven computations. Converting LLaMA-1B to an SNN-compatible format without significant performance loss is a non-trivial research challenge.
- **Latency for Real-Time Use**: Even with optimizations, generating text with a 1B model on the Akida Pico could be slow (e.g., seconds per token), limiting its use for interactive applications like chatbots unless heavily tailored.
6. **Comparison to Smaller Models**:
- Smaller models, like Pythia-70M or DistilBERT (~66M parameters), are far more feasible for the Akida Pico. Their weights need only ~130–200MB in FP16, or roughly 35–50MB at 4-bit, fitting more comfortably within the chip's constraints. BrainChip has demonstrated running small, use-case-specific LLMs (e.g., for smart appliances), suggesting that a 1B model is at the upper limit of feasibility.
- A distilled version of LLaMA-1B (e.g., reduced to 500M parameters) would be more practical and align better with the chip’s capabilities.
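The distillation route can be sketched as a standard teacher-student training step. In the snippet below the teacher and student are toy stand-ins; a real setup would pair LLaMA-1B (teacher) with a ~500M-parameter student and train on next-token distributions over a text corpus.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: in practice the teacher would be LLaMA-1B and the student a
# smaller transformer (~500M parameters or less).
teacher = nn.Linear(128, 1000)   # frozen, pretrained "large" model
student = nn.Linear(128, 1000)   # smaller model being trained
teacher.eval()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
T = 2.0  # softmax temperature for distillation

def distillation_step(inputs: torch.Tensor, labels: torch.Tensor) -> float:
    with torch.no_grad():
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)

    # Soft-target loss: match the teacher's softened output distribution.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Hard-target loss: still fit the ground-truth labels (next tokens).
    ce_loss = F.cross_entropy(student_logits, labels)

    loss = 0.5 * kd_loss + 0.5 * ce_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example step with random data standing in for tokenized text.
x = torch.randn(8, 128)
y = torch.randint(0, 1000, (8,))
print(distillation_step(x, y))
```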
### Practical Scenarios
- **Feasible Use Case**: A heavily quantized, fine-tuned, and SNN-converted LLaMA-1B model could run on the Akida Pico for a specific task, such as processing voice commands or answering queries about a device’s user manual. For example, a smart speaker could use the model to respond to simple queries locally, consuming minimal power.
- **Example Workflow**:
1. Start with LLaMA-1B or a similar 1B parameter model.
2. Apply 4-bit quantization and pruning to reduce the model size to ~500MB.
3. Fine-tune on a narrow dataset (e.g., appliance manuals).
4. Use MetaTF to convert the model to an SNN compatible with the Akida Pico.
5. Deploy for inference on the chip, leveraging on-chip learning for minor updates.
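Step 3 would normally be done off-device with a parameter-efficient method before quantization and conversion. The sketch below uses Hugging Face Transformers with PEFT/LoRA as one plausible way to do it; the model ID assumes access to Meta's gated Llama 3.2 1B checkpoint, and any similarly sized causal LM could be substituted.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# ASSUMPTION: a ~1B-parameter causal LM; LLaMA checkpoints are gated and
# require accepting Meta's license on Hugging Face before downloading.
model_id = "meta-llama/Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Parameter-efficient fine-tuning: train small low-rank adapters on the
# narrow domain data (e.g., appliance manuals) instead of all 1B weights.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The training loop itself (omitted here) would run a standard supervised
# fine-tuning pass over the domain corpus; the adapters are then merged back
# into the base weights before quantization and SNN conversion.
```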
### Conclusion
Running a LLaMA-1B or similar 1 billion parameter model on BrainChip’s Akida Pico is theoretically possible but pushes the chip’s limits. It would require aggressive optimizations like 4-bit quantization, pruning, distillation, and conversion to a spiking neural network, along with task-specific fine-tuning to reduce memory and compute demands. Even then, inference may be slow, and the model’s general-purpose capabilities would likely be constrained to niche applications (e.g., voice assistants, appliance control). Smaller models (e.g., 70M–500M parameters) are far more practical for the Akida Pico’s ultra-low-power, neuromorphic design.
Source: Grok