BRN Discussion Ongoing

IloveLamp

Top 20
🤔 Can somebody with some technical know how check this out?

Seems like a few nuggets to be had


Screenshot_20240118_064412_LinkedIn.jpg
 
  • Like
  • Thinking
  • Wow
Reactions: 9 users

TECH

Regular
  • Like
  • Fire
  • Haha
Reactions: 39 users

AARONASX

Holding onto what I've got
 
  • Like
  • Fire
  • Love
Reactions: 44 users

Damo4

Regular


As if coffee is one of the first kitchen appliances to get AI.
Don't know why, but I thought we'd get fridge-based AI - food stock levels and expiration monitoring
 
  • Like
  • Haha
Reactions: 9 users

cosors

👀
🤔 Can somebody with some technical know how check this out?

Seems like a few nuggets to be had


View attachment 54500
"Four years after we've launched the initial product (AKD1000, methinks), nobody in our industry has come close to matching us for our performance or for the small form factor that we've delivered with your product."
Todd Goodnight
 
Last edited:
  • Like
  • Fire
  • Thinking
Reactions: 20 users

IloveLamp

Top 20
"Four years after we've launched the initial product (AKD1000, methinks), nobody in our industry has come close to matching us for our performance or for the small form factor that we've delivered with your product."
Todd Goodnight
Obviously an old product, I guess. This is the one I was wondering about, as it's new and power consumption is 6 mW - 9 mW

Screenshot_20240118_083647_Chrome.jpg
 
  • Like
  • Wow
  • Thinking
Reactions: 11 users

cosors

👀
The thing that some doubters misunderstand about the two video-cassette-type things: there are no two competing things here. How well thought out his words were.
 
  • Like
  • Fire
  • Love
Reactions: 16 users



Link has a demonstration

Generative AI is on Mobile and it’s Powered by Arm​

Exciting new developments that demonstrate the advanced AI capabilities of the Arm CPU.
By James McNiven, Vice President of Product Management, Client Line of Business, Arm
Artificial Intelligence (AI)Smartphones
Share
Arm-NN-blog-post-image-1400x788.jpg

Generative AI, which includes today’s well-known, highly publicized large language models (LLMs), has arrived at the edge on mobile. This means that AI generative inferences, from generating images and videos to understanding words in context, are starting to be processed entirely on the mobile device, rather than being sent to the Cloud and back.
Arm is the foundational technology to enable AI to run everywhere and when it comes to generative AI on mobile, there are some exciting, new developments that demonstrate this in action, from the latest AI-enabled flagship smartphones to LLMs being directly processed on the Arm CPU.

New AI-powered smartphones​

High performance AI-enabled smartphones are now on the market, which are built on Arm’s v9 CPU and GPU technologies. These include the new MediaTek Dimensity 9300-powered vivo X100 and X100 Pro smartphones, Samsung Galaxy S24, and the Google Pixel 8.
The combination of performance and efficiency provided by these flagship mobile devices is delivering unprecedented opportunities for AI innovation. In fact, Arm's own CPU and GPU performance improvements have doubled AI processing capabilities every two years during the past decade.
This trend will only advance in the future with more AI performance, technologies, and features on our robust consumer technology roadmap. This will be supported by the rise of AI inference at the edge, the process of using a trained model like LLMs to power AI-based applications, with CPUs being best placed to serve this need as more AI support and specialized instructions continue to be added.

It all starts on the CPU….​

In most cases, the use of AI on our favorite mobile devices starts on the CPU, with some good examples being face, hand and body tracking, advanced camera effects and filters, and segmentation across the many social applications. The CPU will handle such AI workloads in their entirety or be supported by accelerators, including GPUs or NPUs. Arm technology is crucial to enabling these AI workloads, as our CPU designs are pervasive across the SoCs in today’s smartphones used by billions of people worldwide.
This has led to 70 percent of AI in today's third-party applications running on Arm CPUs, including the latest social, health and camera-based applications and many more. Alongside the pervasiveness of the designs, the flexibility and AI capabilities of the Arm CPU make it the best technology for mobile developers to target for their applications' AI workloads.
In terms of flexibility, Arm CPUs can run a wide variety of neural networks in many different data formats. Looking ahead, future Arm CPUs will include more AI capabilities in the instruction set for the benefit of Arm’s industry-leading ecosystem, like the Scalable Matrix Extension (SME) for the Armv9-A architecture. These help the world’s developers deliver improved performance, innovative features and scalability for their AI-based applications.
The combination of leading hardware and software ecosystem support means Arm has a performant compute platform that is enabling the rise of generative AI at the edge, which could include gaming advancements, image enhancements, language translation, text generation and virtual assistants. We will be demonstrating some examples of these next-gen AI workloads and more at Mobile World Congress 2024.

LLM on mobile on the Arm compute platform​

We have produced a virtual assistant demo that utilizes Meta’s LLAMA2-7B LLM on mobile via a chat-based application. The generative AI workloads take place entirely at the edge on the mobile device on the Arm CPUs, with no involvement from accelerators. The impressive performance is enabled through a combination of existing CPU instructions for AI, alongside dedicated software optimizations for LLMs through the ubiquitous Arm compute platform that includes the Arm AI software libraries.


As you can see from the video above, the time-to-first-token response is very impressive, and the text generation rate of just under 10 tokens per second is faster than the average human reading speed. This is made possible by highly optimized CPU routines in the software library developed by the Arm engineering team, which improve time-to-first-token by 50 percent and text generation by 20 percent compared to the native implementation in the LLaMA2-7B LLM.
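For context, the two figures quoted here (time-to-first-token and tokens per second) are straightforward to derive from token emission timestamps. A minimal sketch with made-up numbers (the function name and timestamps below are illustrative, not Arm's measurements):

```python
def generation_rate(token_times):
    """Given wall-clock timestamps (seconds after prompt submission)
    at which each token was emitted, return (time_to_first_token,
    steady-state tokens per second)."""
    ttft = token_times[0]
    if len(token_times) < 2:
        return ttft, 0.0
    span = token_times[-1] - token_times[0]
    return ttft, (len(token_times) - 1) / span

# Hypothetical timestamps: first token at 0.4 s, then one every 0.1 s.
ttft, tps = generation_rate([0.4, 0.5, 0.6, 0.7, 0.8])
print(ttft, tps)  # ~0.4 s to first token, ~10 tokens/s thereafter
```

A rate of 10 tokens/s comfortably exceeds typical adult reading speed (roughly 3-5 words per second), which is why the demo feels instantaneous.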

The Arm CPU also provides the AI developer community with opportunities to experiment with their own techniques to provide further software optimizations that make LLMs smaller, more efficient and faster.

Enabling more efficient, smaller LLMs means more AI processing can take place at the edge. The user benefits from quicker, more responsive AI-based experiences, as well as greater privacy through user data being processed locally on the mobile device. Meanwhile, for the mobile ecosystem, there are lower costs and greater scalability options to enable AI deployment across billions of mobile devices.

Find out more information about this demo from the Arm engineers that developed it in this technical blog.

Driving generative AI on mobile​

As the most ubiquitous mobile compute platform and leader in efficient compute, Arm has a responsibility to enable the most efficient and highest-performing generative AI at the edge. We are already demonstrating the impressive performance of LLMs that are running entirely on our leading CPU technologies. However, this is just the start.

Through a combination of smaller, more efficient LLMs, improved performance on mobile devices built on Arm CPUs and innovative software optimizations from our industry-leading ecosystem, generative AI on mobile will continue to proliferate.

Arm is foundational to AI and we will enable AI everywhere, for every developer, with the Arm CPU at the heart of future generative AI innovation on mobile.



View attachment 54497

🚀📢 Exciting news. Generative AI has arrived at the edge on mobile!

Now, AI generative inferences can be processed entirely on your mobile device running #onArm CPUs.

We're excited to share some of the latest developments in action. Here's a glimpse:
📱 Elevate your mobile experience with AI-powered smartphones, boasting unparalleled performance powered by our Armv9 CPU.

🔧 Experience efficiency like never before with software optimizations, making Large Language Models (LLMs) smaller and faster

➕ Expect enhanced AI capabilities in our CPU instruction sets – (more details coming soon)

As the most ubiquitous mobile compute platform and leader in efficient compute, expect to see Arm CPUs at the heart of future generative AI innovation on mobile. See why in our latest blog: https://bit.ly/47EEqqs

Stay tuned.
"Expect enhanced AI capabilities in our CPU instruction sets – (more details coming soon)"

Umm, define "soon" please..😛
 
  • Like
  • Fire
  • Love
Reactions: 35 users

Damo4

Regular

See if you notice something in the Differentiators....(y)

Tech.

Sport highlights and brawl highlights - wow
I know they are probably trying to remove violence from the highlights, but imagine how easy it will be to create "best football fights of 2024"
  • 80% automation of the Highlights & Violence Detection workflow, which is 80% more effective compared to the manual process

BTW has anyone got the guts to categorically say this isn't Akida?
How else could they do it?

1705534441254.png




 
  • Like
  • Love
  • Fire
Reactions: 21 users

buena suerte :-)

BOB Bank of Brainchip
Well we can only dream and hope 🙏🙏🙏 One day!!!!


1705534502001.png



1705534139583.png

1705534201135.png

1705534251059.png

1705534308383.png
 
Last edited:
  • Like
  • Love
Reactions: 17 users

Damo4

Regular


Small box, self contained and thermally efficient - power and size is always a factor.
Very scalable too, excited about huge growth in 2024.
 
Last edited:
  • Like
  • Fire
  • Love
Reactions: 40 users

skutza

Regular
  • Like
  • Fire
  • Love
Reactions: 9 users

Damo4

Regular
Hi DB,

I'm hanging my hat on TeNNs which makes Akida 2 significantly more efficient than Akida 1, which itself is significantly more efficient than anything else MB has tried.

... and we have just found the published patent applications for TeNNs, so good luck to anyone trying to reproduce that.

Hi Dio and all,

Just thought I'd re-post the wonderful video from Brainchip.
I know it's now "old news" but such a good explanation of why TeNNs is so important.
It also highlights how far ahead Brainchip is.

 
  • Like
  • Love
  • Fire
Reactions: 23 users

Bravo

If ARM was an arm, BRN would be its biceps💪!
Look! Tata Elxsi and Unity are BFFs.

And... Unity is providing the development platform and the runtime architecture that will power the infotainment domain of the MB.OS operating system, so there is no way IMO that they can't already be familiar with us.

🥳🥳🥳

Screenshot 2024-01-18 at 11.19.56 am.png




Screenshot 2024-01-18 at 11.23.03 am.png
 
Last edited:
  • Like
  • Fire
  • Love
Reactions: 26 users

Diogenese

Top 20
Here's a discussion from ARM about LLMs at the edge. Note they endorse the 4-bits is enuf concept and anticipate further improvements from the ARM ecosystem:

https://community.arm.com/arm-commu...blog/posts/generative-ai-on-mobile-on-arm-cpu

Gian Marco Iodice
January 17, 2024


8 minute read time.

2023 was the year that showcased an impressive number of use cases powered by generative AI. This disruptive form of artificial intelligence (AI) technology is at the heart of OpenAI's ChatGPT and Google's Gemini AI model, demonstrating the opportunity to simplify work and advance education by generating text, images, or even audio content from user text prompts. Sounds impressive, doesn't it?
However, what’s the next step for generative AI as it proliferates across our favorite consumer devices? The answer is generative AI at the edge on mobile.
In this blog, we will demonstrate how Large Language Models (LLMs), a form of generative AI inference, can run on the majority of mobile devices built on Arm technology. We will discuss how the Arm CPU is well suited for this type of use case due to the typical batch size and the balance of compute and bandwidth required for this type of AI workload. We will also explain the AI capabilities of the Arm CPU and demonstrate how its flexibility and programmability enable clever software optimizations. This is resulting in great performance and opportunities for many LLM use cases.

Introduction to LLMs

There are a wide variety of different network architectures that can be used for generative AI. However, LLMs are certainly attracting a lot of interest due to their ability to interpret and generate text on a scale that has never been seen before.
As the LLM name suggests, these models are anything but small compared to what we were using up until last year. To give some numbers, they can easily have between 100 billion and 1 trillion trainable parameters. This means they are at least three orders of magnitude larger compared to BERT (Bidirectional Encoder Representations from Transformers), one of the largest state-of-the-art NLP (Natural Language Processing) models trained by Google in 2018.
But how does a 100 billion parameter model translate into RAM use? If we consider deploying the model on a processor using floating-point 16-bit precision, a 100B parameter model would require at least 200GB of RAM just for the weights!
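The 200GB figure follows directly from parameter count times bytes per parameter. A quick sanity-check sketch (the function name is mine, purely for illustration):

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """RAM needed just to hold the model weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(100e9, 2))  # 100B params in FP16 -> 200.0 GB
print(weight_memory_gb(7e9, 2))    # LLaMA2-7B in FP16   -> 14.0 GB
```

The same arithmetic yields the ~14GB FP16 figure for LLaMA2-7B quoted later in the post.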
As a result, these large models end up running on the Cloud. However, this poses three fundamental challenges that could limit the adoption of this technology:

  • High infrastructure costs;
  • Privacy issues (due to the potential exposure of user data), and;
  • Scalability challenges.
Towards the second half of 2023, we started to see some smaller, more efficient LLMs emerge that will unlock generative AI on mobile, making this technology more pervasive.
In 2023, LLaMA2 from Meta, Gemini Nano from Google and Phi-2 from Microsoft opened the door to mobile LLM deployment to solve the three challenges previously listed. In fact, these models have 7 billion, 3.25 billion, and 2.7 billion trainable parameters, respectively.

Running LLMs on the mobile CPU

Today’s mobile devices have incredible computational power built on Arm technology that makes them capable of running complex AI algorithms in real-time. In fact, existing flagship and premium smartphones can already run LLMs. Yes, you read it correctly.
The deployment of LLMs on mobile is predicted to accelerate in the future, with the following likely use cases:

  • Text generation: For example, we might ask our virtual assistant to write an email for us.
  • Smart reply: Our instant messaging application might propose replies to questions automatically.
  • Text summarization: Our eBook reader application might provide a summary of a chapter.
Across all these use cases, there will be vast amounts of user data that the model will need to process. However, the fact the LLM runs at the edge without an internet connection means the data does not leave the device. This helps to protect the privacy of individuals, as well as improving the latency and responsiveness of the user experience. These are certainly compelling reasons for deploying LLM at the edge on mobile.
Fortunately, almost all smartphones worldwide (around 99 percent) have the technology that is already capable of processing LLMs at the edge today: the Arm CPU.
This is demonstrated through a brand-new Arm demo that can be found below.
...

The int4 bit quantization

Quantization is a crucial technique for making AI and machine learning (ML) models compact enough to run efficiently on devices with limited RAM. It is therefore indispensable for LLMs, whose billions of trainable parameters are natively stored in floating-point data types, such as floating-point 32-bit (FP32) and floating-point 16-bit (FP16). For example, the LLaMA2-7B variant with FP16 weights needs at least ~14 GB of RAM, which is prohibitive on many mobile devices.

By quantizing an FP16 model to 4-bit, we can reduce its size by four times and bring the RAM use to roughly 4GB. Since the Arm CPU offers tremendous software flexibility, developers can also lower the number of bits per parameter value to obtain a smaller model. However, keep in mind that lowering the number of bits to three or two bits might lead to a significant accuracy loss.
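To make the 4x reduction concrete, here is a toy sketch of symmetric per-tensor int4 quantization. This is my own illustrative code, not Arm's optimized routines; production LLM quantizers typically quantize per block of weights and pack two int4 codes per byte.

```python
import numpy as np

def quantize_int4_symmetric(w: np.ndarray):
    """Map float weights onto integer codes in [-7, 7] using a single
    per-tensor scale (one of the 16 possible int4 codes goes unused)."""
    scale = float(np.abs(w).max()) / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from codes and scale."""
    return q.astype(np.float32) * scale

w = np.array([0.7, -0.3, 0.04, -0.7], dtype=np.float32)
q, scale = quantize_int4_symmetric(w)
w_hat = dequantize(q, scale)
# Worst-case rounding error is half a quantization step (scale / 2),
# which is why dropping to 3 or 2 bits hurts accuracy much more.
```

At 4 bits (0.5 bytes) per parameter, 7B parameters come to about 3.5GB, matching the "roughly 4GB" figure above once activations and overhead are included.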
...
Using dedicated AI instructions, CPU thread affinity and software optimized routines, the virtual assistant demo showcases a great overall user experience for interactive use cases. The video demonstrates the immediate time-to-first token response, and a text generation rate that is faster than the average human reading speed. Best of all, this performance is achievable on all Cortex-A700 enabled mobile devices.

However, this is just the beginning of the LLM experience on Arm technology. As LLMs get smaller and more sophisticated, their performance on mobile devices at the edge will continue to improve. In addition, Arm and partners from our industry-leading ecosystem will continue to add new hardware advancements and software optimizations to accelerate the AI capabilities of the CPU instruction set, like the Scalable Matrix Extension (SME) for the Armv9-A architecture. These advancements will unlock the next era of use cases for LLMs on Arm-based consumer devices throughout 2024 and beyond.
 
  • Like
  • Fire
  • Love
Reactions: 46 users


Small box, self contained and thermally efficient - power and size is always a factor.
Very scalable too, excited about huge growth in 2024.

Hi All
If you have listened to this podcast it is important to remember that VVDN market an Edge Box from Nvidia and in a couple of words this VVDN official dismisses every other Edge Box on the market as being not as capable as the VVDN AKIDA Edge Box.

If you have not listened do yourself a favour and spend the eight minutes it takes.

My opinion only DYOR
Fact Finder
 
Last edited:
  • Like
  • Fire
  • Love
Reactions: 67 users

Diogenese

Top 20
Hi Dio and all,

Just thought I'd re-post the wonderful video from Brainchip.
I know it's now "old news" but such a good explanation of why TeNNs is so important.
It also highlights how far ahead Brainchip is.


Thanks Damo,

I think the patents for TeNNs have increased the value of BRN's patent portfolio by an order of magnitude.

WO2023250092A1 METHOD AND SYSTEM FOR PROCESSING EVENT-BASED DATA IN EVENT-BASED SPATIOTEMPORAL NEURAL NETWORKS 20220622

WO2023250093A1 METHOD AND SYSTEM FOR IMPLEMENTING TEMPORAL CONVOLUTION IN SPATIOTEMPORAL NEURAL NETWORK 20220622
...
[0005] Currently, most of the accessible data is available in spatiotemporal formats. To use the spatiotemporal forms of data effectively in machine learning applications, it is essential to design a lightweight network that can efficiently learn spatial and temporal features and correlations from data. At present, the convolutional neural network (CNN) is considered the prevailing standard for spatial networks, while the recurrent neural network (RNN) equipped with nonlinear gating mechanisms, such as long short-term memory (LSTM) and gated recurrent unit (GRU), is being preferred for temporal networks.

[0006] The CNNs are capable of learning crucial spatial correlations or features in spatial data, such as images or video frames, and gradually abstracting the learned spatial correlations or features into more complex features as the spatial data is processed layer by layer. These CNNs have become the predominant choice for image classification and related tasks over the past decade. This is primarily due to their efficiency in extracting spatial correlations from static input images and mapping them into their appropriate classifications, with the fundamental engines of deep learning, like gradient descent and backpropagation, pairing up together. This results in state-of-the-art accuracy for the CNNs. However, many modern Machine Learning (ML) workflows increasingly utilize data that come in spatiotemporal forms, such as natural language processing (NLP) and object detection from video streams. The CNN models used for image classification lack the power to effectively use the temporal data present in these application inputs. Importantly, CNNs fail to provide flexibility to encode and process temporal data efficiently. Thus, there is a need to provide flexibility to artificial neurons to encode and process temporal data efficiently.

[0007] Recently, different methods to incorporate temporal or sequential data, including temporal convolution and internal state approaches, have been explored. When temporal processing is a requirement, for example in NLP or sequence prediction problems, RNNs such as long short-term memory (LSTM) and gated recurrent unit (GRU) models are utilized. Further, according to another conventional method, a 2D spatial convolution is combined with state-based RNNs such as LSTMs or GRUs to process temporal information components, using models such as ConvLSTM. However, each of these conventional approaches comes with significant drawbacks. For example, combining 2D spatial convolutions with 1D temporal convolutions requires a large number of parameters due to the temporal dimension and is thus not appropriate for efficient low-power inference.

[0008] One of the main challenges with the RNNs is the involvement of excessive nonlinear operations at each time step, which leads to two significant drawbacks. Firstly, these nonlinearities force the network to be sequential in time, making it difficult for RNNs to efficiently leverage parallel processing during training. Secondly, since the applied nonlinearities are ad hoc in nature and lack a theoretical guarantee of stability, it is challenging to train the RNNs or perform inference over long sequences of time series data. These limitations also apply to models, for example the ConvLSTM models discussed in the above paragraphs, that combine 2D spatial convolution with RNNs to process sequential and temporal data.
...
[0012] According to an embodiment of the present disclosure, disclosed herein is a neural network system that includes an input interface, a memory including a plurality of temporal and spatial layers, and a processor. The input interface is configured to receive sequential data that includes temporal data sequences. The memory is configured to store a plurality of group of first temporal kernel values, a first plurality of First-In-FirstOut (FIFO) buffers corresponding to a current temporal layer. The memory further implements a neural network that includes a first plurality of neurons for the current temporal layer, a corresponding group among the plurality of groups of the first temporal kernel values is associated with each connection of a corresponding neuron of the first plurality of neurons. The processor is configured to allocate the first plurality of FIFO buffers to a first group of neurons among the first plurality of neurons. The processor is then configured to receive a first temporal sequence of the corresponding temporal data sequences into the first plurality of FIFO buffers allocated to the first group of neurons from corresponding temporal data sequences over a first time window. Thereafter, the processor is configured to perform, for each connection of a corresponding neuron of the first group of neurons, a first dot product of the first temporal sequence of the corresponding temporal data sequences within a corresponding FIFO buffer of first plurality of FIFO buffers with a corresponding temporal kernel value among the corresponding group of the first temporal kernel values. The corresponding temporal kernel values are associated with a corresponding connection of the corresponding neuron of the first group of neurons. 
The processor is then further configured to determine a corresponding potential value for the corresponding neurons of the first group of neurons based on the performed first dot product and then generates a first output response based on the determined corresponding potential values. ...
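Stripped of the patent's apparatus language, the mechanism in [0012] is a per-connection sliding-window (FIFO) dot product against a learned temporal kernel, with the per-connection results summed into a neuron potential. A toy NumPy sketch of that step (all names, shapes and numbers here are mine, purely illustrative; this is not BrainChip's implementation):

```python
import numpy as np
from collections import deque

def temporal_neuron_response(inputs, kernels, window):
    """inputs:  {connection_id: list of input samples over time}
    kernels: {connection_id: temporal kernel of length `window`}
    Each connection keeps its last `window` samples in a FIFO buffer;
    once the buffers are full, each FIFO is dotted with its
    connection's temporal kernel and the results are summed into the
    neuron's potential at that time step."""
    fifos = {c: deque(maxlen=window) for c in inputs}
    potentials = []
    steps = len(next(iter(inputs.values())))
    for t in range(steps):
        for c, seq in inputs.items():
            fifos[c].append(seq[t])   # FIFO push: oldest sample drops out
        if t >= window - 1:           # wait until the window is full
            p = sum(float(np.dot(np.asarray(fifos[c], dtype=float),
                                 kernels[c]))
                    for c in inputs)
            potentials.append(p)
    return potentials

# One connection, a 3-tap all-ones kernel (a running 3-sample sum):
out = temporal_neuron_response({0: [1, 2, 3, 4]},
                               {0: np.array([1.0, 1.0, 1.0])}, 3)
print(out)  # [6.0, 9.0]
```

The FIFO-per-connection structure is what lets the temporal convolution run incrementally, one sample at a time, instead of buffering and reprocessing whole sequences.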
 
  • Like
  • Fire
  • Love
Reactions: 42 users
Top Bottom