Neuromorphia
fact collector
Context..Video about Renesas, Brainchips Customer.
Funny bring up Apple.
Following on from this, is it safe to assume that Apple are using us in their Macs and perhaps tablets?
An article I was reading makes me believe the possibility.
ARM vs Intel Processors: What’s the Difference?
Originally published Oct. 16, 2021, by Darien Graham-Smith. Updated Jan. 16, 2022, by Steve Larner. When choosing a smartphone or tablet, you'll notice that
www.alphr.com
The last paragraph from that article:
View attachment 23089
From the article that @Pmel just published:
View attachment 23090
Is my dot joining close to the money? Please let it be so.
This may help explain why AMX instructions are not described in official documentation. ARM Ltd. expects Apple to keep these kinds of instructions inside libraries provided by the customer (Apple in this case).
The new instructions are interleaved with standard Arm instructions. To avoid software fragmentation and maintain a coherent software development environment, Arm expects customers to use the custom instructions mostly in called library functions.
Akida in action with Renesas
I'm not 100 percent convinced on this. Love to be wrong though.
I have posted the video for this RZ/V2 on the previous page. Renesas has uploaded it to YouTube.
Funny bring up Apple.
Found this yesterday but hadn't posted yet.
Erik Engheim
Jan 15, 2021
The Secret Apple M1 Coprocessor
Developer Dougall Johnson has, through reverse engineering, uncovered a secret, powerful coprocessor inside the M1 chip, dubbed AMX: Apple Matrix coprocessor.
Stories about the Apple Matrix coprocessor (AMX) are already out there, but they are not exactly told in a beginner-friendly manner. That is what I try to do here: bring you the story buried under thick layers of technical jargon without treating you like an idiot.
To tell this story we need to clarify the basics such as what is a coprocessor? What is a matrix? And why should you even care about any of this?
More importantly, why do none of the Apple slides talk about this coprocessor? Why is it seemingly a secret? If you have read about the Neural Engine inside the M1 System-on-a-Chip (SoC) you may be confused about what makes Apple’s Matrix coprocessor (AMX) different.
Before we get to the big question, let me start with the basic concepts, such as what a matrix and a coprocessor are.
What is a Matrix Anyway?
A matrix is basically just a table of numbers. If you have worked with spreadsheets such as Microsoft Excel, you have basically worked with something very similar to matrices. The key difference is that in math such tables of numbers have a laundry list of operations they support and specific behavior. A matrix can come in different flavors as you see here. A matrix with just one row is usually called a row vector. If it has just one column, we call it a column vector.
We can add, subtract, scale and multiply matrices. Addition is pretty easy. You just add every element separately. Multiplication is a bit more involved. I am just showing the simple case here.
More in depth: Why Does Matrix Multiplication Work the Way it Does?
Using matrices to rotate and scale: Explaining Affine Rotation (this is pretty math geeky).
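For anyone who wants to see the addition and multiplication described above spelled out, here is a minimal sketch in plain Python. The function names (mat_add, mat_scale, mat_mul) are made up for illustration and are not from the article; real code would normally call a tuned library instead.

```python
# Toy matrix helpers; a matrix is a list of rows, each row a list of numbers.
# All function names here are illustrative, not from the article.

def mat_add(a, b):
    """Add two same-shaped matrices element by element."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def mat_scale(k, a):
    """Multiply every element of a matrix by the number k."""
    return [[k * x for x in row] for row in a]

def mat_mul(a, b):
    """Row-by-column product: result[i][j] = sum of a[i][k] * b[k][j] over k."""
    cols_b = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
row_vector = [[1, 2]]        # one row
column_vector = [[1], [2]]   # one column

print(mat_add(A, B))                       # [[6, 8], [10, 12]]
print(mat_scale(2, A))                     # [[2, 4], [6, 8]]
print(mat_mul(A, B))                       # [[19, 22], [43, 50]]
print(mat_mul(row_vector, column_vector))  # [[5]]
```

Note how multiplication needs a sum of products for every output element, which is why it is the operation worth building hardware for.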
Why Do We Care About Matrices?
The reason matrices are important is that they are heavily used in:
- Image processing
- Machine learning
- Speech and handwriting recognition
- Face recognition
- Compression
- Multimedia: audio and video
In particular, machine learning has been hot in recent years. Just adding more cores to the CPU will not make this run fast enough, as it is really demanding. You really need specialized hardware. Regular tasks such as browsing the internet, writing email, word processing and spreadsheets have been running fast enough for years. It is for specialized tasks that we really need to boost the processing power.
You could spend your silicon real-estate (transistors) on more CPU cores or on specialized hardware.
On any given chip, Apple has a max number of transistors to spend building different kinds of hardware. They could add more CPU cores but that really just speeds up regular tasks, which already run fast enough. Thus they have chosen to spend transistors on specialized hardware to tackle image processing, video decoding and machine learning. This specialized hardware is what we call coprocessors and accelerators.
More talk about coprocessors and accelerators: Apple M1 foreshadows Rise of RISC-V.
How is Apple’s Matrix Coprocessor Different From the Neural Engine?
If you have read about the Neural Engine, you will know that it also does matrix operations to help with machine learning tasks. So why do we need the Matrix coprocessor? Or are they actually just the same thing? Am I just confused? No, let me clarify how Apple’s Matrix Coprocessor differs from the Neural Engine and why we need both.
The main processor (CPU), coprocessors and accelerators can usually exchange data over a shared data bus. The CPU usually controls memory access while an Accelerator such as a GPU often has its own dedicated memory.
I admit that in past stories I often used the terms coprocessor and accelerator interchangeably, but they are not the same. A GPU, as found in your Nvidia graphics card, and the Neural Engine are both a type of accelerator.
In both cases you have special areas of memory which the CPU has to fill up with data it wants processed, as well as another part of memory which it fills up with a list of instructions that the accelerator should perform. It is time consuming for a CPU to set up this kind of processing. There is a lot of coordination, filling in data, and then waiting to get results back.
Thus this only pays off for larger tasks. For smaller tasks the overhead will be too high.
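To put rough numbers on why the setup overhead only pays off for larger tasks, here is a back-of-the-envelope model in Python. Every figure in it (CPU speed, accelerator speed, setup cost) is a made-up assumption for illustration, not a measurement of any Apple hardware.

```python
# Toy cost model: an accelerator is faster per operation but has a fixed setup cost.
# All numbers below are hypothetical and only illustrate the trade-off.

def cpu_time(n_ops, cpu_ops_per_us):
    """Time in microseconds for the CPU to do the work itself."""
    return n_ops / cpu_ops_per_us

def accelerator_time(n_ops, setup_us, accel_ops_per_us):
    """Fixed setup cost (filling buffers, coordinating) plus the faster processing."""
    return setup_us + n_ops / accel_ops_per_us

def break_even_ops(setup_us, cpu_ops_per_us, accel_ops_per_us):
    """Task size where both take equally long: n/cpu = setup + n/accel, solved for n."""
    return setup_us * cpu_ops_per_us * accel_ops_per_us / (accel_ops_per_us - cpu_ops_per_us)

CPU_SPEED = 1     # assumed: 1 operation per microsecond
ACCEL_SPEED = 10  # assumed: 10 operations per microsecond
SETUP = 90        # assumed: 90 microseconds to set up the accelerator

print(break_even_ops(SETUP, CPU_SPEED, ACCEL_SPEED))  # 100.0
for n in (10, 100, 10_000):
    print(n, cpu_time(n, CPU_SPEED), accelerator_time(n, SETUP, ACCEL_SPEED))
```

With these assumed numbers, a 10-operation task is much slower on the accelerator, the two tie at 100 operations, and at 10,000 operations the accelerator wins by roughly a factor of ten.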
Coprocessors, unlike accelerators, spy on the stream of instructions read from memory into the main processor. Accelerators in contrast don’t observe the instructions the CPU is pulling from memory.
This is where coprocessors are a benefit over accelerators.
Edit November 2nd 2021: More recent info on the AMX suggests the description I am giving below is not correct. There is no AMX per core, so it cannot spy on the instruction stream to a core. However, the advantage relative to the Neural Engine is similar to what I describe. AMX accesses memory more like a CPU than a GPU or Neural Engine, which are optimized for processing large but slow batches of data.
Coprocessors sit and spy on the stream of machine code instructions being fed from memory (or cache more specifically) into the CPU. Coprocessors are made to react to the particular instructions they were made to process. The CPU meanwhile has been made to mostly ignore these instructions or help facilitate the handling of them by a coprocessor.
What we gain from this is that instructions carried out by the coprocessor can be placed inside your regular code. This is different from, say, a GPU. If you have done GPU programming you know that shader programs are placed into separate buffers of memory, and you have to explicitly transport these shader programs to the GPU. You cannot place GPU-specific instructions inside your regular code. Thus for smaller workloads involving matrix processing AMX will be better than the Neural Engine.
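The classic coprocessor idea described above (one instruction stream, with each unit picking out the instructions meant for it) can be sketched as a toy simulation in Python. The instruction names are invented for illustration and are not real ARM or AMX opcodes, and per the November 2021 edit the real AMX probably does not work exactly like this.

```python
# Toy model of one instruction stream shared by a CPU and a coprocessor.
# The mnemonics are invented for illustration; they are NOT real ARM/AMX opcodes.

CPU_OPS = {"load", "add", "store"}
COPROCESSOR_OPS = {"mat_load", "mat_fma"}

def run(stream):
    """Walk the stream in order and record which unit handles each instruction."""
    log = []
    for op in stream:
        if op in COPROCESSOR_OPS:
            log.append(("coprocessor", op))  # CPU skips it, coprocessor reacts
        elif op in CPU_OPS:
            log.append(("cpu", op))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return log

# Coprocessor instructions sit right in the middle of regular code.
program = ["load", "mat_load", "mat_fma", "add", "store"]
for unit, op in run(program):
    print(f"{unit:12s} {op}")
```

The point of the sketch is that nothing has to be copied into a separate buffer first: the matrix instructions are simply part of the program.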
What is the catch? You need to actually define the instructions in the instruction-set architecture (ISA) of your microprocessor. Thus you need much tighter integration with the CPU when using a coprocessor than when using an accelerator.
ARM Ltd., creator of the ARM instruction-set architecture (ISA), has long resisted adding custom instructions to its ISA. This is one of the advantages of RISC-V: What Is Innovative About RISC-V?
However, due to pressure from customers, ARM relented and announced in 2019 that they would allow extensions. EE Times reports:
“The new instructions are interleaved with standard Arm instructions. To avoid software fragmentation and maintain a coherent software development environment, Arm expects customers to use the custom instructions mostly in called library functions.”
This may help explain why AMX instructions are not described in official documentation. ARM Ltd. expects Apple to keep these kinds of instructions inside libraries provided by the customer (Apple in this case).
How is a Matrix Coprocessor Different From a SIMD Vector Engine?
It is easy to confuse something like a matrix coprocessor with a SIMD vector engine, which you find inside most modern processors today including ARM processors. SIMD stands for Single Instruction Multiple Data.
Single Instruction Single Data (SISD) vs Single Instruction Multiple Data (SIMD)
SIMD is a way of getting higher performance when you need to perform the same operation on multiple elements. This is closely related to matrix operations. In fact SIMD instructions such as ARM’s Neon instructions or Intel x86 SSE or AVX are often used to speed up matrix multiplications.
Read more: RISC-V Vector Instructions vs ARM and x86 SIMD.
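A rough way to picture the SISD vs SIMD difference is to count instructions in a toy Python model. The 4-lane width is an assumption chosen for illustration; real SIMD hardware does the lane-wise work in parallel silicon, which plain Python can only imitate.

```python
# Toy comparison: one element per instruction (SISD) vs several lanes per instruction (SIMD).
# LANES = 4 is an assumed width for illustration only.

LANES = 4

def sisd_add(a, b):
    """Add two lists one element at a time, counting one instruction per element."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions

def simd_add(a, b, lanes=LANES):
    """Add two lists in chunks of `lanes` elements, counting one instruction per chunk."""
    out, instructions = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions

a = list(range(8))
b = [10] * 8
print(sisd_add(a, b))  # ([10, 11, 12, 13, 14, 15, 16, 17], 8)
print(simd_add(a, b))  # ([10, 11, 12, 13, 14, 15, 16, 17], 2)
```

Same result, a quarter of the instructions, which is the whole appeal when the same operation is applied to many elements.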
However, a SIMD vector engine is part of a microprocessor core, just like the ALU (Arithmetic Logic Unit) and FPU (Floating Point Unit) are part of the CPU. Inside the microprocessor there is an instruction decoder which will pick apart an instruction and decide what functional unit to activate (gray boxes).
Inside a CPU you have the ALU, FPU as well as SIMD vector engines (not shown) as separate parts activated by the instruction decoder. A coprocessor is external.
A coprocessor in contrast is external to a microprocessor core. In fact one of the early ones, Intel’s 8087 was a physically separate chip designed to speed up floating point calculations.
Intel 8087. One of the early coprocessors used for performing floating point calculations.
Now you may wonder why anyone would want to complicate CPU design by having a separate chip like this which has to sniff on the data flowing from memory to the CPU, to see if anything is a floating point instruction.
The reason was simple: the original 8086 CPU in the first PCs contained 29,000 transistors. The 8087 in contrast was far more complex at 45,000 transistors. It was really hard to make anything with that many transistors. Combining these two chips into one would have been really hard and expensive.
But as manufacturing technology improved, it was not a problem to put floating point units (FPUs) inside the CPU. Thus FPUs replaced the floating point coprocessors.
Why the AMX is not simply a part of the Firestorm cores on the M1 is not clear to me. They are all on the same silicon die anyway. I can only offer some speculations. By being a coprocessor, it may be easier for the CPU to continue running in parallel. Apple may also have liked to keep non-standard ARM stuff outside of their ARM CPU cores.
Why Is the AMX a Secret?
If AMX is not described in official documentation, how do we even know about it? Thanks to developer Dougall Johnson, who has done an amazing job reverse engineering the M1 to discover this coprocessor. His efforts are described here. For matrix related math operations Apple has special libraries or frameworks such as Accelerate, which is made up of:
- vImage: higher level image processing, such as converting between formats, image manipulation.
- BLAS: a sort of industry standard for linear algebra (what we call the math dealing with matrices and vectors).
- BNNS: is used for running neural networks and training.
- vDSP: digital signal processing. Fourier transformations, convolution. These are mathematical operations important in image processing or any signal really, including audio.
- LAPACK: higher level linear algebra functions, e.g. for solving linear equations.
Dougall Johnson knew these libraries would use the AMX coprocessor to speed up their calculations. Thus he wrote special programs to analyze and observe what these libraries did, in order to discover the special undocumented AMX machine code instructions.
But why doesn’t Apple document this and let us use these instructions directly? As mentioned earlier, this is something ARM Ltd. would like to avoid. If custom instructions are widely used it could fragment the ARM ecosystem.
However more importantly, this is an advantage to Apple. By only letting their libraries use these special instructions Apple retains the freedom to radically change how this hardware works later. They could remove or add AMX instructions. Or they could let the Neural Engine do the job. Either way they make the job easier for developers. Developers only need to use the Accelerate framework and can ignore how Apple specifically speeds up matrix calculations.
This is one of the big advantages Apple has by being vertically integrated. By controlling both the hardware and the software, they can pull these kinds of tricks. So the next question is how big a deal is this? What does this buy Apple in terms of performance and capabilities?
What Are the Advantages of Apple’s Matrix Coprocessor?
Nod Labs is a company that does machine interaction, intelligence and perception. Fast matrix operations are naturally in their interest. They have written a highly technical blog post of doing performance tests of AMX: Comparing Apple’s M1 matmul performance — AMX2 vs NEON.
What they are doing is comparing the performance of similar code written with AMX against the same work done with the Neon instructions, which are officially supported by ARM. Neon is a SIMD instruction set.
What Nod Labs found was that by using AMX they were able to get twice the performance of Neon instructions for matrix operations. It doesn’t mean AMX is better for everything, but at least for machine learning and high performance computing (HPC) type of work, we can expect that AMX gives an edge over the competition.
Summary
The Apple Matrix Coprocessor looks like a rather impressive piece of hardware, giving Apple’s ARM processor an edge in machine learning and HPC related tasks. Further investigation will give us a more complete picture and I can update this story with more details.
Yes a few weeks ago. It will accelerate their ability to undertake research but does not have any implications for AKIDA/Brainchip apart from this.
Anybody looked into this yet?
Biotome installs first LiCor Odyssey M imager in Australia
October, 2022 - Perth, Australia: Biotome has installed the first LiCor Odyssey M multimodal imager in Australia. The new imager fills a critical role in their ongoing project to help develop and validate a rapid diagnostic test for sepsis. The scanner is capable of scanning 18 imaging channels...
www.biotome.com.au
What is a Scientific CMOS Camera?- Oxford Instruments
Everything you need to know about Scientific CMOS (sCMOS) camera technology, a breakthrough technology based on CMOS Image Sensor design & fabrication techniques
andor.oxinst.com
Cheers for that. I'll still have a further dig anyway just in case.
Yes a few weeks ago. It will accelerate their ability to undertake research but does not have any implications for AKIDA/Brainchip apart from this.
My opinion only DYOR
FF
AKIDA BALLISTA
Don't know if this was posted previously when Anils one was?
Is from the recent TinyML forum end of Sept.
Watch from around the 9.15 min mark, when at the end of his piece the moderator asks Christoph about the roles neuromorphic hardware / chips etc. could play with event-based vision.
Starts answer / comment with.....but then gets a bit cagey on architecture...wonder if just on their stuff or us as well
View attachment 23060
This is the vid presso:
Neuromorphic Event-based Vision
Christoph POSCH, CTO, PROPHESEE
Abstract (English)
Neuromorphic Event-based (EB) vision is an emerging paradigm of acquisition and processing of visual information that takes inspiration from the functioning of the human vision system, trying to recreate its visual information acquisition and processing operations on VLSI silicon chips. In contrast to conventional image sensors, EB sensors do not use one common sampling rate (=frame rate) for all pixels, but each pixel defines the timing of its own sampling points in response to its visual input by reacting to changes of the amount of incident light. The highly efficient way of acquiring sparse data, the high temporal resolution and the robustness to uncontrolled lighting conditions are characteristics of the event sensing process that make EB vision attractive for numerous applications in industrial, surveillance, IoT, AR/VR, automotive. This short presentation will give an introduction to EB sensing technology and highlight a few exemplary use cases.
Agreed.
@ 9:15 she asks him about the use of neuromorphic chips in regards to event based vision systems. He states that he thinks SNNs are a good fit, but then after that he said something strange considering our previous conversation with Prophesee on our podcast, where they basically said that Brainchip is the key to their success (paraphrased).
He said: “but I’m not sure if the optimal processing architecture has been identified yet”. Very strange thing to say when Luca Verre spoke so highly of Brainchip and Akida and the recent partnership etc. Am I reading too much into this?
I'm not trying to be a dick but if you post an article/video etc, it would be good to give it some context. So many posters do it and it's annoying.
This is an example of what the next generation of mass production vehicles will be coming out with.
Akida can do facial recognition, voice commands and gestures all at the same time using very low power.
Akida can do this easy.
I put my hand up to that. Will make a conscious effort from now on.
I'm not trying to be a dick but if you post an article/video etc, it would be good to give it some context. So many posters do it and it's annoying.
I agree with you on this one. It is annoying to see videos or articles posted by someone but the poster provides literally no context or opinion. I never read or watch the links when that's the case.
I'm not trying to be a dick but if you post an article/video etc, it would be good to give it some context. So many posters do it and it's annoying.
Only because I asked if it was Akida. All good though, it's no big deal but it's just as you say, if they aren't going to read it before posting and add some context, it's likely a useless article that isn't worth reading.
I agree with you on this one. It is annoying to see videos or articles posted by someone but the poster provides literally no context or opinion. I never read or watch the links when that's the case.
If it wasn't worth the poster's time to provide some information then I assume it's not worth my time to watch/read.
In this case though, @Getupthere posted the video then posted a separate comment providing the context, so go easy on the fella!
Akida in action with Renesas
I would say yes to your question, you are reading too much into this. Simple logic based on known facts.
@ 9:15 she asks him about the use of neuromorphic chips in regards to event based vision systems. He states that he thinks SNNs are a good fit, but then after that he said something strange considering our previous conversation with Prophesee on our podcast, where they basically said that Brainchip is the key to their success (paraphrased).
He said: “but I’m not sure if the optimal processing architecture has been identified yet”. Very strange thing to say when Luca Verre spoke so highly of Brainchip and Akida and the recent partnership etc. Am I reading too much into this?