BRN Discussion Ongoing

7für7

Top 20
To be honest, if this news hadn't been posted or shared here, I wouldn't have known what happened. This issue had zero impact on me, to be honest.
Raise your hands if you use a Mac ✋
 
Reactions: Like, Haha (6 users)
Raise your hands if you use a Mac ✋
Nope, but they taste good after a session on the piss

 
Reactions: Haha, Love (7 users)

7für7

Top 20
Reactions: Haha, Like (7 users)

Guzzi62

Regular
Founding member of what, the cock head club? Why are you here?
He is the only person on here I have on ignore.

A nasty personality that should have been banned a long time ago.
 
Reactions: Like, Fire (6 users)

hotty4040

Regular
Nope, but they taste good after a session on the piss

You have to be "on the piss" for these to work at all IMO; it's the "after effects" that are so distasteful. I haven't eaten (used) a Maccas for more than 10 years, mind you, their coffee ain't bad. I mean, look at that stacker, it looks disgusting, don't you reckon? I'd rather have a sandwich, of any description, IMHO.

Akida Ballista, chippers ;) Without the big M for me, every time. Errrr, yuk.
 
Reactions: Like (1 user)
You have to be "on the piss" for these to work at all IMO; it's the "after effects" that are so distasteful. I haven't eaten (used) a Maccas for more than 10 years, mind you, their coffee ain't bad. I mean, look at that stacker, it looks disgusting, don't you reckon? I'd rather have a sandwich, of any description, IMHO.

Akida Ballista, chippers ;) Without the big M for me, every time. Errrr, yuk.
I can’t even remember eating a McDonald’s sober 😂
 
Reactions: Haha (4 users)

JDelekto

Regular
The irony is that, in its original conception (ARPANET?), the internet was supposed to be able to maintain communications despite major infrastructure damage. Now it has become the greatest single point of vulnerability - too many eggs in one basket?
What I find mildly amusing is that I don't think I've ever seen any malicious cyber attacks cause so much global chaos with such raw efficiency.

The price tag will be large on this one.
 
Reactions: Like (8 users)

Diogenese

Top 20
What I find mildly amusing is that I don't think I've ever seen any malicious cyber attacks cause so much global chaos with such raw efficiency.

The price tag will be large on this one.
Their insurers will be quaking in their loots.
 
Reactions: Like, Haha, Wow (8 users)

Tothemoon24

Top 20




The investment bank's upgrade highlights Arm's pivotal role in the burgeoning field of edge AI, underpinned by its power-efficient compute architecture and extensive developer community.

According to Morgan Stanley, "Arm's power efficient compute architecture, huge developer community and breadth of offering gives it centrality within edge AI."

Analysts explain that this centrality is crucial as edge AI, which involves processing machine learning on local devices rather than in the cloud, becomes increasingly important.

Furthermore, Morgan Stanley believes Arm's ability to scale compute efficiently across a wide range of end markets, supported by numerous licensee partners, positions it well to capitalize on this trend.

Morgan Stanley notes significant growth potential in Arm's key markets. Smartphone royalties are expected to grow to $1.87 billion by FY27, driven partly by a robust iPhone upgrade cycle.

Additionally, the automotive sector is said to present a substantial opportunity, with automotive royalties set to more than double by FY27. While AI PCs are also considered additive to Arm's prospects, the immediate impact of Windows-on-Arm is deemed less significant.
 
Reactions: Like, Love, Fire (24 users)

skutza

Regular
The world's reliance on the cloud is showing its teeth. BrainChip should be all over this; the edge is the safest place to be :)
 
Reactions: Like, Fire, Haha (33 users)

IloveLamp

Top 20
Reactions: Like, Thinking, Fire (26 users)
Great find ILL.

I also noticed another article on their website mentioning some very interesting capabilities they've got going on (a quick sketch of the idea follows the quoted list below):


"Game-Changing Advantages
Key advantages that MoleHD offers to you:
  • Backpropagation-Free Training: MoleHD does not rely on backpropagation to train its parameters. Instead, it uses one-shot or few-shot learning to establish abstract patterns represented by specific symbols.
  • Efficient Computing: Unlike neural networks that require complex operations like convolutions, MoleHD performs simple arithmetic operations such as vector addition. This efficiency allows MoleHD to run easily on common CPUs to complete both training and testing within minutes, compared to GNNs requiring much longer GPU time.
  • Smaller Language Model (SLM) Size: MoleHD needs to store only a small set of relevant vectors for comparison during inference, unlike state-of-the-art neural networks that require large language models (LLMs) for storing numerous parameters, many of which are actually not needed. On the chance that more parameters are needed, MoleHD’s SLM can be swiftly scaled up for larger tasks without incurring latency losses and increased inference times commonly seen in LLMs."
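For anyone wondering how "training by vector addition" can possibly work, here's a minimal, self-contained sketch of the general hyperdimensional-computing idea described in the quoted list above. To be clear, this is my own toy illustration, not MoleHD's code: the dimensionality, the token encoding and the class labels are made-up placeholders.

```python
import numpy as np

# Toy hyperdimensional-computing classifier in the spirit described above:
# no backpropagation, training is just vector addition ("bundling"), and
# inference is a similarity lookup against a handful of class prototypes.

DIM = 10_000                              # hypervector dimensionality (illustrative)
rng = np.random.default_rng(0)

def symbol_vector(symbol, _table={}):
    """Give each symbol a fixed random bipolar (+1/-1) hypervector."""
    if symbol not in _table:
        _table[symbol] = rng.choice([-1, 1], size=DIM)
    return _table[symbol]

def encode(tokens):
    """Encode a token sequence by shifting (to keep order) and summing."""
    acc = np.zeros(DIM)
    for i, tok in enumerate(tokens):
        acc += np.roll(symbol_vector(tok), i)     # cheap positional binding
    return np.sign(acc)

def train(examples):
    """One-/few-shot training: a class prototype is the sum of its encodings."""
    prototypes = {}
    for tokens, label in examples:
        prototypes[label] = prototypes.get(label, np.zeros(DIM)) + encode(tokens)
    return prototypes

def predict(tokens, prototypes):
    """Pick the class whose prototype is most similar (cosine) to the query."""
    q = encode(tokens)
    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max(prototypes, key=lambda label: cosine(q, prototypes[label]))

# Hypothetical usage, with short token sequences standing in for SMILES strings.
protos = train([(list("CCO"), "active"), (list("c1ccccc1"), "inactive")])
print(predict(list("CCN"), protos))
```

Note that training and inference here are nothing but additions and dot products, which is why this style of model runs comfortably on a plain CPU.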
 
Reactions: Like, Fire, Love (23 users)

Bravo

If ARM was an arm, BRN would be its biceps💪!

Airbus and Thales eye satellite merger

JULY 18, 2024



SKYNET 6A (image: Airbus)
Airbus and Thales are reportedly exploring a merger of their satellite operations, according to the Financial Times and Reuters. This potential deal is seen as a pivotal moment for Europe’s space industry, as it grapples with intensified competition from global rivals such as SpaceX.

Leonardo is also involved in the talks thanks to the Space Alliance's joint ventures with Thales. Founded in 2005, this strategic partnership includes two joint ventures: satellite producer Thales Alenia Space (Thales 67 percent, Leonardo 33 percent) and satellite services provider Telespazio (Leonardo 67 percent, Thales 33 percent).

The combined entity would be similar to MBDA, a successful cross-border European missile consortium, and aim to enhance Europe’s strategic independence in space. However, significant hurdles, including regulatory approval and overcoming political tensions, lie ahead.
This potential partnership comes as both companies grapple with financial challenges in their space divisions. Declining demand for large satellites — such as Thales' Spacebus series and Airbus' Eurostar bus — and the rise of smaller, cheaper alternatives, like SpaceX's Starlink, have impacted their bottom lines. A merger could help them address these issues and better compete in the evolving space market.
Although GEO (Geosynchronous Orbit) satellites will continue to have their place, the smaller LEO (Low Earth Orbit) satellites have grown heavily in popularity thanks to SpaceX and other LEO operators.
by Richard Pettibone and Carter Palmer
 
Reactions: Like, Wow, Fire (21 users)

Bravo

If ARM was an arm, BRN would be its biceps💪!

Arm is poised for upside as edge AI emerges, Morgan Stanley says

Jul. 20, 2024, 8:00 AM ET | Arm Holdings plc (ARM), SFTBY | By Chris Ciaccia, SA News Editor
Arm, Inc headquarters in Silicon Valley

Sundry Photography
Shares of Arm Holdings (NASDAQ:ARM) have jumped more than 130% year-to-date as the artificial intelligence spending boom rages on. But with AI just starting to come to the edge, the British chip design firm is poised to reap major rewards, Morgan Stanley said.
The Wall Street firm said Arm, which is partially owned by SoftBank (OTCPK:SFTBY), has multiple ways to benefit as AI moves to the edge, or consumer devices: custom silicon, new designs and extensions.
As such, it sees its total serviceable addressable market topping $14B by 2027, including $4.44B for smartphones (with expected royalties of $1.87B), automotive of $1.45B (with expected royalties of $0.33B) and the AI PC market of $7.92B (with expected royalties of $0.38B). "We would note that we are more cautious on the AI PC opportunity given historical precedent of competition in the PC space," analysts at the firm wrote in an investor note. "This is reflected in our expectations for less royalty capture relative to the size of the potential AI PC [serviceable addressable market]."
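As a quick sanity check on those figures (my arithmetic, not Morgan Stanley's; the ">$14B" total presumably includes segments beyond the three quoted here):

```python
# Segment figures quoted above, in $B by FY27: (serviceable market, expected royalties)
segments = {
    "smartphones": (4.44, 1.87),
    "automotive":  (1.45, 0.33),
    "AI PCs":      (7.92, 0.38),
}
listed_sam = sum(sam for sam, _ in segments.values())          # ~13.81
listed_royalties = sum(roy for _, roy in segments.values())    # ~2.58
print(f"Listed SAM ≈ ${listed_sam:.2f}B; listed royalties ≈ ${listed_royalties:.2f}B")
```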
Morgan Stanley upgraded its rating on Arm to Overweight from Equal-Weight. It also raised its price target to $190 from $107, based on expectations it could earn $3.88 per share in 2027, up from $1.57 per share this year.

AI opportunities

Arm is widely known in the technology space, given its list of customers is a "who's who" of the sector: it counts Apple (AAPL), Nvidia (NVDA), AMD (AMD), Qualcomm (QCOM) and a host of others as customers.
Arm generates the vast majority of its revenue from licensing its intellectual property to the aforementioned companies and a host of others. That's why investors were concerned in May when it offered up guidance for the remainder of the year that was below expectations. However, Morgan Stanley believes the company "low-balled" the guidance for licensing, which should set it up to report better-than-expected results.
Additionally, a number of media reports have said that Apple (AAPL) is likely to increase its iPhone shipments this year, which should also positively impact Arm, Morgan Stanley said.
Lastly, the deployments of Arm's Compute Subsystems (coming in 2025 and beyond) are being overlooked, Morgan Stanley said.
"All told we think these developing edge AI opportunities have explained some of the share price movement year to date, but our assessment of the true [serviceable addressable market] of each, along with the likelihood of Arm capturing more functionality on the CPU, we think there is more upside in the Arm story," the analysts wrote.
Analysts are largely bullish on Arm (ARM). It has a HOLD rating from Seeking Alpha authors, while Wall Street analysts rate it a BUY. Conversely, Seeking Alpha's quant system, which consistently beats the market, has no rating on ARM.
 
Reactions: Like, Love, Fire (27 users)

Bravo

If ARM was an arm, BRN would be its biceps💪!
I believe this use case was presented by Edge Impulse with us not long ago?

One of the co-founders (2024) is ex-Microsoft



Hi @IloveLamp,

I think Dabar Media may have got the CEO and Founder wrong. Jay Chaudhry is actually the CEO / Founder of Zscaler.

Zac Zacharia is the Lead Researcher / Co-founder of Zscale Labs, but I can't seem to find anything at all about him on Google.
 
Reactions: Like, Thinking (3 users)

Bravo

If ARM was an arm, BRN would be its biceps💪!


OpenAI is in talks with Broadcom about developing AI chip

Sam Altman is determined to lessen OpenAI's dependence on Nvidia

By Skye Jacobs Today 10:46 AM

IN CONTEXT: OpenAI exclusively depends on Nvidia's GPUs to train and run its powerful AI models. However, it's looking to change that. Sam Altman is considering partnering with a silicon designer to manufacture an OpenAI chip. He is reportedly negotiating with Broadcom and other chip designers, but Altman's ambitions go far beyond producing a proprietary AI chip.
The Information reports that Sam Altman's vision for OpenAI to develop its own AI chips to reduce its dependence on Nvidia's GPUs has led him to meet with various semiconductor designers. The talks are part of broader efforts by Altman to beef up not only OpenAI's supply of components but also the infrastructure necessary – including power infrastructure and data centers – to run these powerful AI models.
OpenAI is hiring former Google employees who worked on Google's tensor processing unit as part of this initiative. Earlier this year, there were reports that Altman was seeking to raise billions of dollars to set up a network of semiconductor factories.

A partnership with Broadcom makes sense for OpenAI. The company has significant experience designing custom AI accelerators, notably its collaboration with Google on the tensor processing unit. Broadcom's success with Google's widely deployed TPUs, which are now in their sixth generation, demonstrates its capability to deliver high-performance AI accelerators at scale.
Broadcom also has expertise in creating custom ASIC solutions, which aligns well with OpenAI's need for an AI accelerator tailored to its specific requirements. As a fabless chip designer, Broadcom offers a wide range of silicon solutions crucial for data center operations, including networking components, PCIe controllers, SSD controllers, and custom ASICs. So, OpenAI could leverage Broadcom's complete vertical stack of products to meet its data center needs. Broadcom's offerings in inter-system and system-to-system communication technologies could provide OpenAI with a more comprehensive solution for their AI infrastructure needs.

OpenAI is unlikely to rival Nvidia's technological prowess any time soon; even optimistically, it couldn't produce a new chip until 2026. However, the company has been exploring ways to become more self-reliant in its quest for artificial general intelligence. Earlier this year, for example, it opened an office in Japan to tap into new revenue streams and collaborate with local businesses, governments, and research institutions. It also partners with entities like Khan Academy and Carnegie Mellon 🧐 to develop personalized learning experiences using AI.
 
Reactions: Like, Fire, Haha (29 users)

Esq.111

Fascinatingly Intuitive.
Afternoon Chippers,

Having a little poke around the SoftBank website.

Nothing directly indicating BrainChip naturally, but interesting nonetheless.

The below shows how one's wealth can be magnified greatly with the passage of time and a combination of share buybacks & share splits.

Bring on the revenue.


Interesting site to have a look around.

😗

Regards,
Esq.
 
Reactions: Like (9 users)

Bravo

If ARM was an arm, BRN would be its biceps💪!

UC Santa Cruz workshop explores brain-inspired computing

X-ray images of the brain computed tomography. (Nata-Lia)

By ARIC SLEEPER | asleeper@santacruzsentinel.com
UPDATED: July 18, 2024 at 4:22 p.m.

SANTA CRUZ — College students and curious community members from across the county and Bay Area are invited to join a two-day workshop on July 26 and 27 at UC Santa Cruz about semiconductors and the development of computing technology that acts more like the human brain.
The workshop is funded through the National Science Foundation and is a part of its “Future of Semiconductors” program. The two-day event aims to introduce the basic science behind the materials and devices necessary for brain-inspired computing in simple enough terms for members of the general public, and to draw students interested in the field and the broader semiconductor research community in the Bay Area.
“This workshop is just as much about workforce development as it is about the future of semiconductors,” said UCSC assistant professor of physics Aiming Yan in a statement. “Being so close to Silicon Valley, we want to help students across the region realize that this is a promising area to pursue a career in.”
The event will feature presentations, panel discussions, group activities and lectures by scientists on the cutting edge of their respective fields such as Yan, who will talk about “atomically thin two-dimensional materials for brain-inspired computing” and UCSC assistant professor of electrical and computer engineering Jason Eshraghian, who championed the artificial intelligence language model SpikeGPT, which operates more like the human brain to reduce its energy consumption.
UC Santa Cruz Assistant Professor of Electrical and Computer Engineering Jason Eshraghian. (Emily Cerf / UC Santa Cruz)
Eshraghian also recently co-authored a study showing that a large language model can be powered with about the same amount of electricity as a lightbulb, using custom hardware developed by the study’s team in just a few weeks. His lecture titled “How can we make artificial intelligence as efficient as the human brain?” is scheduled for July 26.
Other lecturers include University at Buffalo professor Wei Chen who will discuss “Discovery and design new materials with computers” and Foroozan Koushan from Ferroelectric Memory Company who will talk about “Device and material selection for AI application: Considerations and Approach.”









Here's the podcast with Sean Hehir and Jason Eshraghian if you missed it.







 
Reactions: Like, Fire, Love (37 users)

Bravo

If ARM was an arm, BRN would be its biceps💪!
July 16, 2024
Emily Cerf, UC Santa Cruz
A lit lightbulb laying on its side on a desktop next to an open laptop. Sparkles shimmer around the lightbulb.

Credit: iStock/Kriangsak Koopattanakij
By eliminating the most computationally expensive element of a large language model, engineers drastically improve energy efficiency while maintaining performance.
Large language models such as ChatGPT have proven able to produce remarkably intelligent results, but the energy and monetary costs associated with running these massive algorithms are sky high. It costs $700,000 per day in energy to run ChatGPT 3.5, according to recent estimates, and the process leaves behind a massive carbon footprint.
In a new preprint paper, researchers from UC Santa Cruz show that it is possible to eliminate the most computationally expensive element of running large language models, called matrix multiplication, while maintaining performance. In getting rid of matrix multiplication and running their algorithm on custom hardware, the researchers found that they could power a billion-parameter-scale language model on just 13 watts, about equal to the energy of powering a lightbulb and more than 50 times more efficient than typical hardware.
Even with a slimmed-down algorithm and much less energy consumption, the new, open source model achieves the same performance as state-of-the-art models like Meta’s Llama.
“We got the same performance at way less cost — all we had to do was fundamentally change how neural networks work,” said Jason Eshraghian, an assistant professor of electrical and computer engineering at the Baskin School of Engineering and the paper’s lead author. “Then we took it a step further and built custom hardware.”

Understanding the cost

Until now, all modern neural networks, the algorithms that power large language models, have used a technique called matrix multiplication. In large language models, words are represented as numbers that are then organized into matrices. Matrices are multiplied by each other to produce language, performing operations that weigh the importance of particular words or highlight relationships between words in a sentence or sentences in a paragraph. Larger scale language models have trillions of these numbers.
“Neural networks, in a way, are glorified matrix multiplication machines,” Eshraghian said. “The larger your matrix, the more things your neural network can learn.”
For the algorithms to be able to multiply matrices together, the matrices need to be stored somewhere, and then fetched when it comes time to compute. This is solved by storing the matrices on hundreds of physically-separated graphics processing units (GPUs), which are specialized circuits designed to quickly carry out computations on very large datasets, designed by the likes of hardware giant Nvidia. To multiply numbers from matrices on different GPUs, data must be moved around, a process which creates most of the neural network’s costs in terms of time and energy.

Eliminating matrix multiplication

The researchers came up with a strategy to avoid using matrix multiplication using two main techniques. The first is a method to force all the numbers within the matrices to be ternary, meaning they can take one of three values: negative one, zero, or positive one. This allows the computation to be reduced to summing numbers rather than multiplying.
From a computer science perspective the two algorithms can be coded the exact same way, but the way Eshraghian’s team’s method works eliminates a ton of cost on the hardware side.
“From a circuit designer standpoint, you don't need the overhead of multiplication, which carries a whole heap of cost,” Eshraghian said.
This strategy was inspired by a paper produced by Microsoft that showed it was possible to use ternary numbers in neural networks, but that work did not go as far as getting rid of matrix multiplication or open-sourcing its model to the public. To do this, the researchers adjusted the strategy of how the matrices communicate with each other.
Instead of multiplying every single number in one matrix with every single number in the other matrix, as is typical, the researchers devised a strategy to produce the same mathematical results. In this approach, the matrices are overlaid and only the most important operations are performed.
“It’s quite light compared to matrix multiplication,” said Rui-Jie Zhu, the paper’s first author and a graduate student in Eshraghian’s group. “We replaced the expensive operation with cheaper operations.”
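To make "summing numbers rather than multiplying" concrete, here is a small sketch (mine, not the paper's code) of a dense layer whose weights are constrained to -1, 0 or +1; the usual multiply-accumulate then collapses into additions and subtractions, with zero weights skipped entirely.

```python
import numpy as np

# Illustrative ternary-weight layer: with weights in {-1, 0, +1}, a dense layer's
# multiply-accumulate becomes pure addition/subtraction. (The actual MatMul-free
# LM uses a more sophisticated quantisation scheme; this only shows the idea.)

def ternary_quantize(W):
    """Crude ternarisation: rescale, then round each weight to -1, 0 or +1."""
    scale = np.mean(np.abs(W)) + 1e-9
    return np.clip(np.round(W / scale), -1, 1).astype(np.int8), scale

def ternary_linear(x, W_t, scale):
    """y = x @ (W_t * scale), computed without multiplying by any weight."""
    out = np.zeros(W_t.shape[1])
    for j in range(W_t.shape[1]):
        col = W_t[:, j]
        out[j] = x[col == 1].sum() - x[col == -1].sum()   # adds and subtracts only
    return out * scale                                     # one scalar rescale at the end

# Quick check that the add/subtract version matches an ordinary matrix product.
rng = np.random.default_rng(1)
x = rng.standard_normal(8)
W = rng.standard_normal((8, 4))
W_t, s = ternary_quantize(W)
assert np.allclose(ternary_linear(x, W_t, s), x @ (W_t * s))
```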
Although they reduced the number of operations, the researchers were able to maintain the performance of the neural network by introducing time-based computation in the training of the model. This enables the network to have a “memory” of the important information it processes, enhancing performance. This technique paid off — the researchers compared their model to Meta’s state-of-the-art algorithm called Llama, and were able to achieve the same performance, even at a scale of billions of model parameters.
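The "time-based computation" mentioned here can be pictured as the network carrying a running state from token to token. The sketch below is only a loose analogy under my own assumptions: the paper's actual recurrence is a gated, matmul-free unit, whereas this toy version uses a simple element-wise decay just to show what a "memory" over time looks like.

```python
import numpy as np

# Loose illustration of time-based computation: a running, element-wise state
# lets information from earlier tokens influence later ones without any
# matrix multiplication. Not the paper's architecture, just the general idea.

def run_sequence(token_vectors, decay=0.9):
    state = np.zeros_like(token_vectors[0], dtype=float)
    outputs = []
    for x_t in token_vectors:
        state = decay * state + (1.0 - decay) * x_t   # element-wise "memory" update
        outputs.append(np.tanh(state))                # bounded activation per step
    return outputs

# Hypothetical usage: three 4-dimensional token embeddings.
seq = [np.array([1.0, 0.0, 0.0, 0.0]),
       np.array([0.0, 1.0, 0.0, 0.0]),
       np.array([0.0, 0.0, 1.0, 0.0])]
print(run_sequence(seq)[-1])
```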

Custom chips

The researchers designed their neural network to operate on GPUs, as they have become ubiquitous in the AI industry, allowing the team’s software to be readily accessible and useful to anyone who might want to use it.
On standard GPUs, the researchers saw that their neural network achieved about 10 times less memory consumption and operated about 25 percent faster than other models. Reducing the amount of memory needed to run a powerful large language model could provide a path forward to enabling the algorithms to run at full capacity on devices with smaller memory like smartphones.
Nvidia, the dominant producer of GPUs worldwide, designs their hardware to be highly optimized to perform matrix multiplication, which has enabled them to dominate the industry and launched them to be one of the most profitable companies in the world. However, this hardware is not fully optimized for ternary operations.
To push the energy savings even further, the team collaborated with Assistant Professor Dustin Richmond and Lecturer Ethan Sifferman in the Baskin Engineering Computer Science and Engineering department to create custom hardware. Over three weeks, the team created a prototype of their hardware on a highly-customizable circuit called a field-programmable gate array (FPGA). This hardware enables them to take full advantage of all the energy-saving features they programmed into the neural network.
With this custom hardware, the model surpasses human-readable throughput, meaning it produces words faster than the rate a human reads, on just 13 watts of power. Using GPUs would require about 700 watts of power, meaning that the custom hardware achieved more than 50 times the efficiency of GPUs.
With further development, the researchers believe they can further optimize the technology for even more energy efficiency.
“These numbers are already really solid, but it is very easy to make them much better,” Eshraghian said. “If we’re able to do this within 13 watts, just imagine what we could do with a whole data center worth of compute power. We’ve got all these resources, but let’s use them effectively.”

 
Reactions: Like, Fire, Love (28 users)

Diogenese

Top 20


Close, but
 
Reactions: Haha, Like, Fire (12 users)