Here’s why Quadric has made our Top 13 AI/ML Startups to Bet On Q4 Update:
CEO Veerbhan Kheterpal exemplifies focus: Quadric’s mission is to create ML-optimized processor IP that enables rapid AI SoC development and model porting. That unwavering focus on efficiency is how Quadric aims to make AI more accessible, more responsible, and genuinely democratized.
Kheterpal’s boldness in challenging norms also stands out. I needed no more evidence than watching him fearlessly pose a probing question to Andrew Ng, a luminary of AI, during a Q&A session at the AI Hardware Summit. Fortune favors the brave!
Let’s explore how he and Quadric are making AI responsible and democratizing it:
The landscape of Large Language Models is growing exponentially, creating enormous potential but also the risk of negative consequences such as biased and inaccurate models. It also brings to light the skyrocketing, unsustainable energy consumption involved in training LLMs.
Only large companies can manage the rising costs of training and retraining models, contradicting the core principle of democratization.
However, Meta introduced the highly anticipated Llama2 LLM in July, and it stands out because it’s open-source, free, and licensed for both commercial AND research use. It was also trained on significantly more data than its predecessor, with an emphasis on safety and responsibility.
Meta launched Llama2 alongside a groundbreaking announcement of a partnership with Qualcomm, which will bring Llama2 to next-gen Snapdragon chips in smartphones and laptops beginning next year. This is seen as a milestone, since LLMs have so far been considered viable only in data centers with access to vast power resources.
Yet Quadric doesn’t view this through rose-colored lenses. CMO Steve Roddy voiced a contrarian perspective, asking, “Why would titans of the semiconductor and IP worlds need to wait until 2024 or 2025 or beyond to support today’s newest, hottest ML model?”
With the rate of change in LLMs and vision models intensifying, the reality is that most accelerators designed for AI at the Edge would require a silicon respin for each evolution. And FPGAs, like GPUs, draw more power than is suitable for Edge applications.
Quadric’s approach is different. Their general-purpose Neural Processing Unit, known as "Chimera," combines the programmability of a GPU with a power-performance profile that makes Edge AI feasible across a wide range of consumer devices. What’s more, they back this blend of programmability and performance with a dedicated Developer Studio that significantly expedites the porting process.
Quadric’s emphasis on efficiency, driven by Kheterpal’s leadership, not only empowers developers but also clears the path to faster time-to-market with fewer hurdles and at lower cost, leaving no doubt that Quadric is playing a pivotal role in making AI genuinely accessible to all.
Quadric is addicted to MACs, but does not like them spiky:
US2023083282A1 SYSTEMS AND METHODS FOR ACCELERATING MEMORY TRANSFERS AND COMPUTATION EFFICIENCY USING A COMPUTATION-INFORMED PARTITIONING OF AN ON-CHIP DATA BUFFER AND IMPLEMENTING COMPUTATION-AWARE DATA TRANSFER OPERATIONS TO THE ON-CHIP DATA BUFFER
Systems and methods for implementing accelerated memory transfers in an integrated circuit includes configuring a region of memory of an on-chip data buffer based on a neural network computation graph, wherein configuring the region of memory includes: partitioning the region of memory of the on-chip data buffer to include a first distinct sub-region of memory and a second distinct sub-region of memory; initializing a plurality of distinct memory transfer operations from the off-chip main memory to the on-chip data buffer; executing a first set of memory transfer operations that includes writing a first set of computational components to the first distinct sub-region of memory, and while executing, using the integrated circuit, a leading computation based on the first set of computational components, executing a second set of memory transfer operations to the second distinct sub-region of memory for an impending computation.
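In plain terms, the claim describes a double-buffered ("ping-pong") transfer pattern: the on-chip buffer is split into two sub-regions, and while the cores run the leading computation on data already resident in one sub-region, the other sub-region is filled for the impending computation. Below is a minimal, purely illustrative Python sketch of that pattern; the tile size, copy_tile, compute_tile, and the thread standing in for a DMA engine are all made-up placeholders, not Quadric's actual hardware or toolchain.

```python
"""Minimal sketch of the double-buffered ("ping-pong") pattern the abstract
describes. The DMA engine is simulated with a one-worker thread pool; TILE,
copy_tile, and compute_tile are hypothetical stand-ins, not Quadric APIs."""
from concurrent.futures import ThreadPoolExecutor

TILE = 4096  # bytes moved per "memory transfer operation" (hypothetical)

def copy_tile(off_chip: bytes, offset: int) -> bytes:
    # Stand-in for an off-chip -> on-chip transfer into one buffer sub-region.
    return off_chip[offset:offset + TILE]

def compute_tile(tile: bytes) -> int:
    # Stand-in for the "leading computation" consuming data already on chip.
    return sum(tile) & 0xFFFF

def run_pipeline(off_chip: bytes) -> list[int]:
    results = []
    ntiles = len(off_chip) // TILE
    with ThreadPoolExecutor(max_workers=1) as dma:    # the simulated DMA engine
        pending = dma.submit(copy_tile, off_chip, 0)  # prime sub-region A
        for t in range(ntiles):
            resident = pending.result()               # tile now resident on chip
            if t + 1 < ntiles:                        # start filling sub-region B ...
                pending = dma.submit(copy_tile, off_chip, (t + 1) * TILE)
            results.append(compute_tile(resident))    # ... while computing on A
    return results

if __name__ == "__main__":
    print(run_pipeline(bytes(4 * TILE)))  # four zero-filled tiles -> [0, 0, 0, 0]
```

The point of the overlap is that the compute units never stall waiting for off-chip memory: transfer latency is hidden behind the computation already in flight.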
[0045] … Accordingly, a technical benefit achieved by an arrangement of the large register file 112 within each array core 110 is that the large register file 112 reduces a need by an array core 110 to fetch and load data into its register file 112 for processing. As a result, a number of clock cycles required by the array core 112 to push data into and pull data out of memory is significantly reduced or eliminated altogether. That is, the large register file 112 increases the efficiencies of computations performed by an array core 110 because most, if not all, of the data that the array core 110 is scheduled to process is located immediately next to the processing circuitry (e.g., one or more MACs, ALU, etc.) of the array core 110 . For instance, when implementing image processing by the integrated circuit 100 or related system using a neural network algorithm(s) or application(s) (e.g., convolutional neural network algorithms or the like), the large register file 112 of an array core may function to enable a storage of all the image data required for processing an entire image. Accordingly, a majority, most or if not, all layer data of a neural network implementation (or similar compute-intensive application) may be stored locally in the large register file 112 of an array core 110 with the exception of weights or coefficients of the neural network algorithm(s), in some embodiments. Accordingly, this allows for optimal utilization of the computing and/or processing elements (e.g., the one or more MACs and ALU) of an array core 110 by enabling an array core 110 to constantly churn data of the register file 112 and further, limiting the fetching and loading of data from an off-array core data source (e.g., main memory, periphery memory, etc.).
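To make the data-residency idea in [0045] concrete, here is a small, purely illustrative Python sketch (not Quadric code): the activations for a whole image stay in a local "register file" array across layers, and the only off-core traffic is the per-layer weight fetch that the paragraph carves out as the exception.

```python
"""Illustrative sketch (not Quadric code) of the data-residency idea in [0045]:
activations for a whole image stay in the array core's large register file
across layers; only the per-layer weights are fetched from off-core memory."""
import numpy as np

def fetch_weights(layer: int, width: int = 8) -> np.ndarray:
    # Stand-in for the one exception the paragraph allows: fetching weights
    # from an off-array-core source (main memory, periphery memory, etc.).
    rng = np.random.default_rng(layer)
    return rng.standard_normal((width, width)).astype(np.float32)

def run_network(image: np.ndarray, num_layers: int = 4) -> np.ndarray:
    # 'register_file' models the large per-core register file: the image data
    # is loaded once and never spilled back to main memory between layers, so
    # the MACs/ALU keep churning on locally resident data.
    register_file = image.astype(np.float32)
    for layer in range(num_layers):
        weights = fetch_weights(layer)                             # off-core fetch
        register_file = np.maximum(register_file @ weights, 0.0)   # MAC + ReLU, all local
    return register_file

if __name__ == "__main__":
    print(run_network(np.ones((8, 8))).shape)  # (8, 8)
```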