BRN Discussion Ongoing


Founding Member
Baby clown wilzy, how about doing us all a big favour and shut it, you're dribbling shite

Looks like you're preparing to have yourself removed again and then you (like clockwork) will create yet another account on this forum... announce your return 2 or 3 times in a row as though you had an aneurysm.

You might find more cuddles over at the crapper. You don't seriously think anyone here legitimately "welcomed" you back LMAO?
Reactions: 8 users
Just saw this, was posted a day ago.

Not had time to listen yet to see if any tidbits of links.

tinyML Asia 2022 Keynote - Gregory Cohen: Biology-inspired Space Imaging with Neuromorphic Systems

Biology-inspired Space Imaging with Neuromorphic Systems

Gregory COHEN, Associate Professor of Neuromorphic Systems, deputy director of the International Centre for Neuromorphic Systems (ICNS), Western Sydney University

Reactions: 10 users


Reactions: 8 users


Founding Member
Reactions: 6 users
Qualcomm was $0.42 a few years ago (so what’s 20-odd years)
Now it’s over $100.00
So they have a few fewer shares outstanding:
1.12 billion; 🧠 chip has 1.7 billion
So in a few years we might be worth a little more than the $0.66 we are today
So 😅 hang in there, I think it’s going to be worth it 🥰
Reactions: 32 users
Just saw this, was posted a day ago.

Not had time to listen yet to see if any tidbits of links.

tinyML Asia 2022 Keynote - Gregory Cohen: Biology-inspired Space Imaging with Neuromorphic Systems

Biology-inspired Space Imaging with Neuromorphic Systems

Gregory COHEN, Associate Professor of Neuromorphic Systems, deputy director of the International Centre for Neuromorphic Systems (ICNS), Western Sydney University

Thanks @Fullmoonfever

That was fascinating!

I see WSU was using Prophesee 4 Gen cameras: I expect we’ll be in their next generations.

I’d love to see the version of the presentation that is for “Classified” eyes only as to what they can see/detect from space as that is an obvious use case!

Reactions: 19 users


Well done BrainChip. We just passed 8,000 followers on LinkedIn. 👏👏👏

Looking forward to an increased cadence for the next 8.

I think we need about another 1,400 LinkedIn followers by the end of the year (earlier this year, some smarty pants used a linear regression to predict the number of LinkedIn followers by the end of the year, and boy is he way off... :) )
Reactions: 8 users


Founding Member
This is what our friends at Socionext are saying about what they are doing at CES

Reactions: 57 users


There is nothing else that will come close for two to three years.

Imagine the dots after AKIDA .02?

Pantene Peeps 😉
Reactions: 27 users


Well done BrainChip. We just passed 8,000 followers on LinkedIn. 👏👏👏

Looking forward to an increased cadence for the next 8.
Just over 6600 on twitter


Reactions: 23 users



Anyone have the time to masticate this bolus of a thesis? Perhaps just Chapter 11...

Autonomous and Predictive Systems to Enhance the Performance of Resilient Networks

Chapter 11 AKIDA: Accelerating In-Network, Transient Compute Elasticity using SmartNICs

11.1 Introduction

The proliferation of the Internet of Things (IoT) and the success of rich cloud services have pushed the horizon of a new computing paradigm, edge computing, that requires faster data processing at the network’s edge. The edge computing market is expected to reach close to $9 billion by 2025 [254]. As a specific example, the significant factors driving the growth of the IoT in the manufacturing market include growing demand for industrial automation in the manufacturing industry, rising need for centralized monitoring and predictive maintenance of resources, and a rise in the number of cost-effective and intelligent connected devices and sensors, among others. To keep up with this demand, there has been a shift to serverless frameworks for computation [255, 256]. Serverless frameworks allow IoT applications to be deployed within minutes with the flexibility to reduce or expand operations seamlessly. For instance, serverless functions provide authentication and encryption capabilities on-site instead of uploading the data over a vulnerable network to the cloud. This requires efficient scale up (or down) of edge computing infrastructure for transient spikes in serverless workloads. However, managing edge compute capacity on the fly, i.e., transient compute elasticity, carries specific challenges [257, 258].

First, expanding edge deployments is time- and resource-intensive. A typical solution is to over-allocate resources for possible future demand. However, over-allocation leads to under-utilization for the most part and is economically undesirable. Edge computing requires careful planning of the available resources in-situ to achieve its primary objective of faster processing and reduced latencies.
Second, and most importantly, sudden spikes in demand for processing could create compute bottlenecks, leading to service level agreement (SLA) violations. An SLA comprises the agreed-upon QoS (Quality of Service) attributes monitored regularly; failing to meet the QoS attributes can attract hefty penalties.

In this context, we ask the following research question: How could we design an architecture that can handle sudden spikes in demand, address transient elasticity, and allocate compute resources efficiently? We propose AKIDA, a new edge computing platform that leverages heterogeneous computing nodes (including domain-specific accelerators like SmartNICs) to dynamically allocate computation requirements for workload spikes with minimal cold start latency. We use SoC-based SmartNICs to predict and intelligently load-balance containerized serverless workloads across the heterogeneous compute resources. AKIDA uses untapped general-purpose compute on SmartNICs for in-network application processing when demand escalation is imminent. SmartNICs are ideal candidates for application offload because: (i) they are closer to the data ingress pipeline, which enables them to bypass the network stack overhead at the host server, (ii) of the availability and proximity of SoC-based onboard compute for application processing [259, 260], (iii) they are a feasible alternative to the traditional servers for short term compute, and (iv) unused compute cycles on the SmartNICs can be re-purposed for workloads.

To our knowledge, this is the first study to propose containerized application offload to SoC-based SmartNICs. Although prior works have studied the applicability of offloading specific parts of applications, e.g., using P4 programmability, actor-programming paradigm, etc.
[261, 262, 263], those studies are limited to particular applications and require code modification for other types of application offload. In contrast, AKIDA is designed to offload a network of containers onto the SmartNIC, making it truly application-agnostic and scalable.

Our platform has three unique elements: (i) a workload predictor, (ii) a traffic distributor, and (iii) an orchestrator. The workload predictor estimates the potential change in demand for the next time horizon by extracting fine-grained input features from historical time-series data. The traffic distributor distributes the traffic based on the transient spikes and CPU load on each cluster node. Finally, the orchestrator sets the threshold levels for intelligent traffic distribution to cluster nodes and manages the end-to-end pipeline for application processing. It can also reallocate workloads on the fly to the SmartNICs if the incoming requests for an application suddenly change. AKIDA’s orchestrator can be generalized for scaling the edge across multiple servers and different kinds of SmartNICs. Stated otherwise, our system can be scaled to offload applications across different dimensions of heterogeneity (for instance, if the cluster introduces additional compute nodes). This approach enables us to secure a competitive advantage compared to legacy edge architectures and deployments.
This chapter makes the following key contributions:

  • Design of a novel architecture that leverages heterogeneous computing nodes (SmartNICs and host server) to facilitate efficient handling of transient spikes at the edge;
  • Development and characterization of a workload predictor and orchestrator that work in tandem to reduce SLA violations, efficiently handle spikes in demand, and reduce cold start latency;
  • Characterization of competitive advantages of our architecture through an in-depth analysis of capital expense costs and overhead savings from minimizing SLA violations.

Our investigation reveals that capital expenditure (CAPEX) can be reduced by 1.5×, while the operational expenditure (OPEX) can be decreased by 3.5×. In addition, our architecture demonstrably reduces SLA violation by as much as 20% in real-world deployments.

11.2 Background

This section provides an overview of multicore SoC-based SmartNICs, and how they are integrated into the edge computing platform. In addition, we briefly discuss the edge computing architecture and explore some common SLA violations typically prevalent in this context.

11.2.1 SmartNICs

There are broadly three categories of network accelerators or SmartNICs: ASIC, FPGA, and SoC-based SmartNICs [264, 262]. In this study, we focus on SoC-based SmartNICs only. Multicore SoC-based SmartNICs use embedded CPU cores to process packets, trading some performance to provide substantially better programmability than ASIC-based designs (e.g., DPDK-style code can be run directly in a familiar Linux environment). For instance, Mellanox Bluefield [259] uses general-purpose CPU cores (ARM64), while others, like Netronome [265], have specific cores for network processing.
SoC-based SmartNICs (e.g., Mellanox) have two modes of operation: embedded and separated. In embedded mode, the interfaces are mapped to the host OS network stack, and the kernel routes packets from the host. In separated mode, the host OS and the SmartNIC have separate, independent network stacks to process packets. While we observe slightly better tail latencies from packet processing in embedded mode, the difference from separated mode is negligible. For AKIDA, we adopt the separated mode due to its programmable flexibility and the ability to run containers directly on the SmartNIC’s ARM64 OS.

11.2.2 Edge Computing

The adoption of cloud computing platforms is increasing rapidly. However, efficient processing of the data that has been produced at the edge of the network is a challenging task. Data-driven applications are increasingly deployed at the edge and will consequently benefit from edge computing, which we explore here.

Networking bottlenecks: Compared to the fast-developing cloud-based processing speed, the network bandwidth has reached a standstill. With the growing quantity of data generated at the edge, the rate of data transportation is becoming the bottleneck for the cloud-based computing paradigm. For instance, we expect autonomous vehicles to output a vast amount of data per hour that needs real-time processing. In this instance, edge computing is beneficial over cloud computing because of the significant savings in latency overheads. Additionally, scaling these pipelines for multiple vehicles would require computation at the edge, not the cloud.

Explosion of IoT: Almost all kinds of electrical devices will become part of IoT, and they will play the role of data producers and consumers, such as air quality sensors, LED bars, streetlights, and even an Internet-connected microwave oven.
Reports suggest that the number of IoT devices at the edge will develop to more than billions in a few years [266]. Thus, the raw data produced will be enormous, making conventional cloud computing not efficient enough to handle all this data – application processing at the edge could account for this surge in demand.

Data producers: In the cloud computing paradigm, the end devices at the edge typically are data consumers. For example, they are consuming on-demand video streams on a smartphone. However, vast amounts of data are now produced by the said consumers. Changing from a data consumer to a data producer requires more placement of functionalities at the edge.

11.2.3 SLA Violations

Service Level Agreements are critical when applications are deployed in a Service Oriented Architecture (SOA). SLAs are commonly adopted in cloud computing and, more recently, at the Edge. An SLA defines the level of service the consumer expects based on metrics that the application provider lays out. An SLA comprises the metrics by which the service is measured, such as monitoring the QoS (Quality of Service) attributes [267, 268], and the remedies or penalties if the metric measurement does not meet the agreed-on service level, termed an SLA violation. Some of the most common QoS attributes that are part of an SLA are response time and throughput; we primarily focus on response time.

In Edge Computing, where there are limited resources, when the application receives multiple queries at scale, the response time suffers high tail latency. This problem is further strained when the host OS has an additional background workload for other applications or maintains the edge infrastructure for its network and storage needs. This leads to SLA violation and poor application Quality of Experience (QoE) for the consumer. We use the response time metric in Sec.
11.4 to evaluate the penalty with and without additional processing units such as SmartNICs.

11.2.4 Need for Accelerators

There has been a lot of research recently in the industry regarding using SmartNICs in cloud data center servers to boost performance by offloading computation in servers by performing network datapath processing. This section explains why SmartNICs are essential in the new generation of high-performance computing servers.

The cost of building an interconnection network for a large cluster can significantly affect the choice of design decisions. With increasing network interface bandwidths, the gap between the network performance and compute performance is widening. This has resulted in increased adoption and deployment of SmartNICs. If SmartNICs were leveraged to offload only network functionalities, it would add 30% more computational capacity to the current servers [269]. Typically, SoC-based SmartNICs are priced at 25-30% of the cost of data center servers. Therefore, adding a SmartNIC to perform only network functions is a wise decision. However, the SmartNICs can do more than network functions. As per our initial analysis, the compute capacity of an SoC-based SmartNIC is generally around 40-50% of server compute capacity. If additional compute is required within this range, exploiting the total capacity of SmartNICs to manage workload spikes instead of servers is a more economical decision. However, all that compute available on SmartNICs is currently primarily used for offloading network functions and services. In most cases, that is a severe under-utilization of the available compute power on SmartNICs. It is this under-utilized compute that AKIDA aims to harvest and make available to the applications.
11.3 System Overview

We begin by providing an overview of AKIDA, an intelligent fabric software framework that can be deployed on any system whose operating system supports container orchestration, such as servers/server racks, network switches, or edge systems. Figure 11.1 shows the various components of the AKIDA framework. The server can host as many SmartNICs as there are PCIe buses available. We use Kubernetes as the container orchestration system that runs on the host and SmartNIC OS, and this specialized architecture works only on SoC-based SmartNIC architecture [259]. The major components of our core solution consist of (i) a traffic distributor module that distributes the traffic based on the service time and CPU load of each server and SmartNIC, (ii) a workload prediction module that uses the history of the workload in a window to predict the workload spikes, and (iii) the AKIDA orchestrator module that manages the workload spikes based on the load on the servers and SmartNICs. In the following, we describe our solution to each module.

[Figure 11.1: System overview. Labels: serverless control plane; heterogeneous data and compute plane; service gateway; Solution 1: configure traffic distribution; Solution 2: workload prediction (collect workload history and predict future spikes); Solution 3: spike management (managing the workload spikes using SmartNICs); spike detection and threshold estimation; serverless functions on the Arm OS SmartNICs and on the x86 host server.]

[Figure 11.2: Traffic distributor.]

11.3.1 Traffic Distributor

The current serverless computing design assumes that all computing resource nodes are homogeneous and have the same service time and the same amount of load.
In this chapter, we show that this assumption leads to degraded performance of workloads running on multiple nodes, especially when one of the compute nodes gets overloaded or takes more time to service the requests. To clarify the problem, consider two serverless functions A and B that take 2 and 10 seconds, respectively, to run on the SmartNIC and 1 and 5 seconds, respectively, to run on the host OS. When the host OS gets overloaded with other workloads, the response times on the host OS change to 3 and 8 seconds for functions A and B, respectively [footnote 1: We note that these numbers are subject to change from time to time depending on the workload burst]. In this example, it is better to run the two functions on the host OS when the host OS is not overloaded, and when it gets overloaded, function A can be offloaded to the SmartNIC.

11.3.2 Traffic Distributor

In our design, the queries first arrive at the API gateway of the scheduler within the SmartNIC OS, where our traffic distributor distributes the traffic according to the service time of each SmartNIC’s ARM core or host OS’s core within a server. We note that the service time of each function is subject to change depending on the workload spikes. Assuming the requests arrive with an arrival rate of λ, and assuming each host OS and SmartNIC has a service rate of µ_i and an M/M/1 queue at each server, the optimal traffic distribution that makes the sojourn time equal for each queue is as follows:

$$\lambda_1 - \mu_1 = \lambda_2 - \mu_2 = \dots = \lambda_n - \mu_n \tag{11.1}$$

In other words, the optimal traffic distribution on N servers is as follows:

$$\lambda_i = \mu_i + \frac{\lambda - \sum_{j=1}^{N} \mu_j}{N}, \qquad i = 1, \dots, N \tag{11.2}$$

In the evaluation, we use a heuristic approach and try to avoid distributing the traffic on a cluster node with very high service time due to workload spikes. The queries are then redirected to the appropriate containerized application pods running either on the host or SmartNIC OS.
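A minimal Python sketch of the equal-sojourn-time split in Equations 11.1 and 11.2 follows. It is an illustration only, not code from the thesis. The function name `equal_sojourn_split` is hypothetical, and dropping nodes whose computed share would be negative is an added assumption that the text does not describe.

```python
from typing import List


def equal_sojourn_split(total_rate: float, service_rates: List[float]) -> List[float]:
    """Split an arrival rate across M/M/1 queues so that mu_i - lambda_i is equal.

    Implements lambda_i = mu_i + (lambda - sum_j mu_j) / N for the nodes that
    stay active. Nodes whose share would be negative receive no traffic
    (an assumption of this sketch, not something stated in the source).
    """
    if total_rate >= sum(service_rates):
        raise ValueError("offered load exceeds total service capacity")

    # Fastest nodes first, so the slowest active node is always the last entry.
    active = sorted(range(len(service_rates)),
                    key=lambda i: service_rates[i], reverse=True)
    shares = [0.0] * len(service_rates)

    while active:
        # Common slack mu_i - lambda_i shared by every active queue.
        slack = (sum(service_rates[i] for i in active) - total_rate) / len(active)
        slowest = active[-1]
        if service_rates[slowest] - slack < 0:
            active.pop()  # too slow to take any traffic; re-solve without it
            continue
        for i in active:
            shares[i] = service_rates[i] - slack
        break
    return shares


if __name__ == "__main__":
    # Hypothetical service rates (requests/second): a host and a SmartNIC.
    print(equal_sojourn_split(9.0, [10.0, 5.0]))
```

With these made-up rates both queues end up with the same slack between service rate and arrival rate, which is the condition Equation 11.1 expresses.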
[Footnote 1, continued:] and resource congestion on the SmartNICs and host OS servers.

11.3.3 Workload Prediction

To provision for workload spikes proactively and meet the required Service Level Agreements (SLAs), we predict the future workload demands ahead of time. We propose a support vector regression (SVR) prediction model that predicts the workload bursts to trigger the traffic distribution module and also mitigate the impact of containers’ cold start latency [270, 271, 272, 273], which can otherwise lead to a longer response time to application queries. Our prediction model is based on the past observations of the workload over a window size of W time units. We change the window size dynamically based on the workload variations over time. We increase the training window size if the workload variation over the current window is less than 10% and decrease it once the workload variation is more than 20%.

11.3.4 AKIDA’s Orchestrator

AKIDA’s orchestrator consists of a resource monitoring module and exploits the output of the prediction module. The resource monitoring module periodically monitors each node’s CPU, memory utilization, and service rates in the serverless platform. If the CPU utilization gets higher than a specified threshold D, or if the service rate of application X on one of the nodes in the cluster gets higher than the specified SLA, we re-distribute the workload to dampen the spikes. We use the output of the workload prediction module to predict future spikes ahead of time and perform proactive spike management. Proactive spike management that exploits the prediction module has two benefits: (i) first, we can re-distribute the traffic based on the predicted future workload, which prevents specific server nodes from getting congested, and (ii) second, it mitigates the containers’ cold start latency by starting new containers before the actual load arrives.
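Going back to the window-size rule in Section 11.3.3, here is a minimal Python sketch of how the 10% and 20% thresholds could drive the training window. It is not the thesis implementation. The text does not define "workload variation", so this sketch assumes the coefficient of variation (standard deviation divided by mean); the step size and the minimum and maximum window sizes are hypothetical parameters as well.

```python
from statistics import mean, pstdev
from typing import Sequence


def workload_variation(window: Sequence[float]) -> float:
    """Coefficient of variation of the window (assumed definition of 'variation')."""
    avg = mean(window)
    return pstdev(window) / avg if avg > 0 else 0.0


def next_window_size(history: Sequence[float], size: int, step: int = 10,
                     min_size: int = 20, max_size: int = 500) -> int:
    """Grow the window when the workload is calm, shrink it when it is bursty.

    Thresholds follow the text: below 10% variation the window grows, above
    20% it shrinks, and in between it stays the same. step, min_size and
    max_size are illustrative values, not taken from the source.
    """
    window = history[-size:]
    variation = workload_variation(window)
    if variation < 0.10:
        return min(size + step, max_size)
    if variation > 0.20:
        return max(size - step, min_size)
    return size


if __name__ == "__main__":
    calm = [100.0] * 200          # steady request rate
    bursty = [50.0, 150.0] * 100  # strongly alternating request rate
    print(next_window_size(calm, 100), next_window_size(bursty, 100))
```

The resulting window would then be the span of past observations fed to whatever regressor is used for the prediction, such as the SVR model the chapter names.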
The spike management module updates the service rate µ_i of each node in the cluster and the request arrival rates in the traffic distributor module, and triggers a new traffic distribution command if the spikes are higher than a specified threshold or the mean service rate of a node in the cluster increases and violates the specified SLA metric.

11.3.5 Auto-scaler

After splitting the traffic between multiple queues, we scale up/down the number of replicas at each queue. Our auto-scaling algorithm is based on the arrival rate of the predicted workload at time t (i.e., λ_t), the current number of replicas r_t, and the current service rate of the replicas at each server/SmartNIC (µ_t). We can write the system utilization as follows:

$$\rho_t = \frac{\lambda_t}{r_t \mu_t} \tag{11.3}$$

Then we calculate the probability that the queue is idle as follows:

$$P_0 = \left[ \sum_{n=0}^{r_t - 1} \frac{(r_t \rho_t)^n}{n!} + \frac{(r_t \rho_t)^{r_t}}{r_t!\,(1 - \rho_t)} \right]^{-1} \tag{11.4}$$

The queue length is

$$L_q = \frac{r_t^{r_t}\, \rho_t^{r_t + 1}}{r_t!\,(1 - \rho_t)^2}\, P_0 \tag{11.5}$$

and the expected waiting time on the queue is T_q = L_q/λ_t. Given the current number of replicas and the system’s service time, we calculate the system’s latency T_q + T_s + 2d (where 2d accounts for the auto-scaling startup latency). If the latency is larger than the target SLA, we increment the number of replicas and calculate the optimal number of replicas using a binary search algorithm. If T_q + T_s + d is smaller than the target SLA latency, we scale down and find the optimal number of replicas using a binary search algorithm.

[Figure 11.3: Real world experimental setup. Labels: DL360 Gen9 Server, Network Switch, DL360 Gen9 Server; external and internal links.]

11.4 Competitive Advantages

We set up the testbed of AKIDA using DL380 Gen9 Servers and two Mellanox Bluefield [259] SmartNICs per server as shown in Figure 11.4.
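The replica calculation of Section 11.3.5 (Equations 11.3 to 11.5) can be sketched in Python as below. This is a rough illustration under stated assumptions, not the thesis code: the names `queue_wait` and `min_replicas`, the `max_replicas` cap, and the use of one binary search for the smallest replica count (instead of separate scale-up and scale-down passes) are all choices of this sketch. The mean service time T_s is taken as 1/µ_t.

```python
from math import factorial


def queue_wait(arrival_rate: float, service_rate: float, replicas: int) -> float:
    """Expected queueing delay T_q of an M/M/c queue (Equations 11.3 to 11.5)."""
    rho = arrival_rate / (replicas * service_rate)        # utilization, Eq. 11.3
    if rho >= 1.0:
        return float("inf")                               # unstable queue
    load = replicas * rho                                 # offered load r_t * rho_t
    p0 = 1.0 / (sum(load ** n / factorial(n) for n in range(replicas))
                + load ** replicas / (factorial(replicas) * (1.0 - rho)))  # Eq. 11.4
    lq = (replicas ** replicas * rho ** (replicas + 1)
          / (factorial(replicas) * (1.0 - rho) ** 2)) * p0                 # Eq. 11.5
    return lq / arrival_rate                              # T_q = L_q / lambda_t


def min_replicas(arrival_rate: float, service_rate: float, sla: float,
                 startup: float = 0.0, max_replicas: int = 64) -> int:
    """Smallest replica count whose latency T_q + T_s + 2d meets the SLA.

    Binary search is valid because the latency does not increase as replicas
    are added. max_replicas is a hypothetical cap, not from the source.
    """
    service_time = 1.0 / service_rate

    def latency(r: int) -> float:
        return queue_wait(arrival_rate, service_rate, r) + service_time + 2 * startup

    if latency(max_replicas) > sla:
        raise ValueError("SLA cannot be met within max_replicas")
    low, high = 1, max_replicas
    while low < high:
        mid = (low + high) // 2
        if latency(mid) <= sla:
            high = mid          # mid is enough; try fewer replicas
        else:
            low = mid + 1       # mid is too few; need more replicas
    return low


if __name__ == "__main__":
    # Hypothetical numbers: 10 requests/s, 4 requests/s per replica, 280 ms SLA.
    print(min_replicas(arrival_rate=10.0, service_rate=4.0, sla=0.28))
```

For a single replica the formulas reduce to the familiar M/M/1 waiting time, which is a quick sanity check on the reconstruction of Equations 11.4 and 11.5.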
We deployed a Kubernetes cluster over both server and SmartNIC OS to obtain heterogeneous multicore cluster nodes. We implemented a prototype based on the OpenFaaS serverless infrastructure. We evaluated it on three popular serverless workloads: (i) a CPU-intensive Fibonacci function, (ii) a latency-sensitive key-value store, and (iii) a sentiment analysis function that uses machine learning to perform natural language processing. We build the functions to run on a multi-architecture platform, including the x86 host OS and the SmartNICs’ ARM cores.

We first run initial experiments to find the compute capacity of SmartNICs by running Fibonacci functions on SmartNICs and host. We observe the compute capacity close to that of the host’s resources. Figure 11.4(a) shows the execution time of running the Fibonacci function on the host OS and the SmartNIC as we increase the Fibonacci number to compute. We observe that SmartNICs have comparable compute capacity to x86-64 hosts, which assures that the SmartNICs are capable of running workloads and processing incoming packet traffic.

We also ran initial experiments on an online prediction model to predict future workloads ahead of time to narrow down the best-performing algorithm that works well with our solution. We used 10,000 data points from real serverless workloads that provide an appropriate workload for a ride-sharing application to request a ride [274, 275]. Figure 11.4 shows the workload prediction using the RBF and linear kernel in the SVR prediction model when we train the model over a window size of 100 seconds and predict the future workload d seconds ahead of time. As shown, the RBF kernel performs better than the linear kernel. In the following sub-sections, we investigate data centers’ different design choices to manage the load spikes.

[Figure 11.4: Experimental results on the real world testbed. (a) Response time of the SmartNIC and host OS. (b) Predicting the workload d seconds ahead of time where d = 10.]

11.4.1 Performance Benefits

To evaluate the performance benefit of using SmartNICs in the cluster when having a high CPU load, we perform a set of experiments on the three serverless functions in our testbed using the OpenFaaS serverless platform with the hey HTTP(S) load generator [276] and emulate transient spikes using a stress tool [277]. Figure 11.5 shows the response time distribution for different functions. The SLA threshold is specified by the application and exposed to the scheduler.

[Figure 11.5: Response time distribution of different functions. (a) Fibonacci. (b) Key-value store. (c) Sentiment analysis.]

We first run the default OpenFaaS scheduler on one server. We introduce stress on the host server and increase the average CPU utilization to 80% by running a background serverless workload with 200 average queries per second (Case 1: one server with background workload). The tail latency increases when the host OS has a high load, leading to SLA violations. Adding another server with uniform traffic distribution (default Kubernetes scheduler) in the baseline (two servers, one with background workload and one without) does not solve the problem since half of the queries are routed to the overloaded host. Next, we run the workload on two servers with load-aware proportional traffic distribution (Case 2: two servers with proportional traffic distribution similar to AKIDA’s traffic distributor). In AKIDA, we detect the overloaded node in the cluster and avoid routing the traffic to that node. We run AKIDA in two cases, with one SmartNIC and with two SmartNICs on the same server. Although the SmartNICs have lower computational power than the host OS, when a transient spike overloads the CPU, AKIDA leverages the SmartNICs’ compute capacity to reduce SLA violations.
11.4.2 Cost Benefits

In this section, we perform a cost analysis of the cluster design choices based on the actual CPU utilization dataset in [278] to compare the network design of over-provisioning the servers to meet the SLA during the workload spikes with that of using the SmartNICs to manage the spikes. We assume a SmartNIC is about 15-20% of the cost of a server. We calculate CAPEX and OPEX for three resource deployment scenarios at each edge node to accommodate the spikes: (i) two servers, (ii) one server and one SmartNIC, and (iii) one server and two SmartNICs. The x-axis shows the number of edge nodes in each case (i.e., 1 edge node + 1 extra server, 1 edge node + 1 SmartNIC, and 1 edge node + 2 SmartNICs).

Figure 11.6(a) shows the capital expenses for building a cluster in (i), (ii), and (iii). As shown, the SmartNICs provide extra computational capacity to the cluster at a much lower cost. The total cost of the cluster reduces by a factor of 1.5 and 1.55 when using one or two SmartNICs at each x86 host, respectively. This section’s CAPEX and OPEX cost calculations are based on rough numbers available for cost and maximum energy consumption of the servers and SmartNICs in our testbed.

Figure 11.6(b) shows operational expenses by tracking maximum power (one of the main contributors to OPEX) used in the cluster for cases (i), (ii), and (iii). The SmartNICs used in our testbed are 3.5x more energy efficient than the host server. Figure 11.6(b) shows that the maximum power usage of the cluster reduces by a factor of 1.5 and 1.27 when having one or two SmartNICs at each server, respectively.

[Figure 11.6: Operational performance as the cluster size increases. (a) Capital expenditure. (b) Operational expenditure.]
Reactions: 27 users
Might be worth keeping an eye on Neuraville. Nothing conclusive now but they are working on designing brains for robots. Their relocation earlier this year should give them a greater incentive to consider Akida if they aren't already.

A startup that works in the intersection of neuroscience and robotics industries announced that it has established Pittsburgh as its new headquarters after moving from Wichita, Kansas.

Founded in 2020, Neuraville LLC will now occupy 1,200 square feet along Craig Street in Oakland, in the former space of Avenu, the Pittsburgh Innovation District’s coworking platform for entrepreneurs and innovators. It'll ideally grant the startup the ability to tap into the talent pool coming out of the University of Pittsburgh and Carnegie Mellon University.
Reactions: 23 users
We got a mention from Deloitte a few years back but they haven’t included us in this article. CES 2023 should put us back on their radar!

It’s mostly what we know and talks about partnerships and eco-systems to get products to market and the systems functioning together.

Dell gets a mention as a big player and so does Cisco.

Battle for the Enterprise Edge: Providers prepare to pounce on the emerging enterprise edge computing market

Cloud, telco, equipment, and platform companies are vying for a share of enterprise investments in edge services and products that make computing faster, safer, and cheaper.

Reactions: 24 users

Groundbreaking AI​

We are proud to announce that we just surpassed shipping more than 2 billion AI-capable products to date. Making us one of the biggest AI companies in the world and the AI Inferencing leaders at the edge

Our fastest and most advanced Qualcomm AI Engine has at its heart, the powerful Hexagon processor.

Brand New Architecture​

The Qualcomm Hexagon processor is the most essential element of the Qualcomm AI engine. This year we added new architectural features to the heart of our AI Engine. Let’s dive into them.

Dedicated Power Delivery System​

With a dedicate power delivery system we can freely provide power to Hexagon adaptive to its workload, activate the performance all the way up for heavy workloads or down to extreme power savings

More Hardware Acceleration

We also added special hardware to improve group convolution and activation function acceleration, and doubled the performance of the Tensor accelerator.

Micro Tile Inferencing

Our unique approach to accelerating complex AI models is to break down neural networks into micro tiles to speed up the inferencing process. This allows the scalar, vector and tensor accelerators to work at the same time without having to engage the memory each time, saving power and time.

Hexagon Direct Link

We are now enabling seamless multi-IP communication with Hexagon using a physical bridge. This link supports high-bandwidth, low-latency use cases like the Cognitive-ISP or upscaling of low-resolution content in gaming scenarios.

Enabling INT4 for the Qualcomm AI Engine

We successfully enabled transformation of several DL models from FP32 to INT16 to INT8 while not compromising on accuracy and getting the added advantage of higher performance at lower memory consumption. Now we are pushing the boundaries with INT4 for even higher power savings without compromising accuracy or performance.

Qualcomm Sensing Hub

The latest generation of the Sensing Hub has more horsepower with an additional dedicated AI processor to support audio, sensing and camera-based experiences, becoming the most powerful ultra-low-power AI architecture in the industry.

Unified AI Software Portfolio

Qualcomm® AI Stack is a comprehensive AI solution for OEMs and developers, supporting a wide range of intelligent devices with broader AI software access and compatibility. For the first time, a single AI software portfolio works on every Qualcomm Technologies platform spanning the wide range of Connected Intelligent Edge products.

Qualcomm AI Enables Cars, Smart Factories, and More at the Connected Intelligent Edge

Friday, December 9, 2022 - 1:00pm
NEWSROOM: Qualcomm
CAMPAIGN: Purposeful Innovation
OnQ Blog

From farms and factories, to Snapchat, the Meta Quest 2, Resident Evil 4, the AXON Body 3 and Fleet 3, the Cadillac LYRIQ, and countless cutting-edge devices and platforms, Qualcomm AI Solutions are conquering some of the world’s toughest problems.
Thanks to our powerful Qualcomm AI Engine, we can now run highly complicated neural networks solely on-device to keep your personal data local and secure. This exciting technology also allows us to scale and deploy our long-proven mobile AI innovations to wireless earbuds, smartphones, laptops, robotics, IoT, data centers, and cars — in other words, everything at the Connected Intelligent Edge.

What is the Connected Intelligent Edge?
The Connected Intelligent Edge is how we envision a future where everything is intelligently connected. Our on-device AI is at the core of what makes edge devices intelligent — providing high-performance, efficient computing and cutting-edge wireless technologies that connect everything. Furthermore, our leadership in on-device AI processing further amplifies key technologies such as camera, sound, gaming, sensors and connectivity, as well as the user experiences they bring. These technologies will scale to support every single device at the edge. Check out the following link to learn more about how Qualcomm Technologies supports Connected Intelligent Edge devices.

Pushing autonomous driving
With our Snapdragon Digital Chassis, we bring a comprehensive set of connected automotive platforms for telematics and connectivity, computing, and driver assistance and autonomy. With these capabilities at hand, automakers can deliver connected and intelligent experiences that are safer, customizable, and immersive.

Within the Snapdragon Digital Chassis, there are currently four platforms, two of which leverage on-device AI: the Snapdragon Ride Platform and the Snapdragon Cockpit Platform. Snapdragon Ride provides on-device AI for front and surround-view cameras for advanced driver assistance systems (ADAS) and autonomous driving (AD). The Snapdragon Cockpit Platform provides an immersive, comfortable experience powered by AI – this includes in-car voice assistance, augmented reality-based navigation systems, personalized audio experiences with multi-audio zones and engine noise suppression capabilities, and in-cabin monitoring.

Providing excellent video and voice calls
As video conferencing has become part of our day-to-day lives, we are constantly improving how we interact with our devices. Our Qualcomm AI Engine delivers unparalleled user experiences in Snapdragon-powered laptops, running AI-based noise cancellation and suppression for the Snapdragon Compute Platforms. We are also scaling these AI qualities and putting them in the latest earbuds powered by our Qualcomm S5 Sound Platform with advanced AI-based noise cancellation.

Creating smarter factories
We envision creating a future where everything is connected. This includes factories where our robotics solutions, powered by the Qualcomm Robotics RB6 Platform, can run complex on-device AI applications and deep learning workloads to transform industries. Our Qualcomm Vision Intelligence Platform gives IoT devices the capability to perform body detection, face recognition, object classification, license plate recognition and more, driving AI into the industrial space and keeping data local for increased privacy and lower latency.

Driving immersive experiences
AI paves the way to a new frontier of XR (eXtended Reality), bringing the best immersive experiences possible while adapting to the spaces around us. The Snapdragon XR2 Platform, through Snapdragon Spaces Technology, develops capabilities like hand tracking and image recognition for a new virtual and augmented reality user experience.

Furthermore, Snapdragon Spaces enables developers to utilize our on-device AI to seamlessly blend the lines between our physical and digital realities, transforming the world around us in ways limited only by our imaginations.

The future of cloud computing
Leveraging our long history of mobile technology leadership, we’ve created an AI solution that is designed to meet cloud AI inferencing needs for datacenter providers. The Qualcomm Cloud AI 100 is built from the ground up to help accelerate AI experiences, providing a cutting-edge solution that addresses the most important aspects of cloud AI inferencing — including low power consumption, scale, process node leadership, and signal processing expertise.

Some of the multiple use cases include e-commerce and Natural Language Processing in the cloud – with capabilities like topic or sentiment classification – to automatically organize messages, provide better recommendations, and power intelligent devices.

Qualcomm Technologies is leading the realization of the Connected Intelligent Edge, driving the convergence of wireless connectivity, efficient computing and distributed AI, and accelerating digital transformation to help deliver a new wave of innovation and growth.

Learn how Qualcomm enables a world where everyone and everything can be intelligently connected
Qualcomm AI Engine, Snapdragon, Snapdragon Digital Chassis, Snapdragon Ride, Qualcomm Robotics RB6, Qualcomm Vision Intelligence Platform, Snapdragon Spaces, Qualcomm S5 Sound Platform, and Qualcomm Cloud AI 100 are products of Qualcomm Technologies, Inc. and/or its subsidiaries.


All the promotional material from Qualcomm means they’re either a serious competitor or a partner. Hopefully they will be the next company to list us as a partner or as a customer!

I keep thinking back to the Valeo presentation in Feb 2022 where they are listed along with NVIDIA. I also keep in the back of my mind that we are a trusted partner and LiDAR is our strong point which leans towards us being in the Scala 3 Lidar which is very exciting. I will be much happier when we are finally disclosed.


This presentation from Valeo is very encouraging: it’s dated now (Feb 22), but if you haven’t reviewed it, it indicates a strong growth pathway!

I hope both NVIDIA and Qualcomm are planning on using our IP as well; however, Qualcomm’s suite of released products suggests otherwise. In saying that, we must be offering something NVIDIA and Qualcomm don’t or we wouldn’t be a trusted partner… And we don’t need the whole market to be successful!



Spot on.

This is something I've posted about before in relation to using DVS in cars. My guess is that they set the threshold to compensate for the camera movement, but it seems to me to be pretty complex.

Here is a patent application which addresses the problem:




The invention relates to a device (16) for compensating for movement of an event sensor (12) in an event stream generated by the event sensor (12) during observation of an environment in a time interval, the compensating device (16) comprising: - a compensating unit (34), the compensating unit (34) being able to receive data relating to the movement of the event sensor (12) during the time interval and to apply a compensating technique to the event stream generated by the event sensor (12) depending on the received data so as to obtain, in the time interval, a compensated event stream; and - a control unit (40) for controlling the compensating unit (34), the control unit (40) being able to control the compensating unit (34) via transmission of data relating to the movement of the event sensor (12).
Maybe it’s not that complicated? If the camera moves, causing all pixels to record a change event, perhaps that result could be ignored, since no situation other than the camera itself moving would cause that kind of change event to occur? Of course I’m probably thinking too simplistically about it though.


Maybe it’s not that complicated? If the camera moves, causing all pixels to record a change event, perhaps that result could be ignored, since no situation other than the camera itself moving would cause that kind of change event to occur? Of course I’m probably thinking too simplistically about it though.
So if it's in a car ... ?


I only have two words to say HEWLETT-PACKARD:

“• Fixed broadband: Our analysis of applications on fixed broadband and data center traffic enables us to identify unique challenges facing such networks and inform the design of scalable systems using domain-specific accelerators at the network’s edge.
Contributions: The growth of edge-computing systems is driven by solutions and applications requiring high-performance and low-latency video conferencing and streaming services. The adoption of serverless frameworks to process applications at the edge has increased significantly. However, provisioning for additional computing needs on a transient basis for sudden workload spikes, or transient elasticity, is non-trivial. Scaling serverless functions at the edge poses critical challenges. Service-level agreement (SLA) violations are typically frequent in such scenarios. Since SLA violations carry severe penalties, one common way to eliminate violations is to over-allocate resources preemptively. This solution leads to the under-utilization of expensive resources. Meanwhile, SmartNICs (smart network interface cards) have gained popularity to offload various network functions and provide real-time, line-rate computing at scale.
Our study proposes AKIDA, a new architecture that strategically harvests the untapped compute capacity of the SmartNICs to offload transient workload spikes, thereby reducing the SLA violations. Usage of this untapped compute capacity is more favorable than adding and deploying additional servers, as SmartNICs are economically and operationally more desirable. AKIDA is a low-cost and scalable platform that orchestrates seamless offloading of serverless workloads to the SmartNICs at the network edge, eliminating the need for pre-allocating expensive compute power and over-utilization of host servers. Our system evaluation shows that SLA violations can be reduced by 20% for certain workloads.”

Definitely a BIG KEV moment brought to you by the generously shared research of @Quatrojos

My opinion only DYOR

Most of it is way over my head but I do like the sounds of this section



Anyone have the time to masticate this bolus of a thesis? Perhaps just Chapter 11...

Autonomous and Predictive Systems to Enhance the Performance of Resilient Networks

Chapter 11 AKIDA: Accelerating In-Network, Transient Compute Elasticity using SmartNICs

11.1 Introduction

The proliferation of the Internet of Things (IoT) and the success of rich cloud services have pushed the horizon of a new computing paradigm, edge computing, that requires faster data processing at the network’s edge. The edge computing market is expected to reach close to $9 billion by 2025 [254]. As a specific example, the significant factors driving the growth of the IoT in the manufacturing market include growing demand for industrial automation in the manufacturing industry, a rising need for centralized monitoring and predictive maintenance of resources, and a rise in the number of cost-effective and intelligent connected devices and sensors, among others. To keep up with this demand, there has been a shift to serverless frameworks for computation [255, 256]. Serverless frameworks allow IoT applications to be deployed within minutes with the flexibility to reduce or expand operations seamlessly. For instance, serverless functions provide authentication and encryption capabilities on-site instead of uploading the data over a vulnerable network to the cloud. This requires efficient scale-up (or scale-down) of edge computing infrastructure for transient spikes in serverless workloads.

However, managing edge compute capacity on the fly, i.e., transient compute elasticity, carries specific challenges [257, 258]. First, expanding edge deployments is time- and resource-intensive. A typical solution is to over-allocate resources for possible future demand. However, over-allocation leads to under-utilization for the most part and is economically undesirable. Edge computing requires careful planning of the available resources in situ to achieve its primary objective of faster processing and reduced latencies.
Second, and most importantly, sudden spikes in demand for processing could create compute bottlenecks, leading to service level agreement (SLA) violations. An SLA comprises the agreed-upon QoS (Quality of Service) attributes, which are monitored regularly; failing to meet the QoS attributes can attract hefty penalties.

In this context, we ask the following research question: How could we design an architecture that can handle sudden spikes in demand, address transient elasticity, and allocate compute resources efficiently? We propose AKIDA, a new edge computing platform that leverages heterogeneous computing nodes (including domain-specific accelerators like SmartNICs) to dynamically allocate computation requirements for workload spikes with minimal cold start latency. We use SoC-based SmartNICs to predict and intelligently load-balance containerized serverless workloads across the heterogeneous compute resources. AKIDA uses untapped general-purpose compute on SmartNICs for in-network application processing when demand escalation is imminent. SmartNICs are ideal candidates for application offload because: (i) they are closer to the data ingress pipeline, which enables them to bypass the network stack overhead at the host server, (ii) SoC-based onboard compute is available and close at hand for application processing [259, 260], (iii) they are a feasible alternative to traditional servers for short-term compute, and (iv) unused compute cycles on the SmartNICs can be re-purposed for workloads.

To our knowledge, this is the first study to propose containerized application offload to SoC-based SmartNICs. Although prior works have studied the applicability of offloading specific parts of applications, e.g., using P4 programmability, the actor-programming paradigm, etc.
[261, 262, 263], those studies are limited to particular applications and require code modification for other types of application offload. In contrast, AKIDA is designed to offload a network of containers onto the SmartNIC, making it truly application-agnostic and scalable.

Our platform has three unique elements: (i) a workload predictor, (ii) a traffic distributor, and (iii) an orchestrator. The workload predictor estimates the potential change in demand for the next time horizon by extracting fine-grained input features from historical time-series data. The traffic distributor distributes the traffic based on the transient spikes and CPU load on each cluster node. Finally, the orchestrator sets the threshold levels for intelligent traffic distribution to cluster nodes and manages the end-to-end pipeline for application processing. It can also reallocate workloads on the fly to the SmartNICs if the incoming requests for an application suddenly change. AKIDA’s orchestrator can be generalized for scaling edge across multiple servers and different kinds of SmartNICs. Stated otherwise, our system can be scaled to offload applications across different dimensions of heterogeneity (for instance, if the cluster introduces additional compute nodes). This approach enables us to secure a competitive advantage compared to legacy edge architectures and deployments.
This chapter makes the following key contributions:

• Design of a novel architecture that leverages heterogeneous computing nodes (SmartNICs and host server) to facilitate efficient handling of transient spikes at the edge;
• Development and characterization of a workload predictor and orchestrator that work in tandem to reduce SLA violations, efficiently handle spikes in demand, and reduce cold start latency;
• Characterization of competitive advantages of our architecture through an in-depth analysis of capital expense costs and overhead savings from minimizing SLA violations.

Our investigation reveals that capital expenditure (CAPEX) can be reduced by 1.5×, while the operational expenditure (OPEX) can be decreased by 3.5×. In addition, our architecture demonstrably reduces SLA violations by as much as 20% in real-world deployments.

11.2 Background

This section provides an overview of multicore SoC-based SmartNICs and how they are integrated into the edge computing platform. In addition, we briefly discuss the edge computing architecture and explore some common SLA violations typically prevalent in this context.

11.2.1 SmartNICs

There are broadly three categories of network accelerators or SmartNICs: ASIC, FPGA, and SoC-based SmartNICs [264, 262]. In this study, we focus on SoC-based SmartNICs only. Multicore SoC-based SmartNICs use embedded CPU cores to process packets, trading some performance to provide substantially better programmability than ASIC-based designs (e.g., DPDK-style code can be run directly in a familiar Linux environment). For instance, Mellanox Bluefield [259] uses general-purpose CPU cores (ARM64), while others, like Netronome [265], have specific cores for network processing.
SoC-based SmartNICs (e.g., Mellanox) have two modes of operation: embedded and separated. In embedded mode, the interfaces are mapped to the host OS network stack, and the kernel routes packets from the host. In separated mode, the host OS and the SmartNIC have separate, independent network stacks to process packets. While we observe slightly better tail latencies from packet processing in embedded mode, the difference from separated mode is negligible. For AKIDA, we adopt the separated mode due to its programmable flexibility and the ability to run containers directly on the SmartNIC’s ARM64 OS.

11.2.2 Edge Computing

The adoption of cloud computing platforms is increasing rapidly. However, efficient processing of the data produced at the edge of the network is a challenging task. Data-driven applications are increasingly deployed at the edge and will consequently benefit from edge computing, which we explore here.

Networking bottlenecks: Compared to fast-developing cloud-based processing speed, network bandwidth has reached a standstill. With the growing quantity of data generated at the edge, the rate of data transportation is becoming the bottleneck for the cloud-based computing paradigm. For instance, we expect autonomous vehicles to output a vast amount of data per hour that needs real-time processing. In this instance, edge computing is beneficial over cloud computing because of the significant savings in latency overheads. Additionally, scaling these pipelines for multiple vehicles would require computation at the edge, not the cloud.

Explosion of IoT: Almost all kinds of electrical devices will become part of IoT, and they will play the role of data producers and consumers, such as air quality sensors, LED bars, streetlights, and even an Internet-connected microwave oven.
Reports suggest that the number of IoT devices at the edge will grow into the billions in a few years [266]. Thus, the raw data produced will be enormous, making conventional cloud computing not efficient enough to handle all this data; application processing at the edge could account for this surge in demand.

Data producers: In the cloud computing paradigm, the end devices at the edge typically are data consumers, for example, consuming on-demand video streams on a smartphone. However, vast amounts of data are now produced by these same consumers. Changing from a data consumer to a data producer requires placing more functionality at the edge.

11.2.3 SLA Violations

Service Level Agreements are critical when applications are deployed in a Service Oriented Architecture (SOA). SLAs are commonly adopted in cloud computing and, more recently, at the edge. An SLA defines the level of service the consumer expects based on metrics that the application provider lays out. It comprises the metrics by which the service is measured, such as the monitored QoS (Quality of Service) attributes [267, 268], and the remedies or penalties that apply if the measured metric does not meet the agreed-on service level, which is termed an SLA violation. Some of the most common QoS attributes that are part of an SLA are response time and throughput; we primarily focus on response time.

In edge computing, where resources are limited, the response time suffers from high tail latency when the application receives multiple queries at scale. This problem is further strained when the host OS has an additional background workload for other applications or maintains the edge infrastructure for its network and storage needs. This leads to SLA violations and poor application Quality of Experience (QoE) for the consumer. We use the response time metric in Sec.
11.4 to evaluate the penalty with and without additional processing units such as SmartNICs.

11.2.4 Need for Accelerators

There has been a lot of research recently in the industry on using SmartNICs in cloud data center servers to boost performance by offloading network datapath processing from the servers. This section explains why SmartNICs are essential in the new generation of high-performance computing servers.

The cost of building an interconnection network for a large cluster can significantly affect design decisions. With increasing network interface bandwidths, the gap between network performance and compute performance is widening. This has resulted in increased adoption and deployment of SmartNICs. If SmartNICs were leveraged to offload only network functionalities, it would add 30% more computational capacity to the current servers [269]. Typically, SoC-based SmartNICs are priced at 25-30% of the cost of data center servers. Therefore, adding a SmartNIC to perform only network functions is a wise decision.

However, SmartNICs can do more than network functions. As per our initial analysis, the compute capacity of an SoC-based SmartNIC is generally around 40-50% of server compute capacity. If additional compute is required within this range, exploiting the total capacity of SmartNICs to manage workload spikes instead of servers is a more economical decision. However, all that compute available on SmartNICs is currently used primarily for offloading network functions and services. In most cases, that is a severe under-utilization of the available compute power on SmartNICs. It is this under-utilized compute that AKIDA aims to harvest and make available to the applications.
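As a rough back-of-the-envelope check on the economics in Section 11.2.4, the sketch below compares the cost per unit of compute of a SmartNIC with that of a server, using only the 25-30% price range and 40-50% capacity range quoted above. Normalizing the server to a price and capacity of 1.0 is an assumption made here for illustration, and the function name is hypothetical. It also ignores the fact that a SmartNIC may already be installed for network offload, in which case the marginal cost of its spare compute would be lower still.

```python
def relative_cost_per_compute(price_fraction: float, capacity_fraction: float) -> float:
    """Cost per unit of compute for a SmartNIC, relative to a server normalized to 1.0.

    price_fraction: SmartNIC price as a fraction of server price (0.25-0.30 in the text).
    capacity_fraction: SmartNIC compute as a fraction of server compute (0.40-0.50 in the text).
    """
    if price_fraction <= 0 or capacity_fraction <= 0:
        raise ValueError("fractions must be positive")
    return price_fraction / capacity_fraction


# Cheapest card with the most compute versus priciest card with the least compute.
best_case = relative_cost_per_compute(0.25, 0.50)
worst_case = relative_cost_per_compute(0.30, 0.40)
print(f"SmartNIC cost per unit of compute: {best_case:.2f}x to {worst_case:.2f}x that of a server")
```

Under these assumptions the SmartNIC comes out at roughly half to three quarters of a server's cost per unit of compute, which is in line with the chapter's claim that it is the more economical option for spikes that fit within its capacity.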
11.3 System Overview

We begin by providing an overview of AKIDA, an intelligent fabric software framework that can be deployed on any system whose operating system supports container orchestration, such as servers/server racks, network switches, or edge systems. Figure 11.1 shows the various components of the AKIDA framework. The server can host as many SmartNICs as there are PCIe buses available. We use Kubernetes as the container orchestration system that runs on the host and SmartNIC OS, and this specialized architecture works only on SoC-based SmartNIC architecture [259]. The major components of our core solution are (i) a traffic distributor module that distributes the traffic based on the service time and CPU load of each server and SmartNIC, (ii) a workload prediction module that uses the history of the workload in a window to predict the workload spikes, and (iii) the AKIDA orchestrator module, which manages the workload spikes based on the load on the servers and SmartNICs. In the following, we describe our solution to each module.

[Figure 11.1: System overview. Diagram labels: serverless control plane; heterogeneous data and compute plane; service gateway; serverless functions on Arm OS SmartNICs and on the x86 host server; solution steps 1 (configure traffic distribution), 2 (workload prediction) and 3 (spike management).]

[Figure 11.2: Traffic distributor.]

11.3.1 Traffic Distributor

The current serverless computing design assumes that all computing resource nodes are homogeneous and have the same service time and the same amount of load.
In this chapter, we show that this assumption leads to degraded performance of workloads running on multiple nodes, especially when one of the compute nodes gets overloaded or takes more time to service the requests. To clarify the problem, consider two serverless functions A and B that take 2 and 10 seconds to run on the SmartNIC and 1 and 5 seconds to run on the host OS, respectively; when the host OS gets overloaded with other workloads, the response times on the host OS change to 3 and 8 seconds for functions A and B, respectively¹. In this example, it is better to run the two functions on the host OS when the host OS is not overloaded, and when it gets overloaded, function A can be offloaded to the SmartNIC.

¹ We note that these numbers are subject to change from time to time depending on the workload burst and resource congestion on the SmartNICs and host OS servers.

11.3.2 Traffic Distributor

In our design, the queries first arrive at the API gateway of the scheduler within the SmartNIC OS, where our traffic distributor distributes the traffic according to the service time of each SmartNIC’s ARM core or host OS core within a server. We note that the service time of each function is subject to change depending on the workload spikes. Assuming the requests arrive with an arrival rate of $\lambda$, each host OS and SmartNIC has a service rate of $\mu_i$, and there is an M/M/1 queue at each server, the optimal traffic distribution that makes the sojourn time equal for each queue satisfies:

$$\lambda_1 - \mu_1 = \lambda_2 - \mu_2 = \dots = \lambda_N - \mu_N \tag{11.1}$$

In other words, the optimal traffic distribution on N servers is as follows:

$$\lambda_i = \mu_i + \frac{\lambda - \sum_{j=1}^{N} \mu_j}{N}, \qquad i = 1, \dots, N \tag{11.2}$$

In the evaluation, we use a heuristic approach and try to avoid distributing the traffic to a cluster node with a very high service time due to workload spikes. The queries are then redirected to the appropriate containerized application pods running either on the host or SmartNIC OS.

11.3.3 Workload Prediction

To provision for workload spikes proactively and meet the required Service Level Agreements (SLAs), we predict the future workload demands ahead of time. We propose a support vector regression (SVR) prediction model that predicts the workload bursts to trigger the traffic distribution module and also mitigates the impact of containers’ cold start latency [270, 271, 272, 273], which can otherwise lead to a longer response time to application queries. Our prediction model is based on past observations of the workload over a window size of W time units. We change the window size dynamically based on the workload variations over time: we increase the training window size if the workload variation over the current window is less than 10% and decrease it once the workload variation is more than 20%.

11.3.4 AKIDA’s Orchestrator

AKIDA’s orchestrator consists of a resource monitoring module and exploits the output of the prediction module. The resource monitoring module periodically monitors each node’s CPU, memory utilization, and service rates in the serverless platform. If the CPU utilization gets higher than a specified threshold D, or if the service rate of application X on one of the nodes in the cluster gets higher than the specified SLA, we re-distribute the workload to dampen the spikes. We use the output of the workload prediction module to predict future spikes ahead of time and perform proactive spike management. Proactive spike management that exploits the prediction module has two benefits: (i) first, we can re-distribute the traffic based on the predicted future workload, which keeps specific server nodes from getting congested, and (ii) second, it mitigates the containers’ cold start latency by starting new containers before the actual load arrives.
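The equal-sojourn-time split of Eq. 11.2 in Section 11.3.2, which the orchestrator re-runs whenever it re-distributes traffic, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `equal_sojourn_split` is hypothetical, and the rule for dropping nodes whose share would come out negative is an assumption added here (the chapter only says that a heuristic avoids nodes with a very high service time).

```python
from typing import List


def equal_sojourn_split(total_rate: float, service_rates: List[float]) -> List[float]:
    """Split a total arrival rate across M/M/1 nodes so that every node that
    receives traffic has the same slack mu_i - lambda_i (Eq. 11.2)."""
    n = len(service_rates)
    if n == 0:
        raise ValueError("need at least one node")
    if total_rate < 0:
        raise ValueError("arrival rate must be non-negative")
    if total_rate >= sum(service_rates):
        raise ValueError("total arrival rate must be below total service capacity")

    active = list(range(n))
    while True:
        # slack = (sum of mu_j - lambda) / N, so lambda_i = mu_i - slack.
        slack = (sum(service_rates[i] for i in active) - total_rate) / len(active)
        too_slow = [i for i in active if service_rates[i] - slack < 0]
        if not too_slow:
            break
        # Assumption (not from the chapter): a node that would get a negative
        # share receives no traffic and the split is recomputed without it.
        active = [i for i in active if i not in too_slow]

    rates = [0.0] * n
    for i in active:
        rates[i] = service_rates[i] - slack
    return rates


if __name__ == "__main__":
    print(equal_sojourn_split(12.0, [10.0, 8.0]))  # [7.0, 5.0]
```

For example, with a total arrival rate of 12 requests/s and service rates of 10 and 8 requests/s, the split is 7 and 5 requests/s. Both nodes are left with the same slack of 3 requests/s and therefore the same M/M/1 sojourn time of 1/3 s.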
The spike management module updates the service rate $\mu_i$ of each node in the cluster and the request arrival rates in the traffic distributor module, and triggers a new traffic distribution command if the spikes are higher than a specified threshold or if the mean service rate of a node in the cluster increases and violates the specified SLA metric.

11.3.5 Auto-scaler

After splitting the traffic between multiple queues, we scale up/down the number of replicas at each queue. Our auto-scaling algorithm is based on the arrival rate of the predicted workload at time t (i.e., $\lambda_t$), the current number of replicas $r_t$, and the current service rate of the replicas at each server/SmartNIC ($\mu_t$). We can express the system utilization as follows:

$$\rho_t = \frac{\lambda_t}{r_t \mu_t} \tag{11.3}$$

Then we calculate the probability that the queue is idle as follows:

$$P_0 = 1 \Bigg/ \left[ \sum_{n=0}^{r_t - 1} \frac{(r_t \rho_t)^n}{n!} + \frac{(r_t \rho_t)^{r_t}}{r_t! \, (1 - \rho_t)} \right] \tag{11.4}$$

The queue length is

$$L_q = \frac{r_t^{\,r_t} \, \rho_t^{\,r_t + 1}}{r_t! \, (1 - \rho_t)^2} \, P_0 \tag{11.5}$$

and the expected waiting time in the queue is $T_q = L_q / \lambda_t$. Given the current number of replicas and the system’s service time, we calculate the system’s latency $T_q + T_s + 2d$ (where $2d$ accounts for the auto-scaling startup latency). If the latency is larger than the target SLA, we increment the number of replicas and calculate the optimal number of replicas using a binary search algorithm. If $T_q + T_s + d$ is smaller than the target SLA latency, we scale down and find the optimal number of replicas using a binary search algorithm.

[Figure 11.3: Real world experimental setup. Diagram labels: two DL360 Gen9 servers, a network switch, external and internal links.]

11.4 Competitive Advantages

We set up the testbed of AKIDA using DL380 Gen9 Servers and two Mellanox Bluefield [259] SmartNICs per server as shown in Figure 11.4.
We deployed a Kubernetes cluster over both the server and SmartNIC OS to obtain heterogeneous multicore cluster nodes. We implemented a prototype based on the OpenFaaS serverless infrastructure. We evaluated it on three popular serverless workloads: (i) a CPU-intensive Fibonacci function, (ii) a latency-sensitive key-value store, and (iii) a sentiment analysis function that uses machine learning to perform natural language processing. We build the functions to run on a multi-architecture platform, including the x86 host OS and the SmartNICs’ ARM cores.

We first run initial experiments to find the compute capacity of the SmartNICs by running Fibonacci functions on both the SmartNICs and the host. We observe that their compute capacity is close to that of the host. Figure 11.4(a) shows the execution time of the Fibonacci function on the host OS and the SmartNIC as we increase the Fibonacci number to compute. We observe that SmartNICs have compute capacity comparable to that of x86-64 hosts, which confirms that the SmartNICs are capable of running workloads and processing incoming packet traffic.

We also ran initial experiments on an online prediction model that predicts future workloads ahead of time, to narrow down the best-performing algorithm for our solution. We used 10,000 data points from real serverless workloads that provide an appropriate workload for a ride-sharing application to request a ride [274, 275]. Figure 11.4 shows the workload prediction using the RBF and linear kernels in the SVR prediction model when we train the model over a window size of 100 seconds and predict the future workload d seconds ahead of time. As shown, the RBF kernel performs better than the linear kernel. In the following sub-sections, we investigate the different design choices available to data centers for managing load spikes.

Figure 11.4: Experimental results on the real world testbed. a. Response time of the SmartNIC and host OS. b. Predicting the workload d seconds ahead of time, where d = 10.

11.4.1 Performance Benefits
To evaluate the performance benefit of using SmartNICs in the cluster under high CPU load, we perform a set of experiments on the three serverless functions in our testbed using the OpenFaaS serverless platform with the hey HTTP(S) load generator [276], and emulate transient spikes using a stress tool [277]. Figure 11.5 shows the response time distribution for different functions. The SLA threshold is specified by the application and exposed to the scheduler.

Figure 11.5: Response time distribution of different functions. a. Fibonacci. b. Key-value store. c. Sentiment analysis.

We first run the default OpenFaaS scheduler on one server. We introduce stress on the host server and increase the average CPU utilization to 80% by running a background serverless workload averaging 200 queries per second (Case 1: one server with background workload). The tail latency increases when the host OS has a high load, leading to SLA violations. Adding another server with uniform traffic distribution (the default Kubernetes scheduler) in the baseline (two servers, one with background workload and one without) does not solve the problem, since half of the queries are still routed to the overloaded host. Next, we run the workload on two servers with load-aware proportional traffic distribution (Case 2: two servers with proportional traffic distribution similar to AKIDA’s traffic distributor). In AKIDA, we detect the overloaded node in the cluster and avoid routing traffic to that node. We run AKIDA in two cases, with one SmartNIC and with two SmartNICs on the same server. Although the SmartNICs have lower computational power than the host OS, when a transient spike overloads the CPU, AKIDA leverages the SmartNICs’ compute capacity to reduce SLA violations.
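A load-aware proportional split of the kind described in Section 11.4.1 might look roughly like this. The excerpt does not give the weighting formula, so the spare-capacity weight `service_rate * (1 - cpu_util)`, the 0.8 CPU threshold, and the uniform fallback are assumptions made for illustration, and the node names are hypothetical.

```python
def traffic_weights(nodes, cpu_threshold=0.8):
    """Split incoming queries across nodes in proportion to spare capacity.

    `nodes` maps a node name to (service_rate, cpu_util), where cpu_util
    is a fraction in [0, 1]. Nodes at or above `cpu_threshold` are
    treated as overloaded and receive no traffic. The weighting formula
    is an assumed stand-in for AKIDA's traffic distributor.
    """
    spare = {}
    for name, (service_rate, cpu_util) in nodes.items():
        if cpu_util >= cpu_threshold:
            spare[name] = 0.0  # overloaded: route nothing here
        else:
            spare[name] = service_rate * (1.0 - cpu_util)

    total = sum(spare.values())
    if total == 0:
        # Every node is overloaded: fall back to a uniform split.
        return {name: 1.0 / len(nodes) for name in nodes}
    return {name: value / total for name, value in spare.items()}
```

Under these assumptions, a host at 85% CPU next to two lightly loaded SmartNICs would receive no new queries, and the two SmartNICs would share the traffic equally.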
11.4.2 Cost Benefits
In this section, we perform a cost analysis of the cluster design choices based on the actual CPU utilization dataset in [278]. We compare the network design that over-provisions servers to meet the SLA during workload spikes with the design that uses SmartNICs to manage the spikes. We assume a SmartNIC is about 15-20% of the cost of a server. We calculate CAPEX and OPEX for three resource deployment scenarios: (i) two servers, (ii) one server and one SmartNIC, and (iii) one server and two SmartNICs at each edge node to accommodate the spikes. The x-axis shows the number of edge nodes in each case (i.e., 1 edge node + 1 extra server, 1 edge node + 1 SmartNIC, and 1 edge node + 2 SmartNICs).

Figure 11.6(a) shows the capital expenses for building a cluster in (i), (ii), and (iii). As shown, the SmartNICs provide extra computational capacity to the cluster at a much lower cost. The total cost of the cluster reduces by a factor of 1.5 and 1.55 when using one or two SmartNICs at each x86 host, respectively. This section’s CAPEX and OPEX cost calculations are based on rough numbers available for the cost and maximum energy consumption of the servers and SmartNICs in our testbed.

Figure 11.6(b) shows operational expenses by tracking the maximum power (one of the main contributors to OPEX) used in the cluster for Cases i, ii, and iii. The SmartNICs used in our testbed are 3.5x more energy efficient than the host server. Figure 11.6(b) shows that the maximum power usage of the cluster reduces by a factor of 1.5 and 1.27 when having one or two SmartNICs at each server, respectively.

Figure 11.6: Operational performance as the cluster size increases. a. Capital expenditure. b. Operational expenditure.
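One way to sanity-check the power figures in Section 11.4.2 is a back-of-envelope calculation. It assumes that "3.5x more energy efficient" means a SmartNIC draws 1/3.5 of a server's maximum power, and that the baseline is the two-server scenario (i). Both are interpretations of the excerpt, not statements from it.

```python
def power_reduction_factor(num_smartnics, efficiency_ratio=3.5):
    """Max-power ratio of two servers versus one server plus SmartNICs.

    Power is measured in units of one server's maximum power. The
    assumption that a SmartNIC draws 1/efficiency_ratio of that power
    is an interpretation of "3.5x more energy efficient".
    """
    baseline = 2.0  # scenario (i): two servers
    with_smartnics = 1.0 + num_smartnics / efficiency_ratio
    return baseline / with_smartnics


for count in (1, 2):
    print(count, round(power_reduction_factor(count), 2))
```

On this reading the factors come out at about 1.56 for one SmartNIC and 1.27 for two, which is close to the 1.5 and 1.27 quoted for Figure 11.6(b).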

11.3 System Overview
We begin by providing an overview of AKIDA, an intelligent fabric software framework that can be deployed on any operating system that supports container orchestration, on platforms such as servers/server racks, network switches, or edge systems. Figure 11.1 shows the various components of the AKIDA framework. The server can host as many SmartNICs as it has PCIe buses available. We use Kubernetes as the container orchestration system that runs on the host and SmartNIC OS, and this specialized architecture works only on SoC-based SmartNICs.


This is the referenced patent:

US11436054B1 Directing queries to nodes of a cluster of a container orchestration platform distributed across a host system and a hardware accelerator of the host system


Example implementations relate to edge acceleration by offloading network dependent applications to a hardware accelerator. According to one embodiment, queries are received at a cluster of a container orchestration platform.
The cluster includes a host system and a hardware accelerator, each serving as individual worker machines of the cluster. The cluster further includes multiple worker nodes and a master node executing on the host system or the hardware accelerator.
A first worker node executes on the hardware accelerator and runs a first instance of an application.
A distribution of the queries is determined among the worker machines based on a queuing model that takes into consideration the respective compute capacities of the worker machines.
Responsive to receipt of the queries by the host system or the hardware accelerator, the queries are directed to the master node or one of the worker nodes in accordance with the distribution.

The use of the word AKIDA should be brought to the attention of BrainChip's patent attorneys to head off a potential trade mark infringement.
  • Like
  • Fire
  • Love
Reactions: 33 users



11.3 System Overview
We begin by providing an overview of AKIDA, an intelligent fabric software framework that can be deployed on any operating system that supports container orchestration, on platforms such as servers/server racks, network switches, or edge systems. Figure 11.1 shows the various components of the AKIDA framework. The server can host as many SmartNICs as it has PCIe buses available. We use Kubernetes as the container orchestration system that runs on the host and SmartNIC OS, and this specialized architecture works only on SoC-based SmartNICs.

View attachment 25054

This is the referenced patent:

US11436054B1 Directing queries to nodes of a cluster of a container orchestration platform distributed across a host system and a hardware accelerator of the host system

View attachment 25053

Example implementations relate to edge acceleration by offloading network dependent applications to a hardware accelerator. According to one embodiment, queries are received at a cluster of a container orchestration platform.
The cluster includes a host system and a hardware accelerator, each serving as individual worker machines of the cluster. The cluster further includes multiple worker nodes and a master node executing on the host system or the hardware accelerator.
A first worker node executes on the hardware accelerator and runs a first instance of an application.
A distribution of the queries is determined among the worker machines based on a queuing model that takes into consideration the respective compute capacities of the worker machines.
Responsive to receipt of the queries by the host system or the hardware accelerator, the queries are directed to the master node or one of the worker nodes in accordance with the distribution.

The use of the word AKIDA should be brought to the attention of BrainChip's patent attorneys to head off a potential trade mark infringement.
This goes back to 2020. AKIDA was software back then, no?
  • Like
Reactions: 3 users