BRN Discussion Ongoing

That's a serve Pom not a forehand smash volley :) I do know SUMMFING
Well you try and find a gif to suit 😂
 

 

Frangipani


Gregor Lenz, until recently CTO of our partner Neurobus (https://thestockexchange.com.au/threads/brn-discussion-ongoing.1/post-456183) and co-author of Low-power Ship Detection in Satellite Images Using Neuromorphic Hardware alongside Douglas McLelland (https://arxiv.org/pdf/2406.11319), has joined the London-based startup Paddington Robotics (https://paddington-robotics.com/ - the website doesn’t yet have any information other than “Paddington Robotics - Embodied AI in Action”):




Some further info I was able to find about the London-based startup founded late last year, whose co-founder and CEO is Zehan Wang:



https://www.northdata.de/Paddington%20Robotics%20LtdĀ·,%20London/Companies%20House%2016015385


The other day, I came across two recent blog posts on the topic of event cameras, written by Gregor Lenz, who stepped down as CTO of BrainChip’s partner Neurobus six months ago to join Paddington Robotics, an ā€œEmbodied AIā€ startup based in London.

Definitely a sobering read for those who have been expecting much faster adoption of event cameras across various industries. However, I personally prefer an honest “what’s holding the technology back” assessment by someone with first-hand experience in neuromorphic sensing and computing over one of those numerous (and partly AI-generated) event camera market size predictions posted by anonymous forum users, who are highly likely just BRN retail shareholders without any real insight into the actual technology or the difficulties that have so far hindered broader mainstream adoption.

While “Event cameras in 2025, Part 2” is very technical in nature, the preceding blog post, Part 1, makes for an easier read.
[The bold print in the middle part is not intended, by the way - I keep trying to remove it, but for some reason it won’t go away.]





Event cameras in 2025, Part 1

August 13, 2025 · 20 min · 4123 words

Earlier this year, I stepped down as CTO of Neurobus and transitioned to a role in the field of robotics in London. Despite that shift, I still believe in the potential of event cameras, especially for edge computing. Their asynchronous data capture model is promising, but the technology isn’t quite there yet. In two parts, I want to outline the main markets that I think could drive the adoption of event cameras and also talk about what’s currently holding the technology back.

Industry landscape

Fifteen years ago, the market for event cameras barely existed. Today, it’s worth around 220 million dollars. That rate of growth is actually in line with how most sensor technologies develop. LiDAR, for example, was originally created in the 1970s for military and aerospace applications. It took decades before it found its way into mainstream products, with broader adoption only starting in the 2010s when autonomous vehicles began to emerge. Time-of-flight sensors were originally explored in the 1980s, but only became widespread after Apple introduced Face ID in 2017.

Event cameras appear to be following the same trajectory. They’ve been tested across various industries, but none have yet revealed a compelling, large-scale use case that pushes them into the mainstream. When we started Neurobus in 2023, there were already several companies building neuromorphic hardware such as event-based sensors and processors. What was missing was a focus on the software and real-world applications. Camera makers were happy to ship dev kits, but few people were actually close to end users. Understanding when and why an event camera outperforms a traditional RGB sensor requires deep domain knowledge. That’s the gap I tried to bridge.

It quickly became clear that finding the right applications is incredibly difficult. Adoption doesn’t just depend on technical merit. It needs mature software, supporting hardware, and a well-defined use case. In that sense, event cameras are still early in their journey. Over the past year, I’ve explored several sectors to understand where they might gain a foothold.


Space

The private space sector has grown quickly in the last two decades, largely thanks to SpaceX. By driving down launch costs to a fraction of what they used to be, the company has made it far easier to get satellites into orbit. Here is a lovely graph showing launch costs over the past decades. Notice the logarithmic y axis! Those launch costs are going to continue to drop as rockets with bigger payloads, such as Starship, are developed.

[Image: launch costs to orbit over the past decades, with a logarithmic y-axis]



Space Situational Awareness (SSA) from the ground

With more satellites in orbit, the risk of collisions increases, and that’s where space situational awareness, or SSA, comes into play. At the moment, the US maintains a catalogue of orbital objects and shares it freely with the world, but it’s unlikely that this will remain free forever. Other countries are starting to build their own tracking capabilities, particularly for objects in Low Earth Orbit (LEO), which spans altitudes up to 2,000 kilometers. SSA is mostly handled from the ground, using powerful RADAR systems. These systems act like virtual fences that detect any object passing through, even those as small as eight centimeters at an altitude of 1,500 kilometers. RADARs are expensive to build and operate, but their range and reliability are unmatched. For SSA solutions on the ground, optical systems play a smaller role: real-time tracking of specific objects. People have built ground-based event camera SSA systems, but it is not clear what advantages they bring over conventional, high-resolution, integrating sensors. There’s nothing that I know of up there spinning so fast that you need microsecond resolution to capture it.


Space Domain Awareness (SDA) in orbit

As orbit becomes more crowded and militarized, the need to monitor areas in real time is growing, especially regions not visible from existing ground stations (as in, anywhere other than your country and allies). Doing this from space itself offers a significant advantage, but using RADAR in orbit isn’t practical due to the power constraints of small satellites. Instead, optical sensors that can quickly pivot to observe specific areas are a better fit. To achieve good coverage, you’d need a large number of these satellites, which means that payloads must be compact and low-cost. This is where event cameras could come in. Their power efficiency makes them ideal for persistent monitoring, especially in a sparse visual environment like space. Since they only capture changes in brightness, pointing them into mostly dark space allows them to exploit sparsity very well. The data they generate is already compressed, reducing the bandwidth needed to transmit observations back to Earth. For low-power surveillance satellites in LEO, that’s a significant advantage.


Earth Observation (EO)

In Earth observation, optical sensors sit on satellites that need to orbit as low as possible in order to increase angular resolution / swath. They revolve around the Earth roughly every 90 minutes, capturing the texture-rich surface. Using an event camera for that would just generate enormous amounts of data that is of the wrong kind anyway, because in EO you are interested in multispectral bands and high spatial resolution. However, there is a case that might make it worthwhile: when you compensate for the lateral motion of the satellite and fix the event camera on a specific spot for continuous monitoring. Such systems exist today already (check out this dataset) to monitor airports, city traffic and probably some missile launch sites. Using an event camera for that would reduce processing to a bare minimum, and provide high temporal resolution for objects that move. The low data rate would also allow for super low bandwidth live video streaming! Unless we’re talking constellations of 10k+ satellites, however, it remains a niche use case for the military to monitor enemy terrain.

[Image: Potential applications of event cameras in orbit. 1. Space Domain Awareness (SDA) is about receiving requests from the ground to monitor an object in LEO in real time. 2. Video live streaming Earth observation (EO): compensating for the satellite’s lateral motion, event cameras could monitor sites for ~10 minutes per orbit, depending on the altitude. 3. Star tracking: lots of redundant background means that we can focus on the signal of interest using little processing.]

In-orbit servicing

In-orbit servicing includes approaching and grabbing debris to de-orbit it or docking with another satellite or spacecraft for refueling or repair. These operations are delicate and often span several hours, typically controlled from the ground. With the number of satellites in orbit continuously increasing and as many as 10 space stations planned to be built within the next decade, reliable and autonomous docking solutions will become essential. Current state-of-the-art systems already use stereo RGB and LiDAR, but event cameras might offer benefits in challenging lighting conditions or when very fast, reactive maneuvers are needed in emergency situations. I think, though, that in-orbit servicing faces many other challenges, and the most pressing problems in that area are not ones that event cameras would fix.

Star tracking

Star trackers are a standard subsystem on most satellites. Companies like Sodern sell compact, relatively low-cost units that point upward into the sky to determine a satellite’s orientation relative to known star constellations. Today’s trackers typically deliver attitude estimates at around 10–30 Hz, consuming 3–7 W depending on the model. For most missions, that’s good enough, but there’s room for improvement. Since stars appear as sparse, bright points against an almost entirely dark background, they align perfectly with the event camera’s strengths. Instead of continuously integrating the whole scene, an event-based tracker could focus only on the few pixels where stars actually appear, cutting down on unnecessary processing. In principle, this allows attitude updates at kilohertz rates while using less compute and bandwidth. Faster updates could improve control loops during high-dynamic maneuvers or enable more precise pointing for small satellites that lack bulky stabilization hardware. From a software perspective, the task remains relatively simple: once the events are clustered into star positions, the rest of the pipeline is the conventional map-matching problem of aligning observations to a star catalogue. No complex machine learning is needed. The main challenge, as with any space payload, lies in making the hardware resilient to radiation and thermal extremes.
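To make the pipeline concrete, here is a minimal sketch of the clustering step, assuming events have already been accumulated into NumPy arrays; the sensor shape, threshold, and synthetic data are illustrative assumptions rather than anything from a real star tracker.

```python
# Hypothetical sketch: cluster accumulated events into star centroids; the remaining
# work is then classic catalogue matching. Thresholds and shapes are illustrative.
import numpy as np
from scipy import ndimage

def star_centroids(x, y, sensor_shape=(320, 320), min_events=5):
    """Accumulate events into a 2D histogram and return bright-blob centroids."""
    hist, _, _ = np.histogram2d(y, x, bins=sensor_shape,
                                range=[[0, sensor_shape[0]], [0, sensor_shape[1]]])
    mask = hist >= min_events                 # keep pixels with enough activity
    labels, n = ndimage.label(mask)           # connected components = candidate stars
    if n == 0:
        return np.empty((0, 2))
    return np.array(ndimage.center_of_mass(hist, labels, range(1, n + 1)))

# Example: synthetic events from two "stars"
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(50, 0.5, 200), rng.normal(200, 0.5, 200)]).clip(0, 319)
y = np.concatenate([rng.normal(80, 0.5, 200), rng.normal(150, 0.5, 200)]).clip(0, 319)
print(star_centroids(x, y))   # two (row, col) centroids to match against a star catalogue
```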

Revenue generation

In space applications, cost isn’t the limiting factor, which would make it a great place to start testing equipment that is not mass produced yet. Space-grade systems already command premium prices, so a $1,000+ event camera is not out of place. Compared to SDA and EO video streaming, which can generate recurring revenue as a service by providing recent and real-time data, in-orbit servicing systems or star trackers are more likely to be sold as one-off hardware solutions, which makes the business case less scalable. In either case, there’s a growing need for advanced vision systems that can operate efficiently on the edge in space. Right now, the space market for event cameras is still at an early stage, but the interest is real.

Manufacturing / Science

In industrial vision, specifically in manufacturing and scientific environments, objects move fast and precision matters. These settings are full of high-speed conveyor belts, precise milling machines, and equipment that needs real-time monitoring. On paper, this seems like a great match for event cameras, which excel in capturing rapid motion with minimal latency. But the reality is more complicated. In most factories, computer vision systems are already deeply integrated into broader automation pipelines. Changing a single sensor, even for one with better temporal precision, often isn’t worth the disruption. If a factory wants slightly better accuracy, they can usually just upgrade to a 100+ Hz version of their existing system. Need to count bottles flying past on a line? A cheap line-scanning camera will do the trick. Want to monitor vibrations on a machine? A one-dollar vibration sensor is simpler and more reliable.


Some also consider battery-powered monitoring devices for warehouses and other low-maintenance settings, where low-power vision sensors could make sense. But even there, the appeal is limited in my experience. Someone still has to replace or recharge the battery eventually, and most computer vision systems already do a good enough job without needing an event-based solution. That said, there are niche applications where event cameras could shine. High-speed scientific imaging is one example, such as tracking combustion events in engines, analysing lightning, or sorting high-throughput cell flows in cytometry. Today, these tasks often rely on bulky high-speed cameras like a Phantom, which require artificial lighting, heavy cooling systems, and massive data pipelines, often just to record a few seconds of footage.

Event cameras could offer a much more compact and energy-efficient alternative. They don’t need high-speed links or huge data buffers, and they can be powered through a USB port and still achieve sub-millisecond latency. I think that event cameras could become a competitor to high-speed cameras in scientific settings, with a focus on small form factor and mobile applications. One challenge and active research area here is reconstructing high-quality video frames from events. The good news is that we’re seeing steady progress, and there’s even a public leaderboard tracking the latest benchmarks here. However, these methods currently consume an entire 100 W GPU, so they’re run offline. One of the biggest hurdles is collecting good ground truth data for what reconstructed frames should look like, which is why most researchers still rely on simulation. But if such reconstruction models get very good, I could imagine a business model where a user uploads the raw events, chooses the desired fps, and gets back videos with up to 10k fps. Pricing is per frame reconstructed, be it 10 Hz for 2 hours or 1 kHz for 1 second.
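For readers unfamiliar with how events relate back to intensity, here is a hedged sketch of the crudest possible baseline: direct integration of event polarities into a log-intensity image at a chosen fps. The contrast threshold C and the array layout are assumptions for illustration; the learned reconstruction methods on the leaderboard mentioned above are far more sophisticated.

```python
# Hypothetical baseline: direct integration of polarities into a log-intensity image,
# the crude counterpart of the learned reconstruction models mentioned above.
import numpy as np

def integrate_events(t, x, y, p, height, width, fps, C=0.2):
    """t in seconds (sorted), x/y pixel coordinates, p polarity in {-1, +1}.
    Returns a stack of log-intensity frames sampled at the requested fps."""
    log_I = np.zeros((height, width))
    frames, dt = [], 1.0 / fps
    next_t = t[0] + dt
    for ti, xi, yi, pi in zip(t, x, y, p):
        while ti >= next_t:                 # emit a frame at each frame boundary
            frames.append(log_I.copy())
            next_t += dt
        log_I[int(yi), int(xi)] += C * pi   # each event adds +/- C in log intensity
    frames.append(log_I.copy())
    return np.stack(frames)
```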

[Image: A comparison between a high-speed camera and an event camera. The high-speed camera workflow shows three steps: recording, postprocessing with video compression, and the result as high frame rate RGB video. This path requires handling a large amount of data. The event camera workflow shows recording followed by frame reconstruction, leading to high frame rate greyscale video, while generating much less data overall. The diagram highlights how event cameras provide efficient high-speed imaging compared to traditional high-speed cameras.]

Automotive

Event cameras in cars have had several testers, but none have stuck with them so far. While the technical case is strong, the path to adoption is complex, shaped by legacy systems, cost constraints, and the structure of the automotive supply chain. Modern cars already rely on a robust stack of sensors. RGB cameras, LiDAR, RADAR, and ultrasonic sensors all work together to enable functions like adaptive cruise control, lane keeping, emergency braking, and parking assistance. These systems are designed to be redundant and resilient across various weather and lighting conditions. For a new sensor like an event camera to be added, it must address a specific problem that the current setup cannot. In the chart below, I marked the strengths and weaknesses of sensors currently employed in cars. I rated each one on a scale from 1 to 5. Event cameras (orange) have a considerable overlap with RGB cameras (green), apart from the performance in glare or high contrast scenarios, which is covered well by RADAR (red).

[Image: Strengths and weaknesses of automotive sensor modalities, each rated on a scale from 1 to 5]

Nevertheless, the unique selling point of combined high temporal resolution and high dynamic range could be a differentiator. For example, detecting fast-moving objects to avoid collisions could become safety-critical. But in practice, these are edge cases, and current systems already perform well enough in most of them. The reality of integration is that automotive development happens within a well-defined supply chain. Original equipment manufacturers (OEMs) like Toyota or Mercedes-Benz rarely integrate sensors themselves. Instead, they depend on Tier 1 suppliers like Bosch or Valeo to deliver complete perception modules. Those Tier 1s work with Tier 2 suppliers who provide the actual components, including sensors and chips.

For event cameras to make it into a production vehicle, they need to be part of a fully validated module offered by a Tier 1 supplier. This includes software, calibration, diagnostics, and integration support. For startups that focus on event camera use cases, this creates a huge barrier, as the margins are already thin. You need a path to integration that matches the way the industry actually builds cars. Companies like NVIDIA are even starting to reshape this landscape. Their Drive Hyperion platform bundles sensors and compute into a single integrated solution, reducing the role of traditional Tier 1 suppliers. Hyperion already supports a carefully selected list of cameras, LiDARs, and RADARs, along with tools for simulation, data generation, and sensor calibration. Event cameras aren’t on that list yet. That means research teams inside OEMs have no easy way to test or simulate their output, let alone train models on it.

The way automotive AI systems are being designed has also changed. Instead of having separate modules for tasks like lane detection or pedestrian recognition, modern approaches rely on end-to-end learning. Raw sensor data is fed into a large neural network that directly outputs steering and acceleration commands. This architecture scales better but makes it harder to add a new modality. Adding a sensor like an event camera doesn’t just mean collecting or simulating new data. It also means rewriting the training pipeline and handling synchronization with other sensors. Most OEMs are still trying to get good reliability from their existing stack. They’re not in a rush to adopt something fundamentally new, especially if it comes without mature tooling.


Cost is another serious constraint. Automotive suppliers operate on tight margins, and every component is scrutinized. For instance, regulators in Europe and elsewhere are mandating automatic emergency braking. On paper, this sounds like a perfect opportunity for event cameras, especially to detect pedestrians at night. But in reality, carmakers are more likely to spend 3 extra dollars to improve their headlights than to introduce a new sensor that complicates the system. In fact, the industry trend is toward reducing the number of sensors. Fewer sensors mean simpler calibration, fewer failure modes, and lower integration overhead. From that perspective, adding an event camera can feel like a step in the wrong direction unless it can replace another modality altogether.

One area where event cameras might gain traction sooner is in the cabin. Driver and passenger monitoring systems are becoming mandatory in many regions. These systems typically use a combination of RGB and infrared cameras to detect gaze direction, drowsiness, and presence. An event camera could potentially replace both sensors, offering better performance in high-contrast lighting conditions, such as when bright headlights shine into the cabin at night. Cabin monitoring systems are often independent from the main driving compute platform, they have faster iteration cycles, and the integration hurdles are lower. Once an event camera is proven in this domain, it could gradually be expanded to support gesture control, seat occupancy, or mood estimation.

Visual light communication (VLC) could become a relevant application in autonomous vehicles. The idea is simple: LEDs that are already in our environment—traffic lights, street lamps, brake lights, even roadside signs—can modulate their intensity at kilohertz rates to broadcast short messages, while a receiver on the vehicle decodes them optically. Event cameras are a particularly good fit for this because they combine microsecond temporal resolution with useful spatial resolution, letting a single sensor both localize the source and decode high-frequency flicker without the rolling-shutter or motion-blur issues that plague standard frame sensors. Recent work from Woven by Toyota is a good snapshot of where this is headed: they released an event-based VLC dataset with synchronized frames, events, and motion capture and demonstrated LED beacons flickering at 5 kHz encoded via inter-blink intervals. While VLC is not going to be the main driver to integrate event cameras into cars, it’s one ‘part of the package’ application.
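As a rough illustration of the inter-blink-interval idea (not the actual Woven by Toyota protocol), a decoder for one LED’s pixel cluster could look like the sketch below; the symbol durations and tolerance are made-up assumptions.

```python
# Hypothetical decoder: map inter-blink intervals at one LED's pixel cluster to bits.
# The symbol durations (200 us vs 400 us) and the tolerance are assumptions.
import numpy as np

def decode_interblink(on_event_times_us, short_us=200, long_us=400, tol_us=50):
    """on_event_times_us: sorted timestamps (us) of ON events from one LED cluster.
    Returns the decoded bit string: short interval -> 0, long interval -> 1."""
    intervals = np.diff(np.asarray(on_event_times_us, dtype=float))
    bits = []
    for dt in intervals:
        if abs(dt - short_us) <= tol_us:
            bits.append("0")
        elif abs(dt - long_us) <= tol_us:
            bits.append("1")
        # anything else is treated as noise or a gap between packets and skipped
    return "".join(bits)

# Example: a beacon blinking the pattern 0, 1, 1, 0
times = np.cumsum([0, 200, 400, 410, 190])
print(decode_interblink(times))   # -> "0110" (within tolerance)
```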

Automotive adoption moves slowly. Getting into a car platform can take five to ten years, and the technical hurdles are only part of the story. To succeed, companies developing event cameras need staying power and ideally, strategic partnerships with Tier 1 suppliers or compute platform providers. For a small startup, this is a tough road to walk alone. For the moment, in-cabin sensing might be the most realistic starting point.




Defence

Many of the technologies we now take for granted started with defense: GPS, the internet, radar, night vision, even early AI. Defense has always been an early adopter of bleeding-edge tech, not because it’s trendy, but because the stakes demand it. Systems need to function in low visibility, track fast-moving targets, and operate independently in environments where there’s no GPS, no 5G, and no time to wait for remote instructions. In such cases, autonomy is a requirement and modern military operations are increasingly autonomous. Drone swarms, for example, don’t rely on one pilot per unit anymore. A central command issues a mission, and the swarm executes it even deep behind enemy lines. That shift toward onboard intelligence makes the case for sensors that are low-latency, low-power, and can extract meaningful information with minimal compute. That’s where event cameras can play a role. Their high temporal resolution and efficiency make them well suited to motion detection and fast reaction loops in the field.


[Image: Drone detection based on time surfaces at the European Defence Tech Hackathon]

We put this into practice at the European Defence Tech Hackathon in Paris last December. The Ukrainian military had outlined their biggest challenges, and drones topped the list by a mile. Over 1.2 million were deployed in Ukraine last year alone according to its Ministry of Defence, most of them manually piloted First Person View (FPV) drones. They include variants that carry a spool of lightweight optical fibre, often 10 km long, that allows the pilot to control the drone by wire, without radio signals (see the photo below). And Ukraine’s target for 2025 is a staggering 4.5 million. Main supply routes are now completely covered in anti-drone nets, and fields close to the frontline are covered with optical fibre. Both sides are racing to automate anti-drone systems. At that hackathon in December, we developed an event-based drone detection system and won first place. That experience made it clear that the demand is real! Shutting down an enemy drone can mean a soldier’s life saved. There’s also a pragmatic reason why the defense sector is attractive: volume. Every drone, loitering munition, or autonomous ground vehicle is a potential autonomous system. Event cameras aren’t the only option, but they’re a good candidate when fast response times are crucial and power budgets are tight.
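For those curious what “time surfaces” are, here is a minimal sketch: each pixel stores the timestamp of its most recent event, and an exponential decay turns that into a recency image that a lightweight detector can threshold or feed to a small model. The decay constant and array shapes are illustrative assumptions, not the hackathon code.

```python
# Sketch of a time surface: per-pixel timestamp of the most recent event, rendered
# with exponential decay into a "recency" image. tau is an illustrative assumption.
import numpy as np

class TimeSurface:
    def __init__(self, height, width, tau_us=50_000.0):
        self.last_t = np.full((height, width), -np.inf)  # last event time per pixel
        self.tau = tau_us

    def update(self, t_us, x, y):
        self.last_t[y, x] = t_us                          # works on whole event arrays

    def render(self, t_now_us):
        return np.exp((self.last_t - t_now_us) / self.tau)  # 1.0 = just fired, -> 0 = stale

ts = TimeSurface(480, 640)
xs, ys, t = np.array([10, 11, 12]), np.array([20, 20, 21]), np.array([100, 150, 200])
ts.update(t, xs, ys)
surface = ts.render(t_now_us=250)   # bright blob where something last moved
```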

[Image: An FPV drone with an optical fibre spool attached. Photo by Maxym Marusenko/NurPhoto]

The European Union has committed €800 billion to defense and technological sovereignty. Whether that funding reaches startups effectively is another question, but the political intent is clear. Europe wants to control more of its military tech stack, and that opens the door to new players with homegrown solutions. Already today we see many new defence startups on the scene, a lot of them focusing on AI and autonomy. Defence comes with a lot of red tape, whether it’s access to real data, the reliance on slow government funding, the fact that it can resemble a walled garden, or simply the limited options in terms of exits. But out of all the sectors I’ve looked into, defense stands out as the most likely place for event cameras to find product-market fit first. There’s real demand, shorter adoption cycles, and a willingness to experiment. New companies such as Optera in Australia and TempoSense (https://tempo-sense.com/) in the US (recent slides with more info) are experimenting with making event sensors for the defence sector, and Prophesee in Europe now openly presents their work on drone navigation, detection and anti-drone tech. Leonardo, the Italian defence company, has also released a paper experimenting with event cameras for drone detection.

[To be continued in next post due to the limitation of characters per post]
 

Frangipani

[Continuation of blog post ā€œEvent cameras in 2025, Part 1ā€ by Gregor Lenz]

Wearables

Back in 2021, I explored the use of event cameras for eye tracking. I had conversations with several experts in the field, and their feedback was clear: for most mobile gaze tracking applications, even a simple 20 Hz camera was good enough. In research setups that aim to study microsaccades or other rapid eye movements, the high temporal resolution of event cameras could be useful. But even then, a regular 120 Hz camera might still get the job done.

What I didn’t fully appreciate back then was the importance of power consumption in wearable devices. My thinking was centered around AR and VR headsets, which already include high refresh rate displays that consume significant power. In that context, saving a few milliwatts didn’t seem that important. But smart glasses are a different story. They need to run for hours or days, and every bit of energy efficiency matters to prolong battery life and allow for slimmer designs. Nowadays spectacles [sic]

Prophesee recently announced a partnership with Tobii, who are a major supplier of eye tracking solutions. Zinn Labs, one of the early adopters of event-based gaze tracking, were acquired in February 2025. These developments suggest that there is traction for the technology, especially in applications where power efficiency and responsiveness are key. According to Tobi Delbruck from ETH Zurich, if spectacles catch on like smartphones, then this will be a true mass production of event vision sensors. That said, the broader question remains whether the smart glasses market will scale any time soon. Event cameras may be a good fit from a technical perspective, but the commercial success of wearables will depend on many other factors beyond just sensor performance.

[Image: Prototype by Zinn Labs that includes a GenX320 sensor.]


A Note on Robotics

Even though fast sensors should be great for fine-grained, low-latency loop closure in control, this field is dealing with very different challenges at the moment, at least for building Autonomous Mobile Robots or Humanoids. Controlling an arm or a leg using Visual Language Action (VLA) models is incredibly difficult, and neither input frame rate nor dynamic range is the limitation. Even once more performant models become available, you’ll have to deal with the same challenges as in the Automotive sector, which is that adding a new modality needs lots of new (simulated) data.

Conclusion

Event cameras have come a long way, but they are still searching for the right entry points into the mainstream. The most promising early markets seem to be in defense, where speed and efficiency are critical for drones and autonomous systems, and in wearables, where power constraints make their efficiency truly valuable. Other sectors like space, automotive, and manufacturing show interesting opportunities, but adoption is likely to remain slower and more niche for now. The trajectory of this technology suggests that with persistence and the right applications, event cameras will carve out their role in the broader sensor landscape.

In Part 2, I will discuss the technological hurdles that event cameras are facing today.







Event cameras in 2025, Part 2

August 20, 2025 · 14 min · 2781 words

In Part 1 I provided a high level overview of different industry sectors that could potentially see the adoption of event cameras. Apart from the challenge of finding the right application, there are several technological challenges before event cameras can reach a mass audience.

Sensor Capabilities

Today’s most recent event cameras are summarised in the table below.

| Camera Supplier | Sensor | Model Name | Year | Resolution | Dynamic Range (dB) | Max Bandwidth (Mev/s) |
|---|---|---|---|---|---|---|
| iniVation | Gen2 DVS | DAVIS346 | 2017 | 346×260 | ~120 | 12 |
| iniVation | Gen3 DVS | DVXplorer | 2020 | 640×480 | 90-110 | 165 |
| Prophesee | Sony IMX636 | EVK4 | 2020 | 1280×720 | 120 | 1066 |
| Prophesee | GenX320 | EVK3 | 2023 | 320×320 | 140 | |
| Samsung | Gen4 DVS | DVS-Gen4 | 2020 | 1280×960 | | 1200 |

Insightness was sold to Sony, and CelePixel partnered with Omnivision, but hasn’t released anything in the past 5 years. Over the past decade, we have seen resolution grow from 128x128 to HD, but that’s actually not always good. The last column in the table above describes the number of million events per second, which can easily be reached when the camera is moving fast, such as on a drone. A paper by Gehrig and Scaramuzza suggests that in low-light and high-speed scenarios, the performance of high-resolution cameras is actually worse than when using fewer, but bigger, pixels, due to high per-pixel event rates that are noisy and cause ghosting artifacts.

In areas such as defence, higher resolution and contrast sensitivity, as well as capturing the short/mid-range infrared spectrum, are going to be desirable, because range is so important. SCD USA made the MIRA 02Y-E available last year, which includes an optional event-based readout to enable tactical forces to detect laser sources. Using the event-based output, it advertises a frame rate of up to 1.2 kHz. In space, the distances to the captured objects are enormous, and therefore high resolution and light sensitivity are of utmost importance.

In short range applications such as eye tracking for wearables, a GenX320 at lower resolution but high dynamic range and ultra low power modes is going to be more interesting. For scientific applications, NovoViz recently announced a new SPAD (single photon avalanche diode) camera using event-based outputs!

One thing is clear: today’s binary microsecond spikes are rarely the right format. Much like Intel’s Loihi 2 shifted from binary spikes to richer spike payloads because they realised that the communication overhead was too high otherwise, future event cameras could emit multi-bit “micro-frames” or tokenizable spike packets. These would represent short-term local activity and could be directly ingested by ML models, reducing the need for preprocessing altogether. Ideally, there’s a trade-off between information density and temporal resolution that can be chosen depending on the application.

A key trend is hybrid vision sensors that combine RGB and event frames. At ISSCC 2023, three papers showed new generations of hybrid vision sensors, which output both RGB frames at fixed rates and events in between.

| Sensor | Event output type | Timing & synchronization | Polarity info | Typical max rate |
|---|---|---|---|---|
| Sony 2.97 μm | Binary event frames (two separate ON/OFF maps) | Synchronous, ~580 µs “event frame” period | 2 bits per pixel (positive & negative) | ~1.4 GEvents/s |
| OmniVision 3-wafer | Per-event address-event packets (x, y, t, polarity) | Asynchronous, microsecond-level timestamps | Single-bit polarity per event | Up to 4.6 GEvents/s |
| Sony 1.22 μm, 35.6 MP | Binary event frames with row-skipping & compression | Variable frame sync, up to 10 kfps per RGB frame | 2 bits per pixel (positive & negative) | Up to 4.56 GEvents/s |

The Sony 2.97 μm chip uses aggressive circuit sharing so that four pixels share one comparator and analog front-end. Events are not streamed individually but are batched into binary event frames every ~580 µs, with separate maps for ON and OFF polarity. This design keeps per-event energy extremely low (~57 pJ) and allows the sensor to reach ~1.4 GEvents/s without arbitration delays. Because output is already frame-like, it fits naturally into existing machine learning pipelines that expect regular image-like input at deterministic timing.

The OmniVision 3-wafer is different: a true asynchronous event stream is preserved. A dedicated 1MP event wafer with in-pixel time-to-digital converters stamps each event with microsecond accuracy. Skip-logic and four parallel readout channels give a 4.6 GEvents/s throughput. This is closer to the classic DVS concept, ideal for ultra-fast motion analysis or scientific experiments where every microsecond matters. The integrated image signal processor can fuse the dense 15MP RGB video with the sparse event stream in hardware for applications such as 10 kfps slow-motion videos.

The Sony 1.22 μm hybrid sensor aimed at mobile devices combines a huge 35.6 MP RGB array with a 2 MP event array. Four 1.22 µm photodiodes form each event pixel (4.88 µm pitch). The event side operates in variable-rate event-frame mode, outputting up to 10 kfps inside each RGB frame period. On-chip event-drop filters and compression dynamically reduce data volume while preserving critical motion information for downstream neural networks (e.g. deblurring or video frame interpolation). It is a practical demonstration that event frames and RGB can be tightly synchronized so that a phone SoC can consume both without exotic drivers.

[Image: Kodama et al. presented a sensor that outputs variable-rate binary event frames next to RGB.]

[Image: Guo et al. presented a new generation of hybrid vision sensor that outputs binary events.]

I find the trend towards event frames interesting and in line with what most researchers have been feeding their machine learning models anyway. In either case, the event camera sensor has not reached its final form yet. The question is always in what way events should be represented in order to be compatible with modern machine learning methods.


Event Representations

Most common approaches aggregate events into image-like representations such as 2D histograms, voxel grids, or time surfaces. These are then used to fine-tune deep learning models that were pre-trained on RGB images. This leverages the breadth of existing tooling built for images and is compatible with GPU-accelerated training and inference. Moreover, it allows for adaptive frame rates, aggregating only when there’s activity and potentially saving on compute. However, this method discards much of the fine temporal structure that makes event cameras valuable in the first place. We still lack a representation for event streams that works well with modern ML architectures and preserves their sparsity. Event streams are a new data modality, just like images, audio, or text, but one for which we haven’t yet cracked the “tokenization problem.” A single ON or OFF event contains very little semantic information. Unlike a word in a sentence, which can encode a concept, even a dozen events reveal almost nothing about the scene. This makes direct tokenization of events inefficient and ineffective. What we need is a representation that can summarize local spatiotemporal structure into meaningful, higher-level primitives. Something akin to a “visual word” for events.

It’s also inherently inefficient: the tensors produced are full of zeros, and latency grows with the size of the memory window. This becomes problematic for real-time applications where a long temporal context is needed but high responsiveness is crucial.
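For concreteness, here is a minimal sketch of the kind of dense aggregation described above: a voxel grid that splits an event window into a few temporal bins with linear temporal weighting. The shapes and weighting scheme are conventional choices used for illustration, not taken from a specific paper.

```python
# Minimal sketch of a voxel-grid representation: accumulate polarity per (bin, y, x)
# with bilinear weighting in time. Bin count and weighting are illustrative choices.
import numpy as np

def voxel_grid(t, x, y, p, bins, height, width):
    t = (t - t[0]) / max(t[-1] - t[0], 1e-9)          # normalize time to [0, 1]
    grid = np.zeros((bins, height, width))
    tb = t * (bins - 1)
    lo = np.floor(tb).astype(int)
    w_hi = tb - lo                                    # bilinear weight in time
    for b, w in ((lo, 1.0 - w_hi), (np.minimum(lo + 1, bins - 1), w_hi)):
        np.add.at(grid, (b, y, x), p * w)             # scatter-add events into bins
    return grid
```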

I think that graphs, especially dynamic, sparse graphs, are an interesting abstraction to be explored. Each node could represent a small region of correlated activity in space and time, with edges encoding temporal or spatial relationships. Recent work such as HugNet v2, DAGr, or EvGNN hardware applies Graph Neural Networks (GNNs) to event data. But several challenges remain: to generate such a graph, we need a lot of memory for all those events, and the unpredictable number of incoming events makes computation extremely inefficient. This is where specialized hardware accelerators will need to come in, because dynamically fetching events is expensive. By combining event cameras with efficient “graph processors,” we could offload the task of building sparse graphs directly on-chip, producing representations that are ready for downstream learning. Temporally sparse, graph-based outputs could serve as a robust bridge between raw events and modern ML architectures.

If you want to preserve sparsity, you need tokens that mean something. Individual ON/OFF events are too atomic to be useful tokens, so a practical middle ground is a two-stage model: a lightweight, streaming “tokenizer” that clusters local spatiotemporal activity into short-lived micro-features, followed by a stateful temporal model that reasons over those features. The tokenizer can be as simple as centroiding event bursts in a small spatial neighborhood with a short time constant, or as involved as a dynamic graph builder that fuses polarity, age, and motion cues. Either way, the goal is to transform a flood of spikes into a bounded, variable-rate set of tokens with stable meaning.
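As a toy illustration of that first stage, the sketch below clusters events in coarse spatio-temporal cells and emits one token per sufficiently active cell; the cell sizes and threshold are arbitrary assumptions, and a real tokenizer might build dynamic graphs instead.

```python
# Toy "tokenizer": turn bursts of events inside a coarse spatio-temporal cell into a
# handful of tokens (centroid, mean time, count, mean polarity). Cell sizes are
# arbitrary assumptions, not a proposed standard.
import numpy as np

def tokenize(t, x, y, p, cell_px=16, cell_us=5_000, min_events=8):
    keys = np.stack([x // cell_px, y // cell_px, (t // cell_us).astype(int)], axis=1)
    tokens = []
    for key in np.unique(keys, axis=0):
        m = np.all(keys == key, axis=1)
        if m.sum() < min_events:
            continue                                   # ignore isolated noise events
        tokens.append(dict(cx=float(x[m].mean()), cy=float(y[m].mean()),
                           t=float(t[m].mean()), count=int(m.sum()),
                           polarity=float(p[m].mean())))
    return tokens   # a bounded, variable-rate set of tokens for a temporal model
```

Next, let’s explore the type of models that work well with event camera data.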


Machine Learning Models


At their core, event cameras are change detectors, which means that we need memory in our machine learning models to remember where things were before they stopped moving. We can bake memory into the model architecture by using recurrence or attention. For example, Recurrent Vision Transformers and their variants maintain internal state across time and can handle temporally sparse inputs more naturally. These methods preserve temporal continuity, but there’s a catch: most of these methods still rely on dense, voxelized inputs. Even with more efficient state-space models replacing LSTMs and BPTT (Backpropagation Through Time), we’re still processing a lot of zeros. Training is faster, but inference is still bottlenecked by inefficient representations.

Nowadays larger AI models are being pruned, distilled, and quantised to provide efficient edge models that can generalise well. Even TinyML models are students of a larger model. We have to say goodbye to the idea of training tiny models from scratch for commercial event camera applications, because they won’t perform well enough in the real world.

Spiking neural networks (SNNs) are sometimes touted as a natural fit for event data. But in their traditional form, with binary activations and reset mechanisms, leaky integrate-and-fire (LIF) neurons are handcrafted biological abstractions. If we learned anything from machine learning, it’s that handcrafted designs are inherently flawed. And neurons are an incredibly complex thing to model, as efforts such as CZI’s Virtual Cells and DeepMind’s cell simulations show. So let’s not get hung up on the artificial neuron model itself, and instead use what works well, because the field is moving incredibly fast.

I’m very optimistic about state space models (SSMs) for event vision. Instead of baking memory into heavy recurrence or dense attention, an SSM treats the scene’s latent dynamics as a continuous-time system and then discretizes only for inference. This means a single trained model can adapt to many operating modes: you can run it at different inference rates or even update state event-by-event with variable time steps—without retraining—simply by changing the integration step. That flexibility is a good match for sensors whose activity is unpredictable.
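A small sketch may help illustrate why this works: for a diagonal continuous-time system, the zero-order-hold discretization can be recomputed per step for whatever dt just elapsed, so the same parameters serve fixed-rate frames and irregular event-driven updates alike. The matrices here are random placeholders, not a trained model.

```python
# Sketch of variable-step inference with a diagonal linear SSM: x' = A x + B u is
# discretized with the dt of each incoming input (zero-order hold). A, B, C are
# random placeholders for illustration only.
import numpy as np

class DiagonalSSM:
    def __init__(self, dim, rng=np.random.default_rng(0)):
        self.A = -np.abs(rng.normal(size=dim))   # stable diagonal continuous-time A
        self.B = rng.normal(size=dim)
        self.C = rng.normal(size=dim)
        self.x = np.zeros(dim)

    def step(self, u, dt):
        Ad = np.exp(self.A * dt)                  # exact zero-order hold for diagonal A
        Bd = (Ad - 1.0) / self.A * self.B
        self.x = Ad * self.x + Bd * u             # elementwise: each state is independent
        return float(self.C @ self.x)

ssm = DiagonalSSM(16)
for u, dt in [(1.0, 0.033), (0.0, 0.001), (0.5, 0.0004)]:   # mixed frame- and event-rate steps
    y = ssm.step(u, dt)
```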


Processors

Meyer et al. implemented an S4D SSM on Intel’s Loihi 2, constraining the state space to be diagonal so that each neuron evolves independently. They mapped these one-dimensional state updates directly to Loihi’s programmable neurons and carefully placed layers to reduce inter-core communication, which resulted in much lower latency and energy use than a Jetson GPU in true online processing. I think it’s a compelling demonstration that SSMs can be run efficiently on stateful AI accelerator hardware and I’m curious what else is coming out of that.

Some people argue that because event cameras output extremely sparse data, we can save energy by skipping zeros in the input or in intermediate activations. But I don’t buy that argument, because while the input might be much sparser than an RGB frame, the bulk of the computation actually happens in intermediate layers and works with higher-level representations, which are hopefully similar for both RGB and event inputs. That means that in AI accelerators we can’t exploit spatial event camera sparsity, and the inference cost between RGB and event frames is essentially the same. Of course we might get different input frame rates / temporal sparsity, but those can be exploited on GPUs as well.

Keep in mind that on mixed-signal hardware, the rules are different. There’s a breadth of new materials being explored, such as memristors and spintronics. The basic rule for analog is: if you need to convert from analog to digital too often, for error correction or because you’re storing states or other intermediate values, your efficiency gains go out of the window. Mythic AI had to painfully learn that and almost tanked, and Rain AI also pivoted from its original analog hardware and faces an uncertain future. The brain uses a mixture of analog (graded potentials, dendritic integration) and digital (spikes) signals, and we can replicate this principle in silicon. But since the circuitry is the memory at the same time, it needs an incredible amount of space and is organised in 3D. That’s really costly to do in silicon, and the major challenge is getting the heat out, which is much easier in 2D.

I think that the asynchronous compute principle is key for event cameras, but we need to realise that naïve asynchrony is not constructive. Think about a roundabout, and how it manages the flow of traffic without any traffic lights. When the traffic volume is low, every car is more or less in constant motion, and latency to cross the roundabout is minimal. As the volume of traffic grows, a roundabout becomes inefficient, because the movement of any car depends on the decisions of cars nearby. For high traffic flow, it becomes more efficient to use traffic lights to batch-process the traffic for multiple lanes at once, which achieves the highest throughput of cars. The same principle applies to events. When you have few pixels activated, you achieve the lowest latency when you process them as they come in, as in a roundabout. But as the number of events per second gets larger, for example because you’re moving the camera on a car or a drone, you need to get out the traffic lights and start and stop larger batches of events. Ideally, the size of the batch depends on the event rate.
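In code, the roundabout-versus-traffic-light idea might look like the toy dispatcher below, which processes events one at a time while the recent rate is low and switches to rate-dependent batches once it grows; the threshold, window, and batch scaling are arbitrary illustrative choices.

```python
# Toy adaptive dispatcher: per-event processing at low rates (lowest latency),
# rate-dependent batching at high rates (highest throughput). All numbers are made up.
from collections import deque

def dispatch(event_stream, process, rate_threshold=10_000, window_s=0.01):
    recent = deque()                       # timestamps inside a sliding window
    batch = []
    for t, ev in event_stream:             # (timestamp in seconds, event payload)
        recent.append(t)
        while recent and t - recent[0] > window_s:
            recent.popleft()
        rate = len(recent) / window_s      # events per second over the window
        if rate < rate_threshold:
            process([ev])                  # roundabout mode: minimal latency
        else:
            batch.append(ev)
            if len(batch) >= int(rate * window_s):   # batch size tracks the event rate
                process(batch)
                batch = []
    if batch:
        process(batch)
```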

For more info about neuromorphic chips, I refer you to Open Neuromorphic’s Hardware Guide.

Conclusion

Here are my main points:

  • Event cameras won’t go mainstream until they move away from binary events to richer output formats, whether from the sensor directly or an attached preprocessor.
  • Event cameras follow the trajectory of other sensors that were developed and improved within the context of defence applications.
  • We need an efficient representation that is compatible with modern ML architectures. It might well be event frames in the end.
  • Keep it practical. Biologically-inspired approaches should not distract from deployment-grade ML solutions.

The recipe that scales is: build a token stream that carries meaning, train it with cross-modal supervision and self-supervision that reflects real sensor noise, keep a compact scene memory that is cheap to update, and make computation conditional on activity rather than on a fixed clock.

Binary events don’t contain enough information on their own, so they must be aggregated in one form or another. Event sensors might move from binary outputs toward richer encodings at the pixel level, attach a dedicated processor to output richer representations, or they simply output what the world already knows well: another form of frames. While many researchers (including me) originally set out to work with binary events directly, I think it is time to swallow a bitter pill and accept that computer vision will depend on frames for the foreseeable future.

My bet is currently on the latter, because the simplest solutions tend to win.

Deep learning started out with 32 bit floating point, dense representations, and neuromorphic started out on the other end of the spectrum at binary, extremely sparse representations. They are converging, with neuromorphic realising that binary events are expensive to transmit, and deep learning embracing 4 bit activations and 2:4 sparsity.

Interesting research directions for event cameras today are about dynamic graph representations for efficient tokenization, state space models for efficient inference, and lossy compression for smaller file sizes. To unlock the full potential of event cameras, we need to solve the representation problem to make it compatible with modern deep learning hardware and software, while preserving the extreme sparsity of the data. Also, we shouldn’t be too focused on biologically-inspired processing if we want this thing to scale anytime soon. I think that either the sensors must evolve to emit richer, token-friendly outputs, or they must be paired with dedicated pre-processors that produce high-level, potentially graph-based abstractions. Once that happens, event cameras become easy enough to work with to reach the mainstream.

Ultimately, the application dictates the design. Gesture recognition does not need microsecond temporal resolution. Eye tracking doesn’t need HD spatial resolution. And sometimes a motion sensor that wakes a standard camera will be the easiest solution.

 