I see Olivier has recently provided some comments on a post by a freelancer named Artiom, as per the link.
Olivier talks about some tech details I found interesting (my bold), and also, when you go to Olivier's LinkedIn, it says he is working for BRN? I thought he had moved on, or am I mistaken about something?
𝐇𝐨𝐰 𝐭𝐨 𝐮𝐬𝐞 𝐚𝐧 𝐞𝐯𝐞𝐧𝐭-𝐛𝐚𝐬𝐞𝐝 𝐜𝐚𝐦𝐞𝐫𝐚 𝐢𝐧 𝐚𝐧 𝐢𝐧𝐝𝐮𝐬𝐭𝐫𝐢𝐚𝐥 𝐜𝐨𝐧𝐭𝐞𝐱𝐭? ⚙️📸 Many industrial setups fall under a common configuration: 𝐚 𝐬𝐭𝐚𝐭𝐢𝐜 𝐜𝐚𝐦𝐞𝐫𝐚 𝐨𝐛𝐬𝐞𝐫𝐯𝐢𝐧𝐠 𝐦𝐨𝐯𝐢𝐧𝐠 𝐩𝐚𝐫𝐭𝐬 𝐨𝐧 𝐚 𝐜𝐨𝐧𝐯𝐞𝐲𝐨𝐫. In such cases, 𝐞𝐯𝐞𝐧𝐭-𝐛𝐚𝐬𝐞𝐝 𝐜𝐚𝐦𝐞𝐫𝐚𝐬 are very appealing — since the background is static, 𝐨𝐛𝐣𝐞𝐜𝐭 𝐞𝐱𝐭𝐫𝐚𝐜𝐭𝐢𝐨𝐧 𝐜𝐨𝐦𝐞𝐬 “𝐟𝐨𝐫...
Olivier Coenen
From theoretical models to chips: I design SOTA spatiotemporal NNs (PLEIADES, TENN (Akida 2+), SCORPIUS) that train like CNNs and run efficiently at the edge like RNNs.
6d
The Austrian Institute of Technology (AIT), where Christoph Posch developed the ATIS sensor, built a linear event-based visual array only about 3(?) pixels wide to scan objects moving really fast on a conveyor belt. We used the technique presented here to categorize moving grocery items and read their barcodes as they were put into a shopping basket. The difficulty was the effective temporal resolution, only about 0.1 ms at best, far from the ns resolution of the timestamps. So we needed better techniques, dynamical NNs, to extract all the info and compensate for the time-dependent response of a pixel: if it fired recently, it is less likely to fire again for the same contrast.
We didn’t have that NN then and I think I have it now with PLEIADES and successors.
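To give a concrete feel for that compensation idea, here is a minimal toy sketch of my own (not Olivier's dynamical NN and not AIT's actual pipeline): each event is re-weighted by how far its pixel has recovered since it last fired, assuming a simple exponential recovery with an illustrative time constant tau.

```python
# Toy sketch only: re-weight events to compensate for a pixel's time-dependent
# response ("if it fired recently, less likely to fire again for the same
# contrast"). The exponential-recovery model and tau are assumptions.
import numpy as np

def compensate_refractory(t, x, y, tau=1e-4, shape=(3, 1024)):
    """t, x, y: 1-D arrays of event timestamps (s) and pixel coords, time-sorted.
    Returns one weight per event; events arriving shortly after a previous
    event on the same pixel get boosted."""
    last_t = np.full(shape, -np.inf)            # previous event time per pixel
    w = np.ones(len(t))
    for i in range(len(t)):
        dt = t[i] - last_t[y[i], x[i]]
        if np.isfinite(dt):
            recovery = 1.0 - np.exp(-dt / tau)  # ~0 right after a spike, -> 1 later
            w[i] = 1.0 / max(recovery, 1e-3)    # cap so weights stay bounded
        last_t[y[i], x[i]] = t[i]
    return w
```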
Olivier Coenen
6d
The increased resolution brings out another main advantage of event-based vision sensors: one can eventually see/resolve objects that one couldn't at the same frame-based resolution.
We could still track drones flying in the sky that were 1/10 the size of a single pixel, for example. Try that with a frame-based camera. We could generate gigapixel-resolution images of our surroundings with a DAVIS240 (240 x 180) while driving a vehicle offroad on a bumpy ride.
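Here is a rough sketch of how that kind of accumulation could work (my own assumption, not BrainChip's actual method): precise timestamps plus known camera motion let events from a 240 x 180 sensor be splatted onto a much finer grid. The known_shift function is hypothetical, standing in for an IMU or trajectory estimate.

```python
# Rough sketch only: motion-compensated accumulation of low-resolution events
# onto a finer canvas. known_shift(t) -> (dx, dy) in pixels is a hypothetical
# input (IMU / motion estimate), not part of any real DAVIS240 API.
import numpy as np

def accumulate_superres(t, x, y, known_shift, upscale=8, shape=(180, 240)):
    """Splat each event onto an `upscale`-times finer canvas, shifted by the
    sub-pixel camera motion at its exact timestamp."""
    H, W = shape
    canvas = np.zeros((H * upscale, W * upscale))
    for ti, xi, yi in zip(t, x, y):
        dx, dy = known_shift(ti)                  # sub-pixel shift at event time
        u = int(round((xi + dx) * upscale))
        v = int(round((yi + dy) * upscale))
        if 0 <= u < W * upscale and 0 <= v < H * upscale:
            canvas[v, u] += 1.0
    return canvas
```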
Vincenzo Polizzi
PhD Candidate at University of Toronto | SLAM, Machine Learning, Robotics
3d
This is a great application, very nicely explained! One interesting aspect visible in your results is that edges parallel to the motion direction are much less represented, simply because they don't generate many events. A way to address this is to excite pixels from multiple directions, ensuring that all edges trigger events. This is exactly what we explored in a class project that eventually evolved into our paper VibES: we mounted the event camera on a small mechanical platform that oscillates in both X and Y, essentially a tiny "washing-machine-like" motion. By injecting a known motion pattern (and estimating the sinusoidal components online), we obtained dense event streams and significantly sharper reconstructions, even for edges that would otherwise remain silent. I'd be glad to hear your thoughts!
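For anyone curious how the "estimating the sinusoidal components online" part could look, here is a simplified sketch of one possible approach (my own guess using contrast maximization, one axis only, not the actual VibES code): search for the amplitude and phase of the known-frequency oscillation that makes the motion-compensated event image sharpest, then warp events back with it.

```python
# Simplified sketch, not the actual VibES implementation: recover amplitude and
# phase of an injected sinusoidal motion of known frequency f by contrast
# maximization (one axis shown for brevity; VibES oscillates both X and Y).
import numpy as np

def compensated_image(t, x, y, A, phi, f, shape=(180, 240)):
    """Undo the sinusoidal x-shift at each event's timestamp and histogram."""
    shift = A * np.sin(2 * np.pi * f * np.asarray(t) + phi)
    xs = np.clip(np.round(np.asarray(x) - shift).astype(int), 0, shape[1] - 1)
    img = np.zeros(shape)
    np.add.at(img, (np.asarray(y), xs), 1.0)
    return img

def estimate_sinusoid(t, x, y, f, amps, phases, shape=(180, 240)):
    """Grid-search the (amplitude, phase) that maximizes image contrast."""
    best_A, best_phi, best_var = 0.0, 0.0, -np.inf
    for A in amps:
        for phi in phases:
            var = compensated_image(t, x, y, A, phi, f, shape).var()
            if var > best_var:
                best_A, best_phi, best_var = A, phi, var
    return best_A, best_phi
```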
Olivier Coenen
6d
This brings out another point: a single cone/rod in our retinas doesn't have the angular resolution to clearly see stars in the night sky, our eyes just don't have the necessary resolution, yet we can clearly see stars. How is that possible? Eye movement and spike timing. Within a certain spatial resolution, spikes fire at the same spatial location as our eyes move. The temporal resolution of the spikes translates into the spatial resolution we can achieve when the spikes accumulate around the same spatial location where the star is. Be drunk and the stars should appear thicker, or will appear to move, because the temporal spike alignment is off. Thus, what we see is clearly a result of the neural processing, not just the inputs: the neural networks processing the spike firing. That's the neural processing we need to take full advantage of event-based (vision) sensors, without having to resort to the processing of periodic frame sampling that the engineering world is stuck in today.
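A quick back-of-the-envelope version of that temporal-to-spatial trade (purely illustrative numbers, not from the post): if the sensor, or the eye, sweeps across the scene at roughly v pixels per second and spikes can be aligned in time to within dt, then back-projecting each spike along the known trajectory localizes a point source to about v x dt pixels. With, say, v = 1000 pixels/s and dt = 0.1 ms, that is about 0.1 pixel, i.e. sub-pixel localization coming purely from spike timing.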