Dynamic Vision Sensor (DVS) and the Dynamic Audio Sensor (DAS)
Published on Jan 16, 2024
Kailash Prasad, Design Engineer @ Arm | PMRF | IIRF | nanoDC Lab…
Have you ever wondered how the human eye and ear can process complex and dynamic scenes with such high speed and accuracy? Imagine if we could design artificial sensors that mimic the biological mechanisms of vision and hearing, and produce data that is more efficient and meaningful than that of conventional sensors.
In this post, I will introduce you to two types of neuromorphic sensors: the Dynamic Vision Sensor (DVS) and the Dynamic Audio Sensor (DAS).
These sensors are inspired by the structure and function of the retina and the cochlea, respectively, and use a novel paradigm of event-based sensing. Unlike conventional sensors that capture frames or samples at a fixed rate, event-based sensors only output data when there is a change in the input signal, such as brightness or sound intensity. This results in a stream of asynchronous events that encode the temporal and spatial information of the scene, with high temporal resolution, low latency, and high dynamic range.
In simpler terms, these special sensors work like our eyes and ears. They're designed based on the way our eyes' retinas and ears' cochleae function. What sets them apart is their unique approach, called event-based sensing. Unlike regular sensors that take pictures or recordings at a set speed, event-based sensors only provide information when there's a change. Whether it's a shift in light or a change in sound, they capture only those moments. Instead of a constant flow of data, you get quick updates that show when and where things change. This gives you highly detailed and fast information about what's happening, with minimal delay and a wide range of detail. It's like having sensors that focus on the important stuff, making them efficient and responsive.
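To make the idea concrete, here is a minimal sketch of event-based sensing in Python. It is a toy model, not any real sensor's pipeline: each "pixel" emits an event only when its brightness changes by more than a threshold, so static regions produce no data at all. The `Event` fields and the threshold value are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Event:
    x: int          # pixel column
    y: int          # pixel row
    polarity: int   # +1 for a brightness increase, -1 for a decrease
    t_us: int       # timestamp in microseconds

def generate_events(prev, curr, t_us, threshold=0.2):
    """Compare two brightness frames (nested lists of floats in [0, 1])
    and emit an event only for pixels whose change exceeds the threshold."""
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            delta = c - p
            if abs(delta) >= threshold:
                events.append(Event(x, y, 1 if delta > 0 else -1, t_us))
    return events

# Only the two pixels that changed produce events; the rest stay silent.
prev = [[0.5, 0.5], [0.5, 0.5]]
curr = [[0.9, 0.5], [0.5, 0.1]]
events = generate_events(prev, curr, t_us=1000)
```

Note how the output size scales with scene activity, not with resolution times frame rate; that is the source of the efficiency the article describes.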
The DVS is an imaging sensor that responds to local changes in brightness and outputs events indicating the pixel address, the polarity (increase or decrease) of the brightness change, and a timestamp. The DVS can achieve a temporal resolution of microseconds, a dynamic range of 120 dB, and a power consumption as low as 30 mW. It also avoids the motion blur and under/overexposure that plague conventional cameras. The DVS can be used for applications such as optical flow estimation, object tracking, gesture recognition, and robotics.
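DVS events are typically streamed in a compact address-event representation (AER): each event is packed into a single machine word. The bit layout below is purely hypothetical, chosen for illustration; real devices (e.g. from iniVation or Prophesee) define their own encodings, so treat this as a sketch of the idea rather than any vendor's format.

```python
# Hypothetical 31-bit packing: t_us (16 bits) | y (7 bits) | x (7 bits) | polarity (1 bit)
def encode_event(x, y, polarity, t_us):
    """Pack an event (pixel address, polarity, timestamp) into one integer."""
    pol_bit = 1 if polarity > 0 else 0
    return ((t_us & 0xFFFF) << 15) | ((y & 0x7F) << 8) | ((x & 0x7F) << 1) | pol_bit

def decode_event(word):
    """Inverse of encode_event: recover (x, y, polarity, t_us)."""
    polarity = 1 if (word & 1) else -1
    x = (word >> 1) & 0x7F
    y = (word >> 8) & 0x7F
    t_us = (word >> 15) & 0xFFFF
    return x, y, polarity, t_us
```

Packing events this way keeps the data stream sparse and cheap to transmit, which is what makes microsecond timestamps practical at low power.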
The DAS is an auditory sensor that mimics the cochlea, the auditory organ of the inner ear. It takes stereo audio inputs and outputs events that represent the activity in different frequency ranges. The DAS can capture sound signals across a frequency range of 20 Hz to 20 kHz, with a dynamic range of 60 dB and a temporal resolution of microseconds. It can also extract auditory features such as interaural time difference and harmonicity, which support tasks like speaker identification.
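One of those features, the interaural time difference (ITD), can be estimated directly from event timestamps: if a sound reaches the left ear slightly before the right, left-channel events lead right-channel events by that delay. The sketch below assumes the left/right events for one frequency channel are already matched one-to-one, which is a simplification; practical pipelines use cross-correlation over event streams instead.

```python
import statistics

def estimate_itd_us(left_ts, right_ts):
    """Estimate ITD as the median difference (right - left) between matched
    event timestamps, in microseconds. A positive result means the sound
    reached the left ear first. Inputs are assumed matched one-to-one."""
    diffs = [r - l for l, r in zip(left_ts, right_ts)]
    return statistics.median(diffs)

# Toy example: right-channel events lag the left by roughly 40 microseconds,
# consistent with a source located toward the listener's left.
itd = estimate_itd_us([100, 300, 500], [140, 342, 538])
```

The median makes the estimate robust to a few mistimed events, which matters because event sensors report spontaneous noise events alongside signal.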
Both the DVS and the DAS are compatible with neuromorphic computing architectures, such as spiking neural networks, that can process the event data in a parallel and distributed manner. This enables low-power and real-time computation of complex tasks such as scene understanding, speech recognition, and sound localization.
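To hint at how that pairing works, here is a toy leaky integrate-and-fire (LIF) neuron driven by asynchronous event timestamps: between events the membrane potential decays, each incoming event adds a weighted contribution, and the neuron "spikes" when a threshold is crossed. The time constant, weight, and threshold are illustrative values, not parameters of any real chip or sensor.

```python
import math

def lif_spikes(event_times_us, tau_us=10_000.0, weight=0.6, threshold=1.0):
    """Feed a sorted list of event timestamps into a leaky integrate-and-fire
    neuron; return the times at which the membrane potential crosses threshold."""
    v = 0.0            # membrane potential
    last_t = None      # timestamp of the previous event
    spikes = []
    for t in event_times_us:
        if last_t is not None:
            v *= math.exp(-(t - last_t) / tau_us)  # exponential leak between events
        v += weight                                # integrate the incoming event
        if v >= threshold:
            spikes.append(t)
            v = 0.0                                # reset after spiking
        last_t = t
    return spikes

# Two events 1 ms apart push the neuron over threshold on the second event.
spike_times = lif_spikes([0, 1000, 2000])
```

Because computation happens only when an event arrives, the neuron is idle between events; scaled up to networks of such neurons, this is the source of the low-power, real-time behavior described above.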
Some examples of recent products that use the DVS and the DAS are:
- The Prophesee Metavision Camera, which is a high-resolution DVS camera that can capture fast and complex motions with minimal data and power consumption.