qDVS Paper Notes

Posted on 2023-01-28, edited on 2023-04-22

Paper's full name: A 256x256 6.3pJ/pixel-event Query-driven Dynamic Vision Sensor with Energy-conserving Row-parallel Event Scanning

Idea

This paper proposes a novel query-driven DVS (qDVS) hardware design. It combines the advantages of APS and DVS by scanning the array at a fixed rate and querying every pixel on whether it is ready to fire an event. The output of qDVS is event frames. Each pixel has fewer responsibilities: it only needs to report whether it fires and with which polarity, since its address is fixed by its position in the generated event frame. Because each pixel does less, it can be made smaller, which yields a higher overall pixel density. Also, since the output of qDVS is already framed, machine learning researchers don't need to accumulate events themselves, which simplifies the processing pipeline.

Questions

They use a fixed scanning rate to query all pixels, but how can this rate be changed to better accommodate different information densities?
- Change the external clock rate. (Not sure)
What is the "fill factor" in DVS?
Although qDVS can directly output event frames, it runs at a very high clock rate according to the paper, so a single frame may not contain enough events for a prediction. For problems like human pose estimation, using these frames directly would likely cause a severe missing-torso problem, since people don't move all of their body parts all the time.
- Not clearly mentioned in the paper. This problem may not be solved in their paper originally.
- Maybe accumulation is also used?
Why is the dynamic range (68dB) clearly lower than that of other DVS cameras such as Prophesee (124dB) and DAVIS346 (120dB)? Even an RGB camera can have a higher dynamic range (FLIR BlackFly, 74.35dB). This means the qDVS camera has no advantage over RGB cameras in low-light conditions.
- An inference is stated in Hardware Design Consideration Point 6.
Will the reset process eliminate the accumulated voltage change if this change is not big enough to trigger an event?

Points

This is a hardware paper regarding a novel query-driven DVS sensing approach (qDVS).
It aims to boost the achievable pixel density and energy efficiency.
Combines complementary advantages of RGB + DVS.
Spec: 256 x 256, 33% fill factor, 10% temporal contrast sensitivity.
Peak rate: 0.5mW, 1.2V, 80Meps, 6.3pJ/pixel-event.
Comparison between chips:

Traditional CMOS/APS: High pixel density; uses frame scanning at a fixed clock rate, which results in a constant data rate and power consumption independent of the information content.
Typical DVS: Saves energy through more efficient visual event coding, but has low pixel density and an inefficient implementation: continuously monitoring for events and handling request/acknowledge handshaking with each pixel adds area and power overhead. This leads to substantial static power.
qDVS: Uses clocked time-division multiplexing to periodically scan the array, querying each pixel to check whether its brightness change has passed a threshold. The scanning interval is much shorter than an APS frame interval.
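The query-driven scan described for qDVS can be sketched in software as follows. This is only a behavioural illustration, not the chip's circuit: the function name `scan_frame`, the threshold value, and the rule that a pixel's reference is updated only when it fires are all assumptions made for the example.

```python
import math

def scan_frame(intensity, ref_log, threshold):
    """Query every pixel once and return an event frame.

    intensity : 2D list of positive brightness values for this scan
    ref_log   : 2D list of log-brightness stored per pixel (updated in place)
    threshold : log-brightness change needed to fire (hypothetical value)
    """
    frame = []
    for r, row in enumerate(intensity):
        out_row = []
        for c, value in enumerate(row):
            delta = math.log(value) - ref_log[r][c]
            if delta > threshold:
                polarity = 1       # ON event
            elif delta < -threshold:
                polarity = -1      # OFF event
            else:
                polarity = 0       # pixel stays silent for this query
            if polarity != 0:
                # Assumption: the reference is reset only after an event.
                ref_log[r][c] = math.log(value)
            out_row.append(polarity)
        frame.append(out_row)
    return frame

# Tiny 2x2 example: one pixel brightens, one darkens, two stay constant.
first = [[100.0, 100.0], [100.0, 100.0]]
second = [[150.0, 100.0], [100.0, 60.0]]
ref = [[math.log(v) for v in row] for row in first]
frame = scan_frame(second, ref, threshold=0.1)
print(frame)  # [[1, 0], [0, -1]]
```

The position of each value in `frame` is the pixel address, so no address has to be attached to individual events.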

qDVS achieves 2x greater pixel density and 20x greater energy efficiency than the state of the art.
qDVS is more suitable for deep learning since it directly outputs frames of events. It does not require accumulating events in buffer memory to reconstruct frames, or dynamic clustering algorithms to identify object boundaries in order to track them.
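As a quick sanity check on the peak-rate figures listed in the points above, dividing the 0.5 mW power by the 80 Meps event rate should give roughly the reported energy per pixel-event. The variable names below are just for illustration.

```python
power_w = 0.5e-3     # 0.5 mW peak power, from the spec in these notes
event_rate = 80e6    # 80 Meps peak event rate, from the spec in these notes

# Energy per event = power / event rate, converted from joules to picojoules.
energy_per_event_pj = power_w / event_rate * 1e12
print(f"{energy_per_event_pj:.2f} pJ per pixel-event")  # 6.25 pJ per pixel-event
```

The result, 6.25 pJ, matches the 6.3 pJ/pixel-event figure after rounding.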

Hardware Design Consideration

In qDVS, \(V_{IN}\) is determined by both \(C_{PH}\) and \(C_{REF}\), where \(\Delta V_{IN} = \frac{C_{PH}}{C_{PH}+C_{REF}}\cdot\Delta V_{PH} + \frac{C_{REF}}{C_{PH}+C_{REF}}\cdot\Delta V_{REF} \qquad (1)\).
If \(\Delta V_{IN}>+\epsilon\), output an ON event; if \(\Delta V_{IN}<-\epsilon\), output an OFF event.
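Equation (1) and the ON/OFF rule can be tried out numerically with a small sketch. The function names and all component values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the paper.

```python
def delta_v_in(c_ph, c_ref, dv_ph, dv_ref):
    """Capacitive divider from equation (1)."""
    total = c_ph + c_ref
    return (c_ph / total) * dv_ph + (c_ref / total) * dv_ref

def classify(dv_in, eps):
    """Return 'ON', 'OFF', or None according to the threshold rule."""
    if dv_in > eps:
        return "ON"
    if dv_in < -eps:
        return "OFF"
    return None

# Hypothetical values: capacitances in arbitrary units, voltages in volts.
c_ph, c_ref = 2.0, 1.0
eps = 0.010

print(classify(delta_v_in(c_ph, c_ref, dv_ph=0.030, dv_ref=0.0), eps))   # ON
print(classify(delta_v_in(c_ph, c_ref, dv_ph=-0.030, dv_ref=0.0), eps))  # OFF
print(classify(delta_v_in(c_ph, c_ref, dv_ph=0.006, dv_ref=0.0), eps))   # None
```

With these values the coupling factor is 2/3, so a 30 mV photodiode step becomes 20 mV at the input and crosses the 10 mV threshold, while a 6 mV step becomes 4 mV and does not.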
The photodiode generates an output \(V_{PH}\) that is logarithmic in the brightness. This is the reason that a \(\log\) is applied to the brightness in DVS.
Capacitors work as differentiators, turning a change in voltage into a current: \(Q=It=C\Delta U\), so \(I=C\frac{dU}{dt}\). Therefore, when the brightness on the photodiode doesn't change, no signal is coupled through the capacitor \(C_{PH}\).
The threshold for generating an ON/OFF event is fixed and applies to \(\Delta V_{IN}\), while \(\Delta V_{IN}\) is determined by two factors together, \(\Delta V_{PH}\) and \(\Delta V_{REF}\). To make the system fire events more easily, use a larger \(\Delta V_{REF}\): a smaller \(\Delta V_{PH}\) is then able to trigger an event. Conversely, if \(\Delta V_{REF}\) is set smaller, a larger \(\Delta V_{PH}\) is required to fire an event, which means a larger input brightness change is needed.
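To make this trade-off explicit, the ON condition \(\Delta V_{IN}>+\epsilon\) can be substituted into formula (1) and solved for the photodiode step. This is only an algebraic rearrangement of the two relations already given in these notes, assuming \(C_{PH}>0\):

\[
\Delta V_{IN} > +\epsilon
\quad\Longleftrightarrow\quad
\Delta V_{PH} > \frac{(C_{PH}+C_{REF})\,\epsilon - C_{REF}\,\Delta V_{REF}}{C_{PH}}
\]

The right-hand side shrinks as \(\Delta V_{REF}\) grows, so a larger reference step lowers the photodiode change needed for an ON event.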
As formula (1) shows, \(\Delta V_{PH}\) cannot contribute more than \(1\times \Delta V_{PH}\) to \(\Delta V_{IN}\), since \(C_{REF}\) is not negative. No amplifier is applied after \(V_{PH}\) to enlarge this value. This is called passive coupling. In a regular DVS camera, by contrast, an active amplifier after the photodiode turns a relatively small change in potential into a larger one. This is why qDVS has a much smaller dynamic range.
Illumination intensity change sensitivity (how hard it is to trigger an event) is determined by \(C_{REF}\), while the dynamic range (the range of absolute brightness over which events can still be triggered effectively) is determined by the coupling amplification of the input \(\Delta V_{PH}\).
As qDVS is a frame scan-based design, there is no need to report the row and column address for each event. This saves the energy that would otherwise be spent on generating and transmitting these addresses, and results in a higher information density for a fixed bus bandwidth.
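To get a feel for the address overhead that is avoided, the sketch below counts the bits needed for a binary row and column address. It assumes a plain binary address per event, which is an illustrative assumption; the actual encoding used by any particular DVS chip is not described in these notes, and `address_bits` is a hypothetical helper.

```python
def address_bits(rows, cols):
    """Bits needed to encode one (row, column) address in plain binary."""
    # (n - 1).bit_length() is the number of bits needed to count 0 .. n-1.
    return (rows - 1).bit_length() + (cols - 1).bit_length()

bits = address_bits(256, 256)
print(bits)  # 16
```

For a 256 x 256 array, each event would carry 16 address bits under this assumption, which a scanned readout can leave out because the scan position already identifies the pixel.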
The queries are processed in a row-parallel, column-serial scanned output pattern. (How does this row-parallel processing happen? It is not clear in the paper. My guess is that clocked time-division multiplexing is used across rows.)
Column readout performs thresholding comparison of the pixel photodiode voltage, using bipolar voltage modulation of \(V_{REF}\) (\(V_{UP}\) and \(V_{DN}\)), to detect ON and OFF temporal change events in intensity.
A Gm-boosted high-gain cascode amplifier (gain > 90dB) provides a voltage clamp on the sense line to mitigate capacitive loading on the sense line and eliminate \(CV^2\) losses incurred in APS and DDS readout.
A dynamic comparator eliminates static power losses in event generation. This is the key reason that it is more energy efficient than a regular DVS. (What does a dynamic comparator mean? How is it different from what a regular DVS uses?)