Thesis defense by Anass EL MOUDNI, Monday, January 26 at 2 p.m., UFR ST du Madrillet
Abstract: Event-based cameras record asynchronous brightness changes instead of conventional frames at a fixed rate. Their microsecond latency, negligible motion blur and dynamic range exceeding 120 dB make them ideally suited to the rapid motion, high-contrast lighting and tight energy budgets that characterise autonomous-vehicle perception. Yet the very sparsity and temporal precision of these sensors pose a major challenge for standard computer-vision pipelines, especially when dense, metrically accurate depth is required for obstacle avoidance, planning and control. This thesis investigates event-based stereo depth estimation and makes three complementary contributions spanning the spectrum from analytic geometry to data-driven learning and dataset creation.

The first contribution concerns the geometry of stereo event-based cameras. In particular, we adapt the notion of the Disparity Space Image (DSI) to asynchronous event streams from a stereo rig and fuse them across short time windows to build dense depth maps. Because classical DSIs assume a known camera motion, we introduce a self-consistent ego-motion estimator that aligns local time surfaces with provisional depth maps, closing the loop between camera pose and depth at high frequency on real driving sequences.

While this geometric approach delivers satisfactory results, the performance gap with existing deep-learning methods remains large. To bridge it, the second contribution of this thesis is a data-driven depth-estimation method. Specifically, we design a spatio-temporal fusion module that learns to attend to the most informative past events and to fuse them with the current observations. This implicit motion compensation and context aggregation reduce end-point error and yield a favourable trade-off between depth-estimation accuracy and computational cost.

Progress in deep learning ultimately depends on data availability. As our third contribution, we introduce SPECTRA, a large-scale, multi-modal driving dataset that includes a stereo event camera synchronized with two RGB cameras, a LiDAR, semantic and instance masks, object-detection boxes and IMU measurements. Recorded under complex urban and suburban traffic conditions, the dataset supplies all the signals required for both supervised and self-supervised learning across a range of event-based perception tasks.

Taken together, the contributions of this thesis advance the state of the art in event-based depth estimation for autonomous vehicles, combining principled geometry with data-driven learning, and bring robust, low-latency 3D sensing one step closer to real-world deployment.
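To make the first, geometric contribution more concrete, the following is a minimal sketch of how a stereo DSI can be built from event time surfaces. The exponentially decayed time surface, the absolute-difference matching cost and all function names are illustrative assumptions, not the formulation used in the thesis.

```python
# Hypothetical sketch: a Disparity Space Image (DSI) built by comparing
# left/right event time surfaces at every candidate disparity.
import numpy as np

def time_surface(events, shape, t_ref, tau=0.03):
    """Exponentially decayed time surface from events given as (x, y, t, polarity)."""
    ts = np.zeros(shape, dtype=np.float64)
    for x, y, t, _ in events:
        ts[y, x] = max(ts[y, x], np.exp(-(t_ref - t) / tau))
    return ts

def build_dsi(ts_left, ts_right, max_disp=64):
    """Stack a per-disparity matching cost into a (D, H, W) DSI."""
    H, W = ts_left.shape
    dsi = np.full((max_disp, H, W), np.inf)
    for d in range(max_disp):
        # Shift the right surface by d pixels and compare it to the left one.
        dsi[d, :, d:] = np.abs(ts_left[:, d:] - ts_right[:, :W - d])
    return dsi

def disparity_from_dsi(dsi):
    """Winner-take-all read-out: index of the minimum cost per pixel."""
    return np.argmin(dsi, axis=0)
```

A winner-take-all argmin is only the simplest read-out; per the abstract, the thesis additionally fuses DSIs across short time windows and couples them with the ego-motion estimator.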
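The second contribution's spatio-temporal fusion can likewise be sketched as a single attention step in which features from the current event window query features from past windows. The single-head design and all shapes below are assumptions for illustration, not the architecture from the thesis.

```python
# Hypothetical sketch: scaled dot-product attention over past event features.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(current, past):
    """current: (N, C) features of the present window;
    past: (T, N, C) features of T earlier windows."""
    T, N, C = past.shape
    keys = past.reshape(T * N, C)           # flatten time into the token axis
    scores = current @ keys.T / np.sqrt(C)  # (N, T*N) similarity of now vs. past
    weights = softmax(scores, axis=-1)      # attend to the most informative past events
    context = weights @ keys                # (N, C) aggregated past context
    return current + context                # fuse past context with the present
```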
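Finally, a hypothetical layout for one synchronized SPECTRA sample, covering only the modalities named in the abstract; field names and array shapes are illustrative assumptions, not the released format.

```python
# Hypothetical record structure for one synchronized SPECTRA sample.
from dataclasses import dataclass
import numpy as np

@dataclass
class SpectraSample:
    events_left: np.ndarray    # (N, 4) asynchronous events: x, y, t, polarity
    events_right: np.ndarray   # (M, 4) second event camera of the stereo rig
    rgb_left: np.ndarray       # (H, W, 3) frame from the first RGB camera
    rgb_right: np.ndarray      # (H, W, 3) frame from the second RGB camera
    lidar_points: np.ndarray   # (P, 3) LiDAR point cloud
    semantic_mask: np.ndarray  # (H, W) per-pixel class labels
    instance_mask: np.ndarray  # (H, W) per-pixel instance ids
    boxes: np.ndarray          # (K, 4) object-detection bounding boxes
    imu: np.ndarray            # (6,) accelerometer + gyroscope measurement
```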