Computer Vision Autonomous Driving Adverse Weather Multi-Sensor Fusion

Weather-Resilient Multi-Sensor Perception for Autonomous Driving

Built a weather-resilient perception pipeline fusing camera, LiDAR, radar, and map priors through a BEV-centric architecture with condition-aware gating. The system adapts fusion weights under rain, fog, and nighttime scenarios and maintains stable 3D detection with low-latency edge deployment.

+24% Object Recall
<55ms Latency
31% Less Overhead
4 Modalities

Why Weather Resilience Matters

Autonomous driving demands a perception system that works reliably in every condition — bright sunlight, heavy rain, fog, darkness, and cluttered urban environments. No single sensor modality is sufficient. Cameras deliver rich semantic and texture information but estimate depth poorly at range and degrade in low light. LiDAR provides precise 3D point clouds but becomes sparse at distance and struggles in rain or fog, which scatter its laser pulses. Radar is robust to weather and measures velocity directly, but its angular resolution is too coarse for fine-grained object classification.

Existing approaches often process each sensor in isolation and attempt late fusion at the decision level, losing complementary information early in the pipeline. This project addresses that gap by designing a unified, BEV-centric perception architecture that fuses multi-modal features at the representation level, preserving geometric and semantic synergies across sensors. The result is a system that maintains high recall and precision across the full detection range and all environmental conditions, while meeting the strict latency budget required for real-time autonomous operation.

Complete Perception Pipeline

From raw sensor streams to 3D tracked objects: the pipeline processes three modalities in parallel, fuses them in BEV space, and applies temporal reasoning for coherent tracking.

Sensor Inputs → Feature Extraction → Weather-Adaptive Fusion → Detection → Edge Deployment
Camera Stream
📷
RGB Frames 1920×1080
Camera Input

Multi-camera rig captures 360° surround view at 30 FPS. Images are undistorted and synchronized across all cameras.

30 FPS · Surround
🧠
CNN Backbone ResNet-50
Visual Feature Extractor

Pre-trained ResNet-50 extracts multi-scale feature maps. FPN neck generates hierarchical features for dense prediction tasks.

ResNet-50 · FPN
LiDAR Stream
☁️
Point Cloud 300K pts/frame
LiDAR Input

64-beam rotating LiDAR produces dense 3D point clouds at 10 Hz. Points encode XYZ coordinates, intensity, and ring index.

64-beam · 10 Hz
🔬
PointPillars Voxelize
3D Feature Encoder

PointPillars voxelizes the point cloud into vertical columns and applies per-pillar PointNet to produce a pseudo-image BEV representation.

PointPillars · BEV
Radar Stream
📡
Radar Signals 77 GHz FMCW
Radar Input

77 GHz FMCW radar provides range-Doppler maps with velocity data. Robust in all weather conditions including fog and rain.

77 GHz · FMCW
Signal Proc Range-Doppler
Radar Processing

CFAR detection on range-Doppler maps extracts targets. Velocity and angle features are encoded into a dense tensor for fusion.

CFAR · Doppler
Cross-Modal BEV Fusion
🔄
BEV Projection Unified Grid
Bird’s Eye View Projection

Camera features are projected to BEV via learned depth estimation. LiDAR and radar features are directly mapped to the same BEV grid.

LSS · BEV Grid
⚖️
Attention Fusion Cross-Modal
Attention-Based Fusion

Cross-attention aligns features across modalities in the shared BEV space. Dynamically weights each sensor based on signal quality and overlap.

Cross-Attn · Dynamic
📊
Fused BEV 256×256
Fused Feature Map

Dense 256×256 BEV feature grid encoding geometry, appearance, and velocity from all three sensor modalities.

Detection & Tracking
🚗
3D Detection CenterPoint
3D Object Detection

CenterPoint-style head predicts 3D bounding boxes (x,y,z,w,l,h,yaw) for vehicles, pedestrians, and cyclists from the fused BEV features.

CenterPoint · 3D BBox
🚶
Multi-Object Tracking Hungarian
📍
Trajectory Prediction 3s Horizon
Deployment
TensorRT INT8 Quant
🚘
AV Runtime <50ms
Real-Time AV Inference

Full pipeline runs at <50ms on NVIDIA Orin with TensorRT INT8 optimization. Asynchronous sensor ingestion with lock-free queues.

Orin · <50ms

Pipeline Steps

The perception pipeline follows a six-stage process, from raw sensor calibration through to final 3D bounding box prediction, with each stage designed for parallelism and low latency.

Sensor Calibration

Extrinsic and intrinsic calibration of camera, LiDAR, and radar using a joint optimization over calibration targets. Temporal synchronization via hardware PTP timestamps to align frames within 2ms.
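As a toy illustration of the synchronization step, the sketch below pairs each LiDAR sweep with the nearest camera frame by timestamp and drops pairs outside the 2 ms tolerance. The function name and sample timestamps are hypothetical; a production system would work with raw PTP nanosecond stamps per sensor.

```python
# Hypothetical sketch: pairing sensor frames by nearest PTP timestamp.
# Timestamps are in seconds; pairs whose gap exceeds the tolerance are dropped.

def sync_frames(cam_ts, lidar_ts, tol=0.002):
    """Match each LiDAR timestamp to the nearest camera timestamp within tol."""
    pairs = []
    for lt in lidar_ts:
        ct = min(cam_ts, key=lambda t: abs(t - lt))  # nearest camera frame
        if abs(ct - lt) <= tol:                      # enforce 2 ms budget
            pairs.append((ct, lt))
    return pairs

cam = [0.0, 0.033, 0.066, 0.1]       # 30 FPS camera stamps
lidar = [0.001, 0.050, 0.099]        # 10 Hz LiDAR sweep stamps
print(sync_frames(cam, lidar))       # the 0.050 sweep has no frame within 2 ms
```

The middle LiDAR sweep is rejected because its closest camera frame is 16 ms away, which is exactly the failure mode hardware-triggered capture avoids.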

Feature Extraction

Camera images processed through ResNet-50 with Feature Pyramid Network. LiDAR point clouds encoded via PointPillars into pseudo-images. Radar signals processed through CFAR detection and a lightweight MLP backbone.
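The LiDAR branch can be sketched in a few lines of numpy. This toy version of PointPillars-style voxelization bins points into a BEV grid of vertical pillars and max-pools a single intensity channel per pillar; the real encoder applies a learned per-pillar PointNet rather than a hand-written max, and the 4×4 m grid here is an assumption for illustration.

```python
import numpy as np

# Toy PointPillars-style voxelization: points (N, 4) = x, y, z, intensity
# are binned into vertical pillars on a BEV grid, then max-pooled per pillar
# into a single-channel pseudo-image (a learned PointNet replaces this max
# in the actual encoder).

def pillarize(points, x_range=(0, 4), y_range=(0, 4), pillar=1.0):
    nx = int((x_range[1] - x_range[0]) / pillar)
    ny = int((y_range[1] - y_range[0]) / pillar)
    bev = np.zeros((ny, nx), dtype=np.float32)   # one channel: max intensity
    ix = ((points[:, 0] - x_range[0]) / pillar).astype(int).clip(0, nx - 1)
    iy = ((points[:, 1] - y_range[0]) / pillar).astype(int).clip(0, ny - 1)
    np.maximum.at(bev, (iy, ix), points[:, 3])   # per-pillar max pooling
    return bev

pts = np.array([[0.5, 0.5, 0.0, 0.9],   # two points share pillar (0, 0)
                [0.6, 0.4, 1.0, 0.2],
                [3.5, 3.5, 0.5, 0.7]])
bev = pillarize(pts)
print(bev[0, 0], bev[3, 3])  # per-pillar max intensities
```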

BEV Projection

Camera features lifted to 3D using learned depth estimation and projected onto a unified BEV grid. LiDAR pillars naturally map to BEV. Radar detections scatter into the same grid with velocity channels.
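The camera lift can be illustrated with the core Lift-Splat idea in toy dimensions: each pixel predicts a categorical distribution over depth bins, the pixel's feature vector is weighted by that distribution to form frustum features, and the result is splatted along the pixel's ray into BEV cells. Dimensions and the one-pixel "splat" below are assumptions for clarity.

```python
import numpy as np

# Lift-Splat-style camera lifting, toy sizes: one pixel, D depth bins,
# C feature channels. The depth head's softmax output spreads the pixel's
# feature mass along its viewing ray in the BEV grid.

D, C = 4, 2                                   # depth bins, feature channels
feat = np.ones((C,))                          # per-pixel feature vector
depth_prob = np.array([0.1, 0.6, 0.2, 0.1])   # softmax over depth bins

lifted = depth_prob[:, None] * feat[None, :]  # (D, C) frustum features
bev = np.zeros((D, C))                        # this ray maps to D BEV cells
bev += lifted                                 # "splat" (toy: single ray)
print(bev[1])                                 # most mass at the 0.6 bin
```

Because the depth distribution is learned end-to-end, uncertain pixels spread their features across many cells instead of committing to a single wrong depth.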

Cross-Modal Attention

A deformable cross-attention module aligns and fuses BEV features from all three modalities. Learned query positions attend to relevant spatial locations across sensor maps, resolving geometric misalignments.
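A minimal sketch of the fusion mechanism, with assumed toy sizes: BEV query vectors attend over per-modality tokens via scaled dot-product attention, so each cell's fused feature is a learned, quality-dependent mix of the sensors. Note this is plain cross-attention, not the deformable variant with learned sampling offsets used in the actual module.

```python
import numpy as np

# Plain scaled dot-product cross-attention over modality tokens
# (the real module is deformable, with learned sampling locations).

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(16, 8))          # 16 BEV cell queries, dim 8
K = rng.normal(size=(3, 8))           # one token per modality (toy)
V = rng.normal(size=(3, 8))           # camera / LiDAR / radar features

attn = softmax(Q @ K.T / np.sqrt(8))  # (16, 3): per-cell modality weights
fused = attn @ V                      # (16, 8): fused BEV features
print(fused.shape, attn.sum(axis=1)[0])
```

The attention weights are exactly where condition-aware gating acts: in fog, the rows of `attn` shift mass away from the camera token toward radar.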

Temporal Self-Attention

Fused BEV features from the current frame attend to warped features from previous frames using ego-motion compensation. This temporal module stabilizes detections across frames and enables velocity estimation.
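The ego-motion compensation step can be sketched as re-sampling the previous BEV map into the current ego frame before the two frames attend to each other. The version below handles only integer-cell translation with nearest-neighbor lookup; a real pipeline applies the full SE(2) ego transform with bilinear grid sampling.

```python
import numpy as np

# Toy ego-motion warp: shift the previous BEV map by the ego translation
# expressed in grid cells (nearest-neighbor; bilinear sampling and rotation
# are omitted for clarity).

def warp_bev(prev, dx_cells, dy_cells):
    """Resample prev so that static objects align with the current frame."""
    warped = np.zeros_like(prev)
    h, w = prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y, src_x = ys - dy_cells, xs - dx_cells
    ok = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    warped[ys[ok], xs[ok]] = prev[src_y[ok], src_x[ok]]
    return warped

prev = np.zeros((4, 4)); prev[1, 1] = 1.0     # object seen last frame
cur = warp_bev(prev, dx_cells=1, dy_cells=0)  # ego moved one cell
print(np.argwhere(cur == 1.0))                # object lands at (1, 2)
```

After warping, a static object occupies the same grid cell in both frames, so temporal attention reinforces it while single-frame noise averages out.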

3D Detection

CenterPoint-style detection heads regress 3D bounding box centers, dimensions, orientation, and velocity from the temporally fused BEV map. Non-maximum suppression produces the final detection output.
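The final suppression step can be sketched with axis-aligned BEV boxes; the real head suppresses rotated boxes (yaw-aware IoU), which is omitted here for brevity. Boxes and scores below are made up for illustration.

```python
# Greedy NMS over axis-aligned BEV boxes (x1, y1, x2, y2); the deployed
# head uses rotated-box IoU, but the greedy loop is identical.

def iou(a, b):
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thr=0.5):
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:  # accept a box only if it overlaps no kept box
        if all(iou(boxes[i], boxes[j]) < thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 2, 2), (0.1, 0.1, 2.1, 2.1), (5, 5, 7, 7)]
scores = [0.9, 0.8, 0.95]
print(nms(boxes, scores))  # [2, 0] — the 0.8 duplicate is suppressed
```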

Evaluation Charts

Quantitative evaluation across detection range, weather conditions, inference latency, and overall system capabilities. All benchmarks measured on the nuScenes validation set and a proprietary adverse-weather test suite.

Recall vs Distance Across Conditions

Recall at IoU=0.5 under clear and adverse weather

Line Chart


Weather Robustness by Sensor Stack

mAP comparison across clear, rain, fog, and night

Bar Chart


Inference Latency Under Sensor Load

End-to-end latency on NVIDIA Orin across traffic density

Line Chart


Autonomy Stack Capabilities

Multi-dimensional comparison vs camera-only baseline

Radar Chart


Key Outcomes

The fused system was evaluated on the nuScenes benchmark and a proprietary adverse-weather test suite. All metrics are reported relative to the strongest single-sensor baseline (LiDAR-only CenterPoint).

+24%
Object Recall

Improvement over single-sensor baselines at IoU=0.5, especially at long range (>60m) and under occlusion.

<55ms
Latency

End-to-end inference on NVIDIA Orin with INT8 quantization and TensorRT, within the real-time budget required for closed-loop autonomous operation.

-31%
Runtime Overhead

Compared to running three separate detection models, the unified pipeline reduces total compute by 31%.

4
Sensor Modalities

Camera, LiDAR, radar, and map priors fused at the feature level in a shared BEV representation for complementary perception.

Design Highlights

BEV-Centric Fusion

Projecting all sensor features into a unified Bird's-Eye-View space eliminates the geometric ambiguity of perspective fusion and enables straightforward 3D reasoning. The shared BEV grid serves as the backbone for both spatial and temporal aggregation.

Temporal Self-Attention

By attending to ego-motion-warped BEV features from prior frames, the model stabilizes detections across time, reduces false positives from single-frame noise, and enables implicit velocity estimation without explicit tracking modules.

Edge Deployment Ready

INT8 quantization via TensorRT, operator fusion, and optimized memory layout bring inference below 50ms on NVIDIA Orin. The pipeline is containerized with NVIDIA Triton for production-grade serving on the vehicle compute platform.
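The core of the INT8 step can be illustrated by hand. The sketch below performs symmetric per-tensor weight quantization — conceptually what a TensorRT calibrator computes per layer — though the deployed pipeline uses entropy calibration over activation histograms rather than a simple max-abs scale.

```python
import numpy as np

# Toy symmetric per-tensor INT8 quantization: pick a scale from the
# max-abs weight, round to int8, and check the round-trip error.

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                       # symmetric range
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([-0.5, 0.0, 0.25, 1.27])
q, scale = quantize_int8(w)
print(q, scale)                          # q = [-50, 0, 25, 127], scale = 0.01
dequant = q.astype(np.float32) * scale
print(np.abs(dequant - w).max())         # small quantization error
```

Halving weight and activation precision is what makes the memory-bound BEV layers fit the Orin latency budget; the round-trip error above is the accuracy cost the calibration set is used to bound.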

Technologies Used

Frameworks & Tools
PyTorch · TensorRT · NVIDIA Orin · CUDA · OpenCV · Open3D · nuScenes SDK · PointPillars · CenterPoint · Deformable DETR · ONNX · Docker · NVIDIA Triton · ROS2 · Python · C++

Business Impact and Delivery Scope

Problem Solved

Autonomy perception stacks often regress in rain, fog, and low-light conditions where safety margins matter most.

What I Deliver

Weather-aware fusion architecture with condition-adaptive weighting and robust performance validation.

Expected Impact

More stable detection quality across degraded environments without unacceptable latency overhead.

Hire Me for Weather-Robust Perception

I can help teams improve perception reliability under adverse environmental conditions for safety-critical systems.

MVP Delivery

Condition-aware benchmark setup and baseline model tuned for your critical scenarios.

Production Hardening

Domain adaptation, calibration checks, and failure-case mitigation under weather shift.

Advisory + Build

Architecture and evaluation strategy support for reliability-focused AV teams.

Other Projects

Instruction-Tuned Multimodal LLM for Scene Understanding

Vision Transformer integrated with a decoder-only LLM for conversational VQA, referring expressions, and multimodal grounding.

Knowledge-Augmented Reasoning Engine via Fine-Tuned LLM

PEFT fine-tuning with RAG pipeline injecting knowledge graph sub-graphs and Chain-of-Thought prompting for factual reasoning.

Enhancing Math Reasoning in LLMs via Self-Supervised Fine-Tuning

Qwen 2.5-32B fine-tuned with a novel "Wait" token technique achieving 56.7% on AIME 2024.

Multimodal Emotion Recognition for Human-Robot Interaction

Multimodal system combining vision, speech, and NLP with CNNs, LSTMs, and attention for real-time emotion recognition.

Audio-Visual Fusion for Dynamic Pedestrian Awareness

Self-supervised audio-visual fusion achieving LiDAR-comparable pedestrian detection deployed on Jetson Orin Nano.