Awesome Integrated Perception and Prediction for Autonomous Driving

This is a collection of research papers on Integrated Perception and Prediction for Autonomous Driving (PnP4AD). The repository will be continuously updated to track the frontier of this field.

👏 Welcome to follow and star! If you find any related papers or reports that could be helpful, feel free to submit an issue or PR.

Overview

PnP4AD refers to jointly handling perception (understanding the environment) and prediction (forecasting the future states of the environment) within a unified framework or model that takes multi-frame raw sensor data as input, instead of the modularized, cascaded perception-then-prediction pipeline. We categorize existing works by their prediction representation: objects/agents, instances/occupancy grids, and motions. A minimal code sketch of the two paradigms follows.
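To make the distinction concrete, here is a minimal PyTorch-style sketch contrasting the cascaded pipeline with a joint PnP4AD-style model. It is illustrative only: the class names, heads, and tensor shapes are hypothetical placeholders and do not correspond to any specific paper in this list.

```python
# Hypothetical sketch: cascaded perception-then-prediction vs. joint PnP4AD.
import torch
import torch.nn as nn


class CascadedPipeline(nn.Module):
    """Modularized baseline: perception output is the only input to prediction,
    so perception errors propagate and compound across the hand-off."""

    def __init__(self, perception: nn.Module, prediction: nn.Module):
        super().__init__()
        self.perception = perception  # e.g. a 3D detector + tracker
        self.prediction = prediction  # e.g. a trajectory forecaster

    def forward(self, sensor_frames: torch.Tensor):
        tracks = self.perception(sensor_frames)  # explicit, lossy interface
        return self.prediction(tracks)


class JointPnPModel(nn.Module):
    """PnP4AD-style model: one backbone over multi-frame raw sensor data,
    with perception and prediction heads sharing spatio-temporal features."""

    def __init__(self, backbone: nn.Module, det_head: nn.Module, pred_head: nn.Module):
        super().__init__()
        self.backbone = backbone
        self.det_head = det_head
        self.pred_head = pred_head

    def forward(self, sensor_frames: torch.Tensor):
        # sensor_frames: (batch, time, ...) multi-frame raw input
        features = self.backbone(sensor_frames)
        detections = self.det_head(features)  # perception output
        forecasts = self.pred_head(features)  # prediction output: agents,
        return detections, forecasts          # occupancy grids, or motion fields
```

Because both heads read the same features and are trained together, the prediction head can exploit raw-sensor cues that a cascaded hand-off would discard.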

Papers

Object/Agent-Level

Camera Input

  • [arXiv 2022] Perceive, Interact, Predict: Learning Dynamic and Static Clues for End-to-End Motion Prediction [paper]
  • [CVPR 2023] ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries [paper] [Github]
  • [ICCV 2023] VAD: Vectorized Scene Representation for Efficient Autonomous Driving [paper] [Github]

LiDAR Input

  • [CVPR 2018] Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [paper]
  • [CoRL 2018] IntentNet: Learning to Predict Intention from Raw Sensor Data [paper]
  • [CVPR 2019] End-to-End Interpretable Neural Motion Planner [paper]
  • [ICRA 2020] SpAGNN: Spatially-Aware Graph Neural Networks for Relational Behavior Forecasting from Sensor Data [paper]
  • [CVPR 2020] PnPNet: End-to-End Perception and Prediction with Tracking in the Loop [paper]
  • [CVPR 2020] STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction [paper]
  • [ECCV 2020] Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [paper]
  • [CoRL 2020] Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting [paper] [Github 👻]
  • [CVPR 2021] Deep Multi-Task Learning for Joint Localization, Perception, and Prediction [paper]
  • [ICCV 2021] LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving [paper]
  • [CVPR 2022] Forecasting from LiDAR via Future Object Detection [paper] [Github]

Multi-Modal

  • [IROS 2020] End-to-end Contextual Perception and Prediction with Interaction Transformer [paper]

Instance/Occupancy-Level

Camera Input

  • [ICCV 2021] FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras [paper] [Github]
  • [ECCV 2022] StretchBEV: Stretching Future Instance Prediction Spatially and Temporally [paper] [Github]
  • [arXiv 2022] BEVerse: Unified Perception and Prediction in Bird's-Eye-View for Vision-Centric Autonomous Driving [paper] [Github]
  • [ECCV 2022] ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning [paper] [Github]
  • [IJCAI 2023] PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird's-Eye View [paper] [Github]
  • [CVPR 2023] TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving [paper] [Github 👻]
  • [arXiv 2023] Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications [paper] [Github 👻]

LiDAR Input

  • [ECCV 2020] Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations [paper]
  • [CVPR 2021] MP3: A Unified Model to Map, Perceive, Predict and Plan [paper]
  • [ECCV 2022] Differentiable Raycasting for Self-Supervised Occupancy Forecasting [paper] [Github]
  • [CVPR 2023] Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting [paper] [Github]
  • [CVPR 2023] Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving [paper] [website]
  • [arXiv 2023] LiDAR-based 4D Occupancy Completion and Forecasting [paper] [Github]

Multi-Modal

  • [CVPRW 2020] FISHING Net: Future Inference of Semantic Heatmaps in Grids [paper] [talk]

Motion-Level

LiDAR Input

  • [CVPR 2020] MotionNet: Joint Perception and Motion Prediction for Autonomous Driving based on BEV Maps [paper] [Github]
  • [CVPR 2022] BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction with Bidirectional Enhancement [paper]
  • [CVPR 2023] Weakly Supervised Class-Agnostic Motion Prediction for Autonomous Driving [paper] [Github]
  • [IJCAI 2023] ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds [paper]
  • [AAAI 2024] Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix [paper] [Github]
  • [CVPR 2024] Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations [paper] [Github]

Multi-Modal

  • [CVPR 2021] Self-Supervised Pillar Motion Learning for Autonomous Driving [paper] [Github]
  • [AAAI 2024] Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals [paper] [Github]

Others (agent-level + occupancy-level)

  • [IROS 2021] Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting [paper]
  • [CVPR 2023] UniAD: Planning-oriented Autonomous Driving [paper] [Github]
  • [arXiv 2023] FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving [paper] [Github]

Talks
