[Incl. GenAD work, CVPR 2024 Highlight] Embracing Foundation Models into Autonomous Agent and System

turingmotors/DriveAGI

DriveAGI

This is "The One" project that OpenDriveLab is committed to contributing to the community, offering our thoughts and a general picture of how to embrace foundation models in autonomous driving.

NEWS

2024/03/24 OpenDV-YouTube Update: The full suite of toolkits for OpenDV-YouTube is now available, including data downloading and processing scripts as well as language annotations. Please refer to OpenDV-YouTube.

2024/03/15 We released the complete video list of OpenDV-YouTube, a large-scale driving video dataset, for the GenAD project. Data downloading and processing scripts, as well as language annotations, will be released next week. Stay tuned.

2024/01/24 We are excited to announce an update to our survey and would like to thank John Lambert and Klemens Esterle from the community for their advice on improving the manuscript.

At A Glance

Here are some key components for constructing a large foundation model tailored to an autonomous system.

Below we share the latest update from our team on the DriveData side. We will release details of DriveEngine and DriveAGI in the future.

OpenDV-YouTube

The largest driving video dataset to date, containing more than 1700 hours of real-world driving videos, about 300 times larger than the widely used nuScenes dataset.

  • Complete video list (under YouTube license): google sheet link
    • The downloaded raw videos (mostly 1080P) take up about 3 TB of storage. However, these hour-long videos cannot be used directly for model training, because loading them is extremely memory-consuming.
    • Therefore, we process them into consecutive images, which are more flexible and efficient to load during training. The processed images take up about 24 TB of storage in total.
  • Downloading and processing scripts: please refer to OpenDV-YouTube.
  • Language annotations for OpenDV-YouTube: you can download the full annotations for OpenDV-YouTube at OpenDV-YouTube-Language.
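Turning an hour-long video into consecutive images mostly comes down to deciding which frames to keep. The sketch below is a minimal illustration that assumes frames are resampled to a fixed target frame rate; the helpers `sample_frame_indices` and `frame_paths`, the target rate, and the file-naming scheme are hypothetical and are not taken from the OpenDV-YouTube scripts. Decoding and writing the frames themselves would be handled by a video library and is omitted.

```python
import math
from typing import List


def sample_frame_indices(src_fps: float, num_frames: int, target_fps: float) -> List[int]:
    """Indices of source frames to keep when resampling a video to target_fps.

    Hypothetical helper: for each output timestamp k / target_fps it picks
    the nearest source frame.
    """
    if src_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps > src_fps:
        raise ValueError("target_fps must not exceed src_fps")
    duration_s = num_frames / src_fps                # video length in seconds
    num_out = math.floor(duration_s * target_fps)    # number of frames to keep
    step = src_fps / target_fps                      # source frames per output frame
    return [min(num_frames - 1, round(k * step)) for k in range(num_out)]


def frame_paths(video_id: str, indices: List[int]) -> List[str]:
    """Hypothetical naming scheme: one folder per video, zero-padded frame numbers."""
    return [f"{video_id}/{i:09d}.jpg" for i in indices]


# Example: a 10-second clip recorded at 30 fps, resampled to 10 fps.
indices = sample_frame_indices(src_fps=30, num_frames=300, target_fps=10)
print(len(indices), indices[:4])  # 100 [0, 3, 6, 9]
print(frame_paths("video_0001", indices[:2]))
```

Storing individual frames trades disk space for loading speed, which is consistent with the growth from about 3 TB of raw video to about 24 TB of images noted above.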

Quick facts:

  • Task: large-scale video prediction for driving scenes.
  • Data source: YouTube, with careful collection and filtering process.
  • Diversity highlights: 1700 hours of driving videos, covering more than 244 cities in 40 countries.
  • Related work: GenAD, accepted at CVPR 2024
  • Note: Annotations for the other public datasets in OpenDV-2K will not be released, since we annotate randomly sampled video sequences from these datasets; these samples are incomplete and hard to trace back to their origins (e.g., file names). Nevertheless, you can easily reproduce the collection and annotation process on your own by following our paper.
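The "about 300 times larger" figure can be checked against the nuScenes entry in the perception-dataset table of this README (5.5 hours). A minimal sketch, comparing total video duration only; the helper `scale_ratio` is hypothetical:

```python
# Total video durations in hours. The OpenDV-YouTube figure is a lower bound
# ("more than 1700 hours"); the nuScenes figure comes from the perception-dataset table.
OPENDV_YOUTUBE_HOURS = 1700.0
NUSCENES_HOURS = 5.5


def scale_ratio(hours_a: float, hours_b: float) -> float:
    """How many times more video hours dataset A contains than dataset B."""
    if hours_b <= 0:
        raise ValueError("hours_b must be positive")
    return hours_a / hours_b


print(f"{scale_ratio(OPENDV_YOUTUBE_HOURS, NUSCENES_HOURS):.0f}x")  # 309x
```

Because 1700 hours is a lower bound, the true ratio is at least about 309, in line with the rounded "300 times" claim.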

DriveData Survey

Abstract

With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem. In this survey, we provide a comprehensive analysis of more than 70 papers on the timeline, impact, challenges, and future trends of autonomous driving datasets.

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

@misc{li2024_driving_dataset_survey,
     title={Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future}, 
     author={Hongyang Li and Yang Li and Huijie Wang and Jia Zeng and Huilin Xu and Pinlong Cai
and Li Chen and Junchi Yan and Feng Xu and Lu Xiong and Jingdong Wang
and Futang Zhu and Kai Yan and Chunjing Xu and Tiancai Wang
and Fei Xia and Beipeng Mu and Zhihui Peng and Dahua Lin and Yu Qiao},
     year={2024},
     eprint={2312.03408},
     archivePrefix={arXiv},
     primaryClass={cs.CV}
}

Autonomous driving datasets released since the 2010s can broadly be categorized into two generations. We define the Impact (y-axis) of a dataset based on its sensor configuration, input modality, task category, data scale, ecosystem, and similar factors.
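To make the idea of combining several factors into a single Impact value concrete, here is a purely hypothetical sketch: it scores each factor listed above on an assumed 0-to-1 scale and takes a weighted average. The scale, the equal weights, and the names `DatasetFactors` and `impact_score` are illustrative assumptions, not the definition used in the survey.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class DatasetFactors:
    # Factor names follow the README; each score is an assumed value in [0, 1].
    sensor_configuration: float
    input_modality: float
    task_category: float
    data_scale: float
    ecosystem: float


# Illustrative equal weights (an assumption, not taken from the survey).
DEFAULT_WEIGHTS = {f.name: 1.0 for f in fields(DatasetFactors)}


def impact_score(factors: DatasetFactors, weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-factor scores; a hypothetical aggregation."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    for name in weights:
        value = getattr(factors, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = sum(weights.values())
    return sum(w * getattr(factors, name) for name, w in weights.items()) / total_weight


print(impact_score(DatasetFactors(1.0, 0.5, 0.5, 1.0, 0.5)))  # 0.7
```

A weighted average keeps the score on the same 0-to-1 scale as the inputs, so changing the weights shifts emphasis between factors without rescaling the result.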

Related Work Collection

We present comprehensive paper collections, leaderboards, and challenges.

Challenges and Leaderboards
Title Host Year Task Entry
Autonomous Driving Challenge OpenDriveLab CVPR2023 Perception / OpenLane Topology 111
Perception / Online HD Map Construction
Perception / 3D Occupancy Prediction
Prediction & Planning / nuPlan Planning
Waymo Open Dataset Challenges Waymo CVPR2023 Perception / 2D Video Panoptic Segmentation 35
Perception / Pose Estimation
Prediction / Motion Prediction
Prediction / Sim Agents
CVPR2022 Prediction / Motion Prediction 128
Prediction / Occupancy and Flow Prediction
Perception / 3D Semantic Segmentation
Perception / 3D Camera-only Detection
CVPR2021 Prediction / Motion Prediction 115
Prediction / Interaction Prediction
Perception / Real-time 3D Detection
Perception / Real-time 2D Detection
Argoverse Challenges Argoverse CVPR2023 Prediction / Multi-agent Forecasting 81
Perception & Prediction / Unified Sensor-based Detection, Tracking, and Forecasting
Perception / LiDAR Scene Flow
Prediction / 3D Occupancy Forecasting
CVPR2022 Perception / 3D Object Detection 81
Prediction / Motion Forecasting
Perception / Stereo Depth Estimation
CVPR2021 Perception / Stereo Depth Estimation 368
Prediction / Motion Forecasting
Perception / Streaming 2D Detection
CARLA Autonomous Driving Challenge CARLA Team, Intel 2023 Planning / CARLA AD Challenge 2.0 -
NeurIPS2022 Planning / CARLA AD Challenge 1.0 19
NeurIPS2021 Planning / CARLA AD Challenge 1.0 -
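Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition Pazhou Lab 2023 Perception / Cross-scene Monocular Depth Estimation -
Perception / Roadside Millimeter-wave Radar Calibration and Object Tracking -
2022 Perception / Roadside 3D Perception Algorithms -
Perception / Text Recognition of Storefront Signboards in Street-view Images -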
AI Driving Olympics ETH Zurich, University of Montreal, Motional NeurIPS2021 Perception / nuScenes Panoptic 11
ICRA2021 Perception / nuScenes Detection 456
Perception / nuScenes Tracking
Prediction / nuScenes Prediction
Perception / nuScenes LiDAR Segmentation
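Jittor Artificial Intelligence Algorithm Challenge Department of Information Sciences, National Natural Science Foundation of China 2021 Perception / Traffic Sign Detection 37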
KITTI Vision Benchmark Suite University of Tübingen 2012 Perception / Stereo, Flow, Scene Flow, Depth, Odometry, Object, Tracking, Road, Semantics 5,610

Perception Datasets
Dataset Year Diversity Sensor Annotation Paper
Scenes Hours Region Camera Lidar Other
KITTI 2012 50 6 EU Front-view GPS & IMU 2D BBox & 3D BBox Link
Cityscapes 2016 - - EU Front-view 2D Seg Link
Lost and Found 2016 112 - - Front-view 2D Seg Link
Mapillary 2016 - - Global Street-view 2D Seg Link
DDD17 2017 36 12 EU Front-view GPS & CAN-bus & Event Camera - Link
Apolloscape 2016 103 2.5 AS Front-view GPS & IMU 3D BBox & 2D Seg Link
BDD-X 2018 6984 77 NA Front-view Language Link
HDD 2018 - 104 NA Front-view GPS & IMU & CAN-bus 2D BBox Link
IDD 2018 182 - AS Front-view 2D Seg Link
SemanticKITTI 2019 50 6 EU 3D Seg Link
Woodscape 2019 - - Global 360° GPS & IMU & CAN-bus 3D BBox & 2D Seg Link
DrivingStereo 2019 42 - AS Front-view - Link
Brno-Urban 2019 67 10 EU Front-view GPS & IMU & Infrared Camera - Link
A*3D 2019 - 55 AS Front-view 3D BBox Link
Talk2Car 2019 850 283.3 NA Front-view Language & 3D BBox Link
Talk2Nav 2019 10714 - Sim 360° Language Link
PIE 2019 - 6 NA Front-view 2D BBox Link
UrbanLoco 2019 13 - AS & NA 360° IMU - Link
TITAN 2019 700 - AS Front-view 2D BBox Link
H3D 2019 160 0.77 NA Front-view GPS & IMU - Link
A2D2 2020 - 5.6 EU 360° GPS & IMU & CAN-bus 3D BBox & 2D Seg Link
CARRADA 2020 30 0.3 NA Front-view Radar 3D BBox Link
DAWN 2019 - - Global Front-view 2D BBox Link
4Seasons 2019 - - - Front-view GPS & IMU - Link
UNDD 2019 - - - Front-view 2D Seg Link
SemanticPOSS 2020 - - AS GPS & IMU 3D Seg Link
Toronto-3D 2020 4 - NA 3D Seg Link
ROAD 2021 22 - EU Front-view 2D BBox & Topology Link
Reasonable Crowd 2021 - - Sim Front-view Language Link
METEOR 2021 1250 20.9 AS Front-view GPS Language Link
PandaSet 2021 179 - NA 360° GPS & IMU 3D BBox Link
MUAD 2022 - - Sim 360° 2D Seg & 2D BBox Link
TAS-NIR 2022 - - - Front-view Infrared Camera 2D Seg Link
LiDAR-CS 2022 6 - Sim 3D BBox Link
WildDash 2022 - - - Front-view 2D Seg Link
OpenScene 2023 1000 5.5 AS & NA 360° 3D Occ Link
ZOD 2023 1473 8.2 EU 360° GPS & IMU & CAN-bus 3D BBox & 2D Seg Link
nuScenes 2019 1000 5.5 AS & NA 360° GPS & CAN-bus & Radar & HDMap 3D BBox & 3D Seg Link
Argoverse V1 2019 324k 320 NA 360° HDMap 3D BBox & 3D Seg Link
Waymo 2019 1000 6.4 NA 360° 2D BBox & 3D BBox Link
KITTI-360 2020 366 2.5 EU 360° 3D BBox & 3D Seg Link
ONCE 2021 - 144 AS 360° 3D BBox Link
nuPlan 2021 - 120 AS & NA 360° 3D BBox Link
Argoverse V2 2022 1000 4 NA 360° HDMap 3D BBox Link
DriveLM 2023 1000 5.5 AS & NA 360° Language Link

Mapping Datasets
Dataset Year Diversity Sensor Annotation Paper
Scenes Frames Camera Lidar Type Space Inst. Track
Caltech Lanes 2008 4 1224/1224 PV Link
VPG 2017 - 20K/20K PV - Link
TuSimple 2017 6.4K 6.4K/128K PV Link
CULane 2018 - 133K/133K PV - Link
ApolloScape 2018 235 115K/115K PV Link
LLAMAS 2019 14 79K/100K Front-view Image Laneline PV Link
3D Synthetic 2020 - 10K/10K PV - Link
CurveLanes 2020 - 150K/150K PV - Link
VIL-100 2021 100 10K/10K PV Link
OpenLane-V1 2022 1K 200K/200K 3D Link
ONCE-3DLane 2022 - 211K/211K 3D - Link
OpenLane-V2 2023 2K 72K/72K Multi-view Image Lane Centerline, Lane Segment 3D Link
Prediction and Planning Datasets
Subtask Input Output Evaluation Dataset
Motion Prediction Surrounding Traffic States Spatiotemporal Trajectories of Single/Multiple Vehicle(s) Displacement Error Argoverse
nuScenes
Waymo
Interaction
MONA
Trajectory Planning Motion States for Ego Vehicles, Scenario Cognition and Prediction Trajectories for Ego Vehicles Displacement Error, Safety, Compliance, Comfort nuPlan
CARLA
MetaDrive
Apollo
Path Planning Maps for Road Network Routes Connecting to Nodes and Links Efficiency, Energy Conservation OpenStreetMap
Transportation Networks
DTAlite
PeMS
New York City Taxi Data
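Displacement error, the evaluation metric listed for motion prediction, is commonly reported as average displacement error (ADE, the mean Euclidean distance between predicted and ground-truth positions over all timesteps) and final displacement error (FDE, the distance at the last timestep). A minimal sketch for 2D trajectories follows; the function names are hypothetical, and benchmarks such as Argoverse, nuScenes, and Waymo each add their own details (number of modes, prediction horizon, how the best mode is selected), so each benchmark's evaluation code remains the authority on exact definitions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def displacement_errors(pred: Sequence[Point], gt: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against the ground truth."""
    if not gt or len(pred) != len(gt):
        raise ValueError("trajectories must be non-empty and have equal length")
    # Euclidean distance at each timestep.
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


def min_ade_fde(preds: Sequence[Sequence[Point]], gt: Sequence[Point]) -> Tuple[float, float]:
    """Best-of-K errors for multi-modal predictions.

    Each metric is minimized independently over the K modes here; some
    benchmarks instead pick a single mode (e.g. by lowest FDE) and report its ADE.
    """
    errors: List[Tuple[float, float]] = [displacement_errors(p, gt) for p in preds]
    return min(e[0] for e in errors), min(e[1] for e in errors)


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(displacement_errors(pred, gt))  # (1.0, 2.0)
```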

DriveLM

Introducing the first benchmark on language prompts for driving.

Quick facts:

OpenScene

The largest 3D occupancy forecasting dataset to date, built for visual pre-training.

Quick facts:

OpenLane-V2 Update

Enriching OpenLane-V2 with standard-definition (SD) maps and map elements.

Quick facts:
