Navigating the Physical World: A Survey of Embodied Navigation

This repository accompanies a survey of embodied navigation led by the MSP group at Shanghai Jiao Tong University.

Here you can learn the concept of embodied navigation and find state-of-the-art work on the topic.

1. Embodied Navigation Paradigm and Elements

1.1 Definition

The Embodied Navigation System is a fully autonomous navigation system with Interactive Perception, Neuromorphic Cognition, and Motion Execution capabilities.

1.2 Embodied Navigation Paradigm

Given an intelligent agent with a certain number of degrees of freedom that follows specific motion rules, has specific hardware parameters, and carries sensors capable of acquiring extensive environmental observations, we aim to establish a differentiable objective function. This function jointly optimizes the state space and the motion space, and it outputs the environment state, the motion execution, and the agent state.

The primary characteristic of this paradigm is that the agent's state, the environment state, and the agent's motion are optimized and solved jointly. Prior navigation methods may optimize only the agent's state, and SLAM considers the agent and the environment together; embodied navigation additionally optimizes the agent's own motion execution.
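The idea of solving for agent state, environment state, and motion in one objective can be illustrated with a toy problem. The sketch below is not the survey's formulation: it assumes a 1-D world with exact dynamics and quadratic costs, and solves it by plain gradient descent. All names (`joint_optimize`, `z0`, `standoff`, `w_effort`) are hypothetical. The agent observes a landmark at offset `z0` from its start pose and must stop `standoff` short of it, so the motion commands depend on the estimated landmark position and everything is solved together.

```python
def joint_optimize(T=4, z0=5.0, standoff=1.0, w_effort=0.01,
                   lr=0.05, iters=3000):
    """Toy 1-D joint optimization by gradient descent (illustrative only).

    Unknowns: agent start pose x0 (agent state), landmark position m
    (environment state), and T motion commands u (motion execution).
    """
    x0, m, u = 0.0, 0.0, [0.0] * T
    for _ in range(iters):
        x_T = x0 + sum(u)               # final pose under exact dynamics
        r_prior = x0                    # start pose is believed to be near 0
        r_obs = (m - x0) - z0           # observed offset to the landmark
        r_goal = x_T - (m - standoff)   # stop `standoff` short of the landmark
        # Gradients of r_prior^2 + r_obs^2 + r_goal^2 + w_effort * sum(u_t^2)
        g_x0 = 2 * r_prior - 2 * r_obs + 2 * r_goal
        g_m = 2 * r_obs - 2 * r_goal
        g_u = [2 * r_goal + 2 * w_effort * ut for ut in u]
        # One descent step on all three groups of unknowns at once
        x0 -= lr * g_x0
        m -= lr * g_m
        u = [ut - lr * gt for ut, gt in zip(u, g_u)]
    # Roll the agent state forward from the optimized start and motions
    xs = [x0]
    for ut in u:
        xs.append(xs[-1] + ut)
    return xs, m, u


if __name__ == "__main__":
    xs, m, u = joint_optimize()
    print("agent states:", [round(x, 3) for x in xs])
    print("landmark estimate:", round(m, 3))
    print("motion commands:", [round(ut, 3) for ut in u])
```

With the default values the landmark estimate settles close to 5 and the final pose close to 4, both slightly short because the effort penalty trades a little goal accuracy for smaller motions. Dropping the `u` terms from the objective would reduce the problem to a SLAM-style estimate of the agent and the environment only.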

To describe the problem setup of embodied navigation more precisely, we model the framework as a random process.
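One generic way to write such a random process is as a controlled state-space model. This is a sketch under assumptions, not the notation of the survey: here $x_t$ is the agent state, $m_t$ the environment state, $u_t$ the motion command, $z_t$ the sensor observation, and $\pi$ a policy, all of which are symbols introduced only for illustration.

```latex
\begin{aligned}
x_{t+1} &\sim p(x_{t+1} \mid x_t, u_t)      && \text{(agent motion)} \\
m_{t+1} &\sim p(m_{t+1} \mid m_t, x_t, u_t) && \text{(environment, possibly changed by interaction)} \\
z_t     &\sim p(z_t \mid x_t, m_t)          && \text{(observation)} \\
u_t     &= \pi(z_{0:t})                     && \text{(motion chosen from past observations)}
\end{aligned}
```

Under this reading, estimating $x_t$ alone corresponds to localization, estimating $x_t$ and $m_t$ together corresponds to SLAM, and also choosing $u_t$ corresponds to the jointly optimized setting described above.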

To clarify the modules in the framework, we define several levels for each module. In the accompanying figure, darker colors indicate more advanced levels.

1.3 Key Elements of Embodied Navigation

Embodied Navigation | Traditional Navigation
Ego-centric | Global Axis
Multi Nodes, n-DoF | Single Node, <=6-DoF
Evolved Motion Skills | Fixed Movement
Autonomous Task Decomposition and Multi-Task Joint Optimization | Manual Task Decomposition for Individual Optimization
First Principles | Engineering-Oriented Approach
Weak Metricity | Precise Metricity
Active Interaction Between Agent and Environment | Passive Perception

2. Interactive Perception

Surrounding Environment -Task

Object Detection

Algorithm Modality Object Type Date Publication Paper Link Code
DCGNN LiDAR Single-state 3D object 2023 CAIS Link ---
ContrastZSD CAM Zero-shot Object 2024 IEEE TPAMI Link ---
Gfocal CAM Dense object 2023 IEEE TPAMI Link ---
DeepGCNs Point cloud --- 2023 IEEE TPAMI Link code
GCNet --- --- 2023 IEEE TPAMI Link code
CNN hybrid module(CSWin+hybrid patch embedding module+slicing-based inference) RGB image objects in UAV images 2023 J-STARS link
iS-YOLOv5 RGB image small objects in autonomous driving 2023 Pattern Recognition Letters link
ASIF-Net RGB-D Salient Object 2021 IEEE T Cybernetics link code
AdaDet(based on Early-Exit Neural Networks) RGB image ... 2024 IEEE T COGN DEV SYST link
memory network+causal intervention+Mask RCNN RGB/grayscale image object in different weather condition 2024 IEEE TPAMI link
Res2Net RGB image object on 2D frames, especially Salient Object 2021 IEEE TPAMI link code

Place Recognition

Algorithm Modality Date Publication Paper Link Code
R2former Cam 2023 CVPR Link Code
Eigenplaces Cam 2023 ICCV Link Code
Anyloc Cam 2023 RAL Link Code
Optimal transport aggregation for visual place recognition Cam 2023 ArXiv Link Code
Seqot LiDAR 2022 TIE Link Code
Lpd-net LiDAR 2019 ICCV Link
Bevplace LiDAR 2023 ICCV Link Code
Adafusion Cam-LiDAR 2022 RAL Link
Mff-pr Cam-LiDAR 2022 ISMAR Link
Lcpr Cam-LiDAR 2023 RAL Link
Explicit Interaction for Fusion-Based Place Recognition Cam-LiDAR 2024 ArXiv Link

Semantic Classification

Algorithm Modality Semantic Type Date Publication Link
Reinforcement Learning with Phase Transition Mechanism Visual Object Recognition and Goal Navigation 2023 arXiv arXiv
Active Neural SLAM with Semantic Segmentation Visual Object Classification and Goal Localization 2022 NeurIPS NeurIPS Proceedings
Reinforcement Learning with Communication and Feature Fusion Modules Visual and Semantic Maps Object and Scene Understanding 2021 arXiv arXiv
Multitask Learning with Attentive Architecture Visual, Audio, and Text Multi-Modal Object and Scene Classification 2022 NeurIPS NeurIPS Proceedings
Self-supervised Learning with Multi-Head Attention Visual and Language 3D Object Recognition and Language Understanding 2022 arXiv arXiv
Deep Reinforcement Learning Visual Scene and Object Classification 2021 CVPR CVPR 2021
Curriculum Learning Visual Object and Scene Recognition 2020 ICLR ICLR 2020
Vision-Language Models Visual and Language Object Detection and Language Understanding 2023 arXiv arXiv
Semantic Mapping and Coordination Visual and Semantic Maps Object and Scene Classification 2022 IEEE Robotics and Automation Letters IEEE Xplore
Scene Priors with Reinforcement Learning Visual Scene and Object Classification 2021 ICCV ICCV 2021

Ego State -Body

Wheeled Vehicle

Algorithm Modality Date Publication Paper Link Code
Doppler-only Single-scan 3D Vehicle Odometry Radar 2023 ArXiv Link
PhaRaO Radar 2020 ICRA Link
RadarSLAM Radar 2020 IROS Link
4DRadarSLAM Radar 2023 ICRA Link Code
LIC-Fusion Lidar-IMU-Cam 2019 IROS Link
LIC-Fusion 2.0 Lidar-IMU-Cam 2020 IROS Link
Faster-LIO Lidar-IMU 2022 RAL Link Code
LOAM Lidar 2014 RSS Link
LeGO-LOAM Lidar 2018 IROS Link


Algorithm Modality Date Publication Paper Link Code
Fast-LIO Lidar-IMU 2021 RAL link code
Swarm-LIO Lidar-IMU 2023 ICRA link code
Vision-UWB fusion framework UGV-assisted 2023 IEEE ICIEA link
EKF+ IGG robust estimation GNSS+INS+IMU+Force sensor 2024 MEAS SCI TECHNOL link code
Omni-Swarm(multidrone map-based localization+visual drone tracking) VIO+UWB sensors+stereo wide-field-of-view cameras 2022 IEEE T ROBOT link code
EKF ToF+? 2024 AMC link
HDVIO VIO+IMU+dynamics module 2023 RSS link
Acoustic Inertial Measurement(AIM) Acoustics(microphone array) 2022 ACM(SenSys) link

Legged Robot

Algorithm Modality Date Publication Paper Link Code
Direct LiDAR Odometry LiDAR 2022 RAL Link
MIPO IMU-Kinematics 2023 IROS Link Code
Robust Legged Robot State Estimation Using Factor Graph Optimization Cam-IMU-Kinematics 2019 RAL Link
VILENS Cam-IMU-Lidar-Kinematics 2023 TRO Link
Invariant Smoother for Legged Robot State Estimation IMU-Kinematics 2023 TRO Link
Cerberus Cam-IMU-Kinematics 2023 ICRA Link Code
On State Estimation for Legged Locomotion Over Soft Terrain IMU-Kinematics 2021 IEEE Sensors Letters Link
Event Camera-based Visual Odometry Event Cam-RGBD 2023 IROS Link
Pronto Cam-IMU-Lidar-Kinematics 2020 Frontiers in Robotics and AI Link Code
Legged Robot State Estimation With Dynamic Contact Event Information IMU-Kinematics 2021 RAL Link
Vision-Assisted Localization and Terrain Reconstruction with Quadruped Robots Depth-IMU-Lidar 2022 IROS Link


Algorithm Modality Semantic Type Date Publication Link
Social Dynamics Adaptation (SDA) Depth images, ResNet, Recurrent Policy Network Human Trajectories, Motion Policy 2024 arXiv Link
SMPL Body Model, Motion Retargeting Motion Capture, SMPL Parameters Human Motion, Humanoid Motion Imitation 2024 arXiv Link
Humanoid Shadowing Transformer, Imitation Transformer Optical Marker-based Motion Capture, RGB Camera Human Body and Hand Data, Pose Estimation 2024 arXiv Link
Remote Teleoperation Architecture Fiber Optic Network, Virtual Reality Equipment Teleoperation, Human-Robot Interaction 2022 arXiv Link
POMDP, Reinforcement Learning Motion Capture, Force-Controlled Actuators Human Motion, Robot Locomotion 2024 arXiv Link
Modular Learning Framework, Imitation Learning Motion Capture, Human Demonstrations Humanoid Behaviors, Task Learning 2021 IEEE Robotics and Automation Letters Link
Zero-Shot Learning with CLIP Embeddings RGB-D Camera Object Navigation 2022 CVPR Link
Reinforcement Learning Visual Inputs (RGB Camera) Open-World Navigation 2021 ICRA Link
Reinforcement Learning with Gesture Recognition Multimodal (Gestures, Visual Inputs) Human-Robot Interaction 2023 CVPR Link
Vision-Language Model, Self-Supervised Learning Visual and Language Inputs Instruction Following 2020 CVPR Link
Simulation-Based Learning Visual and Physical Simulation Physical Interaction Prediction 2020 CVPR Link

Collaborative Sensing -View


Algorithm Modality Date Publication Paper Link Code
Graph‐based subterranean exploration path planning using aerial and legged robots Cam-Depth-Lidar-Thermal-IMU 2020 Journal of Field Robotics Link
Stronger Together Cam-Lidar-GNSS 2022 RAL Link Code
VIO-UWB-Based Collaborative Localization and Dense Scene Reconstruction within Heterogeneous Multi-Robot Systems Depth-Lidar 2022 ICARM Link
Heterogeneous Ground and Air Platforms, Homogeneous Sensing: Team CSIRO Data61's Approach to the DARPA Subterranean Challenge Cam-Lidar 2022 Field Robotics Link
Aerial-Ground Collaborative Continuous Risk Mapping for Autonomous Driving of Unmanned Ground Vehicle in Off-Road Environments Depth-Lidar-IMU 2023 TAES Link Code
Cooperative Route Planning for Fuel-constrained UGV-UAV Exploration Cam-Lidar-GNSS 2022 ICUS Link
Energy-Efficient Ground Traversability Mapping Based on UAV-UGV Collaborative System Cam-Lidar 2022 TGCN Link
Aerial-Ground Robots Collaborative 3D Mapping in GNSS-Denied Environments Cam-Lidar 2022 ICRA Link
Autonomous Exploration and Mapping System Using Heterogeneous UAVs and UGVs in GPS-Denied Environments Cam-Depth-Lidar 2019 TVT Link


Algorithm Modality Date Publication Paper Link Code
Joint Optimization of UAV Deployment and Directional Antenna Orientation 2023 WCNC Link
Multi-UAV Collaborative Sensing and Communication: Joint Task Allocation and Power Optimization 2023 TWC Link
Decentralized Multi-UAV Cooperative Exploration Using Dynamic Centroid-Based Area Partition Depth 2023 Drones Link
Cooperative 3D Exploration and Mapping using Distributed Multi-Robot Teams Lidar 2024 ICARSC Link
RACER Cam-IMU 2023 TRO Link Code
Fast Multi-UAV Decentralized Exploration of Forests Depth 2023 RAL Link Code
Next-Best-View planning for surface reconstruction of large-scale 3D environments with multiple UAVs Depth 2020 IROS Link
An autonomous unmanned aerial vehicle system for fast exploration of large complex indoor environments Cam-Lidar 2021 Journal of Field Robotics Link Code
Multi‑MAV Autonomous Full Coverage Search in Cluttered Forest Environments Cam-Lidar 2022 Journal of Intelligent & Robotic Systems Link


Algorithm Modality Date Publication Paper Link Code
Hybrid Stochastic Exploration Using Grey Wolf Optimizer and Coordinated Multi-Robot Exploration Algorithms 2019 IEEE Access Link
MR-TopoMap Cam-Lidar 2022 RAL Link
H2GNN 2022 RAL Link
SMMR-Explore Lidar-IMU 2021 ICRA Link
Distributed multi-robot potential-field-based exploration with submap-based mapping and noise-augmented strategy Lidar 2024 Robotics and Autonomous Systems Link
CoPeD-Advancing Multi-Robot Collaborative Perception Cam-Lidar-GNSS-IMU 2024 RAL Link
Collaborative Complete Coverage and Path Planning for Multi-Robot Exploration 2021 Sensors Link
Voronoi-Based Multi-Robot Autonomous Exploration in Unknown Environments via Deep Reinforcement Learning Lidar 2020 TVT Link
Multi-vehicle cooperative localization and mapping with the assistance of dynamic beacons in GNSS-denied environment IMU-Lidar 2024 ISAS Link

3. Advanced Cognition

Global/Local Space -Representation


Algorithm Based Structure Date Publication Paper Link Code
Point Cloud Library (PCL) Point cloud 2011 ICRA link
PointNet Point cloud 2017 CVPR link
PCT Point cloud 2021 Computational Visual Media link
TEASER Point cloud 2021 IEEE T ROBOT link
SC-CNN point cloud+hierarchical+anisotropic spatial geometry 2022 TRGS link code
PMP-Net++ Point cloud 2023 IEEE TPAMI link code
STORM Point cloud 2023 IEEE TPAMI link
Registration by Graph Matching deep graph+point cloud 2023 IEEE TPAMI link code
CrossNet RGB+grayscale+point cloud 2024 TMM link
PointConT point content-based Transformer 2024 JAS link code


Algorithm Based Structure Date Publication Paper Link Code
Direct Voxel Grid Optimization voxel grid 2022 CVPR link
NICE-SLAM multiresolution voxel grid 2022 CVPR link
Instant neural graphics primitives with a multiresolution hash encoding voxel grid hash encoding 2022 ACM Transactions on Graphics link
Vox-Fusion voxel grid with octree 2022 ISMAR link
Occupancy Networks occupancy grid 2019 CVPR link

Neural Weights

Algorithm Based Structure Date Publication Paper Link Code
NeRF MLP 2022 ACM Transactions on Graphics link
3D-GS 3D-GS 2023 ACM Transactions on Graphics link
NeRF-LOAM Neural-SDF 2023 ICCV link

Ego Motion -Semantic

Action Recognition

Algorithm Modality Date Publication Paper Link Code
Egocentric Action Recognition by Automatic Relation Modeling Egocentric RGB Videos 2023 TPAMI link
Egocentric Human Activities Recognition With Multimodal Interaction Sensing Egocentric RGB Videos+IMU 2024 IEEE Sensors Journal link
Ego-Humans Egocentric RGB Videos 2023 ICCV link
E2(GO)MOTION Egocentric Event Stream Videos 2022 CVPR link
Towards Continual Egocentric Activity Recognition: A Multi-Modal Egocentric Activity Dataset for Continual Learning Egocentric RGB Videos+IMU 2024 IEEE Transactions on Multimedia link
MARS IMU 2021 IEEE Internet of Things Journal link
Multi-level Contrast Network for Wearables-based Joint Activity Segmentation and Recognition IMU 2022 Globecom link
Timestamp-Supervised Wearable-Based Activity Segmentation and Recognition With Contrastive Learning and Order-Preserving Optimal Transport IMU 2024 TMC link

Motion to Language

Algorithm Modality Date Publication Paper Link Code
MotionGPT IMU 2023 NeurIPS link
IMUGPT 2.0 IMU 2024 ArXiv link
APSR framework depth/3D joint information/RGB frame/IR sequence 2020 IEEE TPAMI link
MS block+Res2Net 2D RGB images 2023 IEEE TCSVT link
EM+Dijkstra IMU 2020 IEEE T HUM-MACH SYST link
MotionLLM 2024 arXiv link code
Seq2Seq+SeqGAN+RL+MC CAM+Master Motor Map framework 2021 ICRA link
KIT (a dataset) CAM+Master Motor Map framework 2017 Big Data link dataset
Motion Patches+ViT framework 3D joint position+RGB 2D images 2024 arXiv link

Goal Understanding -Type



No. Algorithm Modality Semantic Type Date Publication Link
1 Deep Learning Visual Image Segmentation 02/2022 IEEE Transactions on Intelligent Transportation Systems Link
2 CNNs Visual Image Understanding 06/2021 Neural Networks Link
3 Semantic Localization and Mapping Visual Image Recognition 03/2023 Robotics and Autonomous Systems Link
4 Vision-Based Learning Visual Image Recognition 05/2023 International Journal of Robotics Research Link
5 Deep Learning Visual Image Analysis 08/2021 Pattern Recognition Letters Link
6 Integrated Semantic Mapping Visual Image Recognition 04/2022 Robotics Link
7 Deep Learning Visual Image Segmentation 02/2023 Journal of Field Robotics Link
8 Advanced Semantic Analysis Visual Image Understanding 06/2023 Autonomous Robots Link


No. Algorithm Modality Semantic Type Date Publication Link
1 PPO Visual Object Recognition 09/2021 arXiv Link
2 XgX Visual Object Detection 11/2023 arXiv Link
3 Deep RL Visual Object Recognition 01/2022 Journal of Intelligent & Robotic Systems Link
4 Cross-Modal Learning Visual/Textual Object Detection 04/2022 IEEE Robotics and Automation Letters Link
5 Goal-Oriented Exploration Visual Object Detection 03/2021 CVPR Link
6 Deep RL Visual Object Segmentation 07/2022 Sensors Link
7 Multi-Task Learning Visual Object Localization 12/2021 IEEE Transactions on Neural Networks and Learning Systems Link
8 DCNNs Visual Scene Understanding 10/2022 Pattern Recognition Link
9 Spatial Attention Mechanism Visual Object Detection 06/2021 Robotics and Autonomous Systems Link
10 Real-Time Semantic Mapping Visual Object Recognition 05/2023 International Journal of Advanced Robotic Systems Link

4. Motion Execution

Skills -


Algorithm Date Publication Paper Link Code
Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection 2018 IJRR Link
Learning ambidextrous robot grasping policies 2019 Science Robotics Link
Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning 2018 IROS Link
GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping 2020 CVPR Link Code
AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains 2023 TRO Link
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation 2020 CVPR Link
UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers 2024 arXiv Link Code
DrEureka: Language Model Guided Sim-To-Real Transfer 2024 RSS Link Code
Humanoid Locomotion as Next Token Prediction 2024 arXiv Link


Algorithm Date Publication Paper Link Code
Learning compositional models of robot skills for task and motion planning 2021 ISRR Link Code
Learning Manipulation Skills via Hierarchical Spatial Attention 2022 TRO Link
Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models 2024 ICRA Link Code
SAGCI-System 2022 ICRA Link
Pedipulate: Enabling Manipulation Skills using a Quadruped Robot’s Leg 2024 ICRA Link
PhyPlan 2024 arXiv Link Code
Practice Makes Perfect: Planning to Learn Skill Parameter Policies 2024 RSS Link Code
Extreme Parkour with Legged Robots 2024 ICRA Link Code
WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts 2024 arXiv Link
HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation 2024 IROS Link
Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning 2023 RSS Link
Real-World Humanoid Locomotion with Reinforcement Learning 2024 Science Robotics Link


Algorithm Modality DoF Date Publication Paper Link Code
iPlanner Depth 2-D 2023 RSS Link
ViPlanner RGB-D 2-D 2024 ICRA Link
DTC: Deep Tracking Control Depth 1/2-D 2024 Science Robotics Link
Neural RRT* RGB 2-D 2020 IEEE Transactions on Automation Science and Engineering Link
Socially aware motion planning with deep reinforcement learning Stereo RGB 2-D 2017 IROS Link
Efficient Autonomous Exploration Planning of Large-Scale 3-D Environments RGB 3-D 2019 RAL Link
ArtPlanner: Robust Legged Robot Navigation in the Field RGB-D 2.5-D 2021 Journal of Field Robotics Link Code
Perceptive Whole Body Planning for Multi-legged Robots in Confined Spaces RGB-D 3-D 2021 Journal of Field Robotics Link
Versatile Multi-Contact Planning and Control for Legged Loco-Manipulation RGB-D 3-D 2023 Science Robotics Link
Learning to walk in confined spaces using 3D representation RGB-D/LiDAR 3-D 2024 ICRA Link Code
VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation RGB-D 2-D 2024 ICRA Link Code
Autonomous Navigation of Underactuated Bipedal Robots in Height-Constrained Environments RGB-D 3-D 2023 IJRR Link

Morphological Collaboration -Morphologic

Algorithm Morphologic Date Publication Paper Link Code
Learning Robust Autonomous Navigation and Locomotion for Wheeled-legged Robots Wheel-leg 2024 Science Robotics Link
SytaB Ground-Air 2022 RAL Link
Aerial-aquatic robots capable of crossing the air-water boundary and hitchhiking on surfaces ground-air-water 2022 Science Robotics Link
Advanced Skills through Multiple Adversarial Motion Priors in Reinforcement Learning Wheel-leg 2023 ICRA Link
Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks Wheel-leg 2023 PMLR Link
Offline motion libraries and online MPC for advanced mobility skills Wheel-leg 2022 IJRR Link
Whole-body MPC and online gait sequence generation for wheeled-legged robots Wheel-leg 2021 IROS Link
Skywalker Ground-Air 2023 RAL
Autonomous and Adaptive Navigation for Terrestrial-Aerial Bimodal Vehicles Ground-Air 2022 RAL Link
ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots Quadrupedal 2024 ICRA Link
Body Transformer: Leveraging Robot Embodiment for Policy Learning Legged 2024 arXiv Link
Learning Bipedal Walking on a Quadruped Robot via Adversarial Motion Priors Legged 2024 arXiv Link

5. Platforms and Data


Algorithm Date Publication Paper Link Code
RFUniverse: A Multiphysics Simulation Platform for Embodied AI 2023 arXiv Link Code


6. Open Research Problems

Adaptive Scale and Complex Environment

Joint Optimization


7. Conclusions

