Skip to content

Neural network methods for multimodal map reconstruction and their usage for robot navigation and control

License

Notifications You must be signed in to change notification settings

yuddim/awesome-3d-multimodal-maps

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 

Repository files navigation

Awesome-3D-Multimodal-Maps

Here we demonstrate neural network methods for 3D multimodal map reconstruction and their usage for object retrieval, robot navigation and control.

Example of a 3D multimodal map with a scene graph representation. We can make text queries to it for 3D object retrieval:

Taxonomy

We propose following taxonomy of 3D multimodal map reconstruction methods:

Taxonomy

Dense multimodal map reconstruction methods

Point-based

(2023) OpenMask3D: Open-Vocabulary 3D Instance Segmentation. arXiv preprint arXiv:2306.13631. NeurIPS 2023. Takmaz, A., Fedele, E., Sumner, R. W., Pollefeys, M., Tombari, F., & Engelmann, F. (ETH Zurich, ETH AI Center, Google Zurich ) (paper) | (code) | (project)

(2023) Openscene: 3d scene understanding with open vocabularies. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. Peng, Songyou, et al. (Google Research, ETH Zurich, MPI for Intelligent Systems, Waymo LLC, Simon Fraser University) (paper) | (code) | (project) | (video)

(2023) PLA: Language-Driven Open-Vocabulary 3D Scene Understanding. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. Ding, Runyu, et al. (The University of Hong Kong, ByteDance) (paper) | (code) | (project)

(2023) Audio visual language maps for robot navigation. arXiv preprint arXiv:2303.07522, 2023. Huang, Chenguang, et al. (Freiburg University, Google Research, University of Technology Nuremberg). ISER 2023 (paper) | (code) | (project) | (colab) | (video)

(2023) Conceptfusion: Open-set multimodal 3d mapping. arXiv preprint arXiv:2302.07241, 2023. Jatavallabhula, Krishna Murthy, et al. (MIT, Universite de Montreal, University of Toronto, IIIT Hyderabad, CMU, Amazon, Matician, DEVCOM Army Research Laboratory) (paper) | (code) | (project) | (video)

(2022) Language-grounded indoor 3d semantic segmentation in the wild. In ECCV (pp. 125-141). Rozenberszki, D., Litany, O., & Dai, A. (Technical University of Munich, NVIDIA). (paper) | (code) | (project) | (benchmark) | (video)

Voxel-based

(2024) Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning. arXiv preprint arXiv:2403.11401. Fu, R., Liu, J., Chen, X., Nie, Y., & Xiong, W. (Brown University, ETH Zurich, Meta AI) (paper) | (code) | (project) | (colab) | (video)

(2023) Visual language maps for robot navigation. 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023. Huang, Chenguang, et al. (Freiburg University, Google Research, University of Technology Nuremberg) (paper) | (code) | (project) | (colab) | (video)

NeRF-based

(2024) Llm-grounder: Open-vocabulary 3d visual grounding with large language model as an agent. arXiv preprint arXiv:2309.12311. ICRA 2024. Yang, J., Chen, X., Qian, S., Madaan, N., Iyengar, M., Fouhey, D. F., & Chai, J. (University of Michigan, New York University) (paper) | (code) | (project) | (live demo) | (video)

(2023). 3d-llm: Injecting the 3d world into large language models. Advances in Neural Information Processing Systems, 36, 20482-20494. Hong, Y., Zhen, H., Chen, P., Zheng, S., Du, Y., Chen, Z., & Gan, C. (UCLA, SJTU, SCUT, UIUC, MIT, MIT-IBM Watson AI Lab, Umass Amherst) (paper) | (code) | (project)

(2023) Lerf: Language embedded radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 19729-19739). Kerr, J., Kim, C. M., Goldberg, K., Kanazawa, A., & Tancik, M. (UC Berkeley)(paper) | (code) | (project) | (dataset)

(2023) Weakly Supervised 3D Open-vocabulary Segmentation. arXiv preprint arXiv:2305.14093. NeurIPS 2023. Liu, K., Zhan, F., Zhang, J., Xu, M., Yu, Y., Saddik, A. E., ... & Lu, S. (Nanyang Technological University, Max Planck Institute for Informatics, University of Ottawa, Carnegie Mellon University, MBZUAI) (paper) | (code) | (dataset)

SDF-based

(2024) Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation. arXiv preprint arXiv:2310.03923. ICRA 2024. Yamazaki, Kashu, et al. (University of Arkansas, West Virginia University, University of Liverpool) (paper) | (code) | (project)

Gaussian Splatting-based

(2024) OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding. arXiv preprint arXiv:2406.02058. Wu, Y., Meng, J., Li, H., Wu, C., Shi, Y., Cheng, X., Zhao, C., Feng, H., Ding, E., Wang, J. and Zhang, J. (Peking University, Baidu VIS, Beihang University) (paper) | code (Coming soon) | (project)

(2023) LangSplat: 3D Language Gaussian Splatting. arXiv preprint arXiv:2312.16084. CVPR2024 Highlight. Qin, M., Li, W., Zhou, J., Wang, H. and Pfister, H. (Tsinghua University, Harvard University) (paper) | (code) | (project) | (video)

Sparse multimodal map reconstruction methods

Object-based

(2024) Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models. arXiv preprint arXiv:2405.02162. Mdfaa, M.A., Salameh, R., Zagoruyko, S. and Ferrer, G. (SkolTech) (paper) | (code is coming soon) | (project)

(2024) SUGAR: Pre-training 3D Visual Representations for Robotics. arXiv preprint arXiv:2404.01491. S Chen, R Garcia, I Laptev, C Schmid (paper) | (code is coming soon) | (project)

(2023) Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023. Najibi, Mahyar, et al. (Waymo LLC) (paper)

Object-based with relations (knowledge graphs)

(2024) SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model. arXiv preprint arXiv:2406.01584. Cheng, A.C., Yin, H., Fu, Y., Guo, Q., Yang, R., Kautz, J., Wang, X. and Liu, S. (UC San Diego, The University of Hong Kong, NVIDIA) (paper) | (code and dataset are coming soon) | (project)

(2024) Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships. arXiv preprint arXiv:2402.12259. CVPR 2024. Koch, S., Vaskevicius, N., Colosi, M., Hermosilla, P., & Ropinski, T. (Bosch Center for Artificial Intelligence, Robert Bosch Corporate Research, University of Ulm, TU Vienna) (paper) | (code is coming soon) | (project)

(2024) EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI. arXiv preprint arXiv:2312.16170. CVPR 2024. Wang, T., Mao, X., Zhu, C., Xu, R., Lyu, R., Li, P., ... & Pang, J. (Shanghai AI Laboratory, Shanghai Jiao Tong University, The University of Hong Kong, The Chinese University of Hong Kong, Tsinghua University) (paper) | (code and dataset) | (project)

(2023) ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning. arXiv preprint arXiv:2309.16650, 2023. Gu, Qiao, et al. (MIT, Universite de Montreal, University of Toronto, IIIT Hyderabad, JHU APL, JHU, UMass Amherst, DEVCOM Army Research Laboratory). (paper) | (code) | (project) | (video)

(2023) VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (School of Software, Beihang University, The University of Hong Kong, East China University of Science and Technology). Wang, Z., Cheng, B., Zhao, L., Xu, D., Tang, Y., & Sheng, L. (paper) | (code) | (model checkpoint)

(2023) Context-aware entity grounding with open-vocabulary 3d scene graphs. arXiv preprint arXiv:2309.15940. CoRL '23. Chang, H., Boyalakuntla, K., Lu, S., Cai, S., Jing, E., Keskar, S., ... & Boularias, A. (Rutgers University-New Brunswick, Drexel University) (paper) | (code) | (project) | (dataset)

(2023) 3d vsg: Long-term semantic scene change prediction through 3d variable scene graphs. In 2023 IEEE International Conference on Robotics and Automation (ICRA) (pp. 8179-8186). IEEE. Looper, S., Rodriguez-Puigvert, J., Siegwart, R., Cadena, C., Schmid, L. (ETH Zurich, Universidad de Zaragoza, Massachusetts Institute of Technology) (paper) | (code)

(2022) Language conditioned spatial relation reasoning for 3d object grounding. Advances in neural information processing systems 35, 20522-20535, 2022. S Chen, PL Guhur, M Tapaswi, C Schmid, I Laptev (Rutgers University-New Brunswick, Drexel University) (paper) | (code) | (project)

Object-based with hierarchy

(2024) Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation. arXiv preprint arXiv:2403.08605. Honerkamp, D., Buchner, M., Despinoy, F., Welschehold, T., & Valada, A. (University of Freiburg, Toyota Motor Europe (TME)). (paper) | (code) | (project)

(2024) Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation. arXiv preprint arXiv:2403.17846. Werby, A., Huang, C., Büchner, M., Valada, A., & Burgard, W. (University of Freiburg, University of Technology Nuremberg) (paper) | (project)

(2023) Foundations of Spatial Perception for Robotics: Hierarchical Representations and Real-time Systems. arXiv preprint arXiv:2305.07154. Hughes, N., Chang, Y., Hu, S., Talak, R., Abdulhai, R., Strader, J., & Carlone, L. (Massachusetts Institute of Technology) (paper) | (code)

Ontology-based

(2023) Indoor and Outdoor 3D Scene Graph Generation via Language-Enabled Spatial Ontologies. arXiv preprint arXiv:2312.11713. Strader, J., Hughes, N., Chen, W., Speranzon, A., & Carlone, L. (Massachusetts Institute of Technology, University of California, Lockheed Martin) (paper)

(2023) 3d scene graph prediction on point clouds using knowledge graphs. In 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE) (pp. 1-7). IEEE. Qiu, Y., & Christensen, H. I. (University of California San Diego) (paper)

(2021) Knowledge-inspired 3d scene graph prediction in point cloud. Advances in Neural Information Processing Systems, 34, 18620-18632. Zhang, S., Hao, A., & Qin, H. (Beihang University, Peng Cheng Laboratory, Stony Brook University (SUNY)) (paper)

About

Neural network methods for multimodal map reconstruction and their usage for robot navigation and control

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published