Learning Intuitive Physics with Multimodal Generative Models

This package provides the PyTorch implementation and the vision-based tactile sensor simulator for our AAAI 2021 paper. The tactile simulator is built on PyBullet and simulates the Semi-transparent Tactile Sensor (STS).

Installation

The recommended way is to install the package and all its dependencies in a virtual environment using:

git clone https://github.com/SAIC-MONTREAL/multimodal-dynamics.git
cd multimodal-dynamics
pip install -e .
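
For example, assuming Python 3 and the standard venv module (the environment name .venv is arbitrary), you can create and activate a virtual environment before running the commands above:

python3 -m venv .venv
source .venv/bin/activate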

Visuotactile Simulation

The sub-package tact_sim provides the components required for visuotactile simulation of the STS sensor and is implemented in PyBullet. The simulation is vision-based and is not meant to be a physically accurate model of contacts or soft-body dynamics.

To run an example script of an object falling onto the sensor, use:

python tact_sim/examples/demo.py --show_image --object winebottle

This loads the object from the graphics/objects directory and renders the resulting visual and tactile images.
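
Because the simulation is purely vision-based, the core loop amounts to PyBullet stepping plus camera rendering. Below is a minimal, hedged sketch of that idea using stock PyBullet assets; it is not the tact_sim API, and the object, camera pose, and resolution are illustrative assumptions.

import pybullet as p
import pybullet_data

# Minimal vision-based sketch: drop an object and render an RGB/depth image.
# NOT the tact_sim API; paths and camera parameters are illustrative only.
p.connect(p.DIRECT)                      # headless; use p.GUI to visualize
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")                 # stands in for the sensor surface
p.loadURDF("duck_vhacd.urdf", basePosition=[0, 0, 0.5])

for _ in range(240):                     # let the object fall and settle (~1 s)
    p.stepSimulation()

view = p.computeViewMatrix(cameraEyePosition=[0, 0, 1],
                           cameraTargetPosition=[0, 0, 0],
                           cameraUpVector=[0, 1, 0])
proj = p.computeProjectionMatrixFOV(fov=60, aspect=1.0, nearVal=0.01, farVal=2.0)
width, height, rgb, depth, seg = p.getCameraImage(128, 128, view, proj)
p.disconnect()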

The example scripts following the naming format experiments/exp_{ID}_{task}.py were used to generate the dataset for our AAAI 2021 paper. To run them, you need the ShapeNetSem dataset installed on your machine.

Preparing ShapeNetSem

Follow the steps below to download and prepare the ShapeNetSem dataset:

  1. Register and get access to ShapeNetSem.
  2. Only the OBJ and texture files are needed. Download models-OBJ.zip and models-textures.zip.
  3. Download metadata.csv and categories.synset.csv.
  4. Unzip the archives and move the contents of models-textures.zip into models-OBJ/models (a Python sketch of this step follows the directory layout below):
.
└── ShapeNetSem
    ├── categories.synset.csv
    ├── metadata.csv
    └── models-OBJ
        └── models
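
As referenced in step 4, here is a hedged Python sketch of one way to extract the archives into that layout. It assumes the two zip files and the CSVs were downloaded into a ShapeNetSem/ directory; the internal layout of the archives may differ slightly, so adjust the target folders if needed.

import shutil
import zipfile
from pathlib import Path

# Paths assume everything was downloaded into ShapeNetSem/ (adjust to your setup).
root = Path("ShapeNetSem")
models_dir = root / "models-OBJ" / "models"
models_dir.mkdir(parents=True, exist_ok=True)

# Extract the OBJ models so they end up under models-OBJ/.
with zipfile.ZipFile(root / "models-OBJ.zip") as zf:
    zf.extractall(root / "models-OBJ")

# Extract the textures to a temporary folder, then move every file into models-OBJ/models/.
tmp = root / "textures_tmp"
with zipfile.ZipFile(root / "models-textures.zip") as zf:
    zf.extractall(tmp)
for f in tmp.rglob("*"):
    if f.is_file():
        shutil.move(str(f), str(models_dir / f.name))
shutil.rmtree(tmp)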

Data Collection

To run the data collection scripts, use:

python experiments/exp_{ID}_{task}.py --logdir {path_to_logdir} --dataset_dir {path_to_ShapeNetSem} --category "WineBottle, Camera" --show_image

To see all available object classes suitable for these experiments, see tact_sim/config.py.

Learning Multimodal Dynamics Models

Once you have collected the dataset, you can train the multimodal "resting state predictor" dynamics model described in the paper using:

python main.py --dataset-path {absolute_path_dataset} --problem-type seq_modeling --input-type visuotactile --model-name cnn-mvae --use-pose

This trains the MVAE model, which fuses the visual, tactile, and pose modalities into a shared latent space.
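
The shared latent space in an MVAE is typically obtained with a product-of-experts fusion of the per-modality Gaussian posteriors. The following is a minimal, hedged PyTorch-style sketch of that fusion step; the helper name, tensor shapes, and dummy inputs are illustrative assumptions, not the repository's cnn-mvae implementation.

import torch

def product_of_experts(mus, logvars):
    # Combine per-modality Gaussian posteriors N(mu_i, var_i) into one Gaussian
    # via precision-weighted fusion; a unit-Gaussian prior expert is included.
    prior_mu = torch.zeros_like(mus[0])
    prior_logvar = torch.zeros_like(logvars[0])
    all_mu = torch.stack([prior_mu] + list(mus))          # (experts, batch, latent)
    all_logvar = torch.stack([prior_logvar] + list(logvars))
    precision = torch.exp(-all_logvar)                    # 1 / var per expert
    fused_var = 1.0 / precision.sum(dim=0)
    fused_mu = (all_mu * precision).sum(dim=0) * fused_var
    return fused_mu, torch.log(fused_var)

# Illustrative usage with dummy visual, tactile, and pose posteriors.
batch, latent = 8, 32
mus = [torch.randn(batch, latent) for _ in range(3)]
logvars = [torch.zeros(batch, latent) for _ in range(3)]
mu, logvar = product_of_experts(mus, logvars)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization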

To train the resting state predictor for a single modality (e.g., tactile or visual only), use:

python main.py --dataset-path {absolute_path_dataset} --problem-type seq_modeling --input-type visual --model-name cnn-vae

To train a standard one-step dynamics model, pass dyn_modeling as the --problem-type argument.
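
For example, mirroring the commands above (whether every flag combination applies to this problem type is an assumption):

python main.py --dataset-path {absolute_path_dataset} --problem-type dyn_modeling --input-type visuotactile --model-name cnn-mvae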

Experiments

Check this video for a demo of the experiments.


License

This work by SECA is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Citing

If you use this code in your research, please cite:

@article{rezaei2021learning,
  title={Learning Intuitive Physics with Multimodal Generative Models},
  author={Rezaei-Shoshtari, Sahand and Hogan, Francois Robert and Jenkin, Michael and Meger, David and Dudek, Gregory},
  journal={arXiv preprint arXiv:2101.04454},
  year={2021}
}
@inproceedings{hogan2021seeing,
  title={Seeing Through your Skin: Recognizing Objects with a Novel Visuotactile Sensor},
  author={Hogan, Francois R and Jenkin, Michael and Rezaei-Shoshtari, Sahand and Girdhar, Yogesh and Meger, David and Dudek, Gregory},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={1218--1227},
  year={2021}
}
