Paper: MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control
This is the codebase for the MoCapAct project, which uses motion capture (MoCap) clips to learn low-level motor skills for the "CMU Humanoid" from the dm_control package. This repo contains all code to:
- train the clip snippet experts,
- collect expert rollouts into a dataset,
- download our experts and rollouts from the command line,
- perform policy distillation,
- perform reinforcement learning on downstream tasks, and
- perform motion completion.
For more information on the project and to download the entire dataset, please visit the project website.
For users interested in development, we recommend reading the documentation on dm_control.
MoCapAct requires Python 3.7+. We recommend that you use a virtual environment. For example, using conda:
conda create -n MoCapAct pip python==3.8
conda activate MoCapAct
To install the package, we recommend cloning the repo and installing the local copy:
git clone https://github.com/microsoft/MoCapAct.git
cd MoCapAct
pip install -e .
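To sanity-check the installation (a quick check, not part of the official instructions), the package should import cleanly in Python:
import mocapact
from mocapact.envs import tracking  # pulls in dm_control; should import without error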
The MoCapAct dataset consists of clip experts trained on the MoCap snippets and the rollouts from those experts.
We provide the dataset and models on the MoCapAct collection on Hugging Face. This collection consists of two pages:
- A model zoo which contains the clip snippet experts, multiclip policies, RL-trained policies for the transfer tasks, and the GPT policy.
- A dataset page which contains the small rollout dataset and large rollout dataset.
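The experts and rollouts can also be fetched programmatically with the huggingface_hub package. The snippet below is a rough sketch: the repository IDs are placeholders, not the actual Hugging Face IDs, so substitute the names shown on the collection pages.
from huggingface_hub import snapshot_download  # pip install huggingface_hub

# Placeholder repo IDs; replace with the actual IDs from the MoCapAct collection.
snapshot_download(repo_id="<org>/<mocapact-models>", repo_type="model", local_dir="data/experts")
snapshot_download(repo_id="<org>/<mocapact-dataset>", repo_type="dataset", local_dir="data/small_dataset")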
Clip snippet experts
We signify a clip snippet expert by the snippet it is tracking. Taking CMU_009_12-165-363 as an example expert, the file hierarchy for the snippet expert is:
CMU_009_12-165-363
├── clip_info.json       # Contains clip ID, start step, and end step
└── eval_rsi/model
    ├── best_model.zip   # Contains policy parameters and hyperparameters
    └── vecnormalize.pkl # Used to get normalizer for observation and reward
The expert policy can be loaded using our repository:
from mocapact import observables
from mocapact.sb3 import utils
expert_path = "data/experts/CMU_009_12-165-363/eval_rsi/model"
expert = utils.load_policy(expert_path, observables.TIME_INDEX_OBSERVABLES)
from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import types
dataset = types.ClipCollection(ids=['CMU_009_12'], start_steps=[165], end_steps=[363])
env = tracking.MocapTrackingGymEnv(dataset)
obs, done = env.reset(), False
while not done:
action, _ = expert.predict(obs, deterministic=True)
obs, rew, done, _ = env.step(action)
print(rew)
Expert rollouts
The expert rollouts consist of a collection of HDF5 files, one per clip. An HDF5 file contains expert rollouts for each constituent snippet as well as miscellaneous information and statistics. To facilitate efficient loading of the observations, we concatenate all the proprioceptive observations (joint angles, joint velocities, actuator activations, etc.) from an episode into a single numerical array and provide indices for the constituent observations in the observable_indices group. Taking CMU_009_12.hdf5 (which contains three snippets) as an example, we have the following HDF5 hierarchy:
CMU_009_12.hdf5
├── n_rsi_rollouts                 # R, number of rollouts from random time steps in snippet
├── n_start_rollouts               # S, number of rollouts from start of snippet
├── ref_steps                      # Indices of MoCap reference relative to current time step. Here, (1, 2, 3, 4, 5).
├── observable_indices
│   └── walker
│       ├── actuator_activation    # (0, 1, ..., 54, 55)
│       ├── appendages_pos         # (56, 57, ..., 69, 70)
│       ├── body_height            # (71)
│       ├── ...
│       └── world_zaxis            # (2865, 2866, 2867)
│
├── stats                          # Statistics computed over the entire dataset
│   ├── act_mean                   # Mean of the experts' sampled actions
│   ├── act_var                    # Variance of the experts' sampled actions
│   ├── mean_act_mean              # Mean of the experts' mean actions
│   ├── mean_act_var               # Variance of the experts' mean actions
│   ├── proprio_mean               # Mean of the proprioceptive observations
│   ├── proprio_var                # Variance of the proprioceptive observations
│   └── count                      # Number of observations in dataset
│
├── CMU_009_12-0-198               # Rollouts for the snippet CMU_009_12-0-198
├── CMU_009_12-165-363             # Rollouts for the snippet CMU_009_12-165-363
└── CMU_009_12-330-529             # Rollouts for the snippet CMU_009_12-330-529
Each snippet group contains:
CMU_009_12-165-363
├── early_termination              # (R+S)-boolean array indicating which episodes terminated early
├── rsi_metrics                    # Metrics for episodes that initialize at random points in snippet
│   ├── episode_returns            # R-array of episode returns
│   ├── episode_lengths            # R-array of episode lengths
│   ├── norm_episode_returns       # R-array of normalized episode rewards
│   └── norm_episode_lengths       # R-array of normalized episode lengths
├── start_metrics                  # Metrics for episodes that initialize at start of snippet
│
├── 0                              # First episode, of length T
│   ├── observations
│   │   ├── proprioceptive         # (T+1)-array of proprioceptive observations
│   │   ├── walker/body_camera     # (T+1)-array of images from body camera **(not included)**
│   │   └── walker/egocentric_camera  # (T+1)-array of images from egocentric camera **(not included)**
│   ├── actions                    # T-array of sampled actions executed in environment
│   ├── mean_actions               # T-array of corresponding mean actions
│   ├── rewards                    # T-array of rewards from environment
│   ├── values                     # T-array computed using the policy's value network
│   └── advantages                 # T-array computed using generalized advantage estimation
│
├── 1                              # Second episode
├── ...
└── R+S-1                          # (R+S)th episode
To keep the dataset size manageable, we do not include image observations in the dataset. The camera images can be logged by passing the flags --log_all_proprios --log_cameras to the mocapact/distillation/rollout_experts.py script.
The HDF5 rollouts can be read and utilized in Python:
import h5py
dset = h5py.File("data/small_dataset/CMU_009_12.hdf5", "r")
print("Expert actions from first rollout episode of second snippet:")
print(dset["CMU_009_12-165-363/0/actions"][...])
We provide a "large" dataset where
Below are the commands we used for our paper.
Clip snippet experts
Training a clip snippet expert:
python -m mocapact.clip_expert.train \
--clip_id [CLIP_ID] `# e.g., CMU_016_22` \
--start_step [START_STEP] `# e.g., 0` \
--max_steps [MAX_STEPS] `# e.g., 210 (can be larger than clip length)` \
--n_workers [N_CPU] `# e.g., 8` \
--log_root experts \
$(cat cfg/clip_expert/train.txt)
Evaluating a clip snippet expert (numerical evaluation and visual evaluation):
python -m mocapact.clip_expert.evaluate \
--policy_root [POLICY_ROOT] `# e.g., experts/CMU_016_22-0-82/0/eval_rsi/model` \
--n_workers [N_CPU] `# e.g., 8` \
--n_eval_episodes 1000 `# set to 0 to just run the visualizer` \
$(cat cfg/clip_expert/evaluate.txt)
We can also load the experts in Python:
from mocapact import observables
from mocapact.sb3 import utils
expert_path = "experts/CMU_016_22-0-82/0/eval_rsi/model" # example path
expert = utils.load_policy(expert_path, observables.TIME_INDEX_OBSERVABLES)
from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import types
dataset = types.ClipCollection(ids=['CMU_016_22'])
env = tracking.MocapTrackingGymEnv(dataset)
obs, done = env.reset(), False
while not done:
    action, _ = expert.predict(obs, deterministic=True)
    obs, rew, done, _ = env.step(action)
    print(rew)
Creating rollout dataset
Rolling out a collection of experts and collecting into a dataset:
python -m mocapact.distillation.rollout_experts \
--input_dirs [EXPERT_ROOT] `# e.g., experts` \
--n_workers [N_CPU] `# e.g., 8` \
--device [DEVICE] `# e.g., cuda` \
--output_path dataset/file_name_ignored.hdf5 \
$(cat cfg/rollout.txt)
This will result in a collection of HDF5 files (one per clip), which can be read and utilized in Python:
import h5py
dset = h5py.File("dataset/CMU_016_22.hdf5", "r")
print("Expert actions from first rollout episode:")
print(dset["CMU_016_22-0-82/0/actions"][...])
Multi-clip policy
Training a multi-clip policy on the entire MoCapAct dataset:
source scripts/get_all_clips.sh [PATH_TO_DATASET]
python -m mocapact.distillation.train \
--train_dataset_paths $train \
--val_dataset_paths $val \
--dataset_metrics_path $metrics \
--extra_clips $clips \
--output_root multi_clip/all \
--gpus 0 `# indices of GPUs` \
$(cat cfg/multi_clip/train.txt) \
$(cat cfg/multi_clip/rwr.txt) `# rwr can be replaced with awr, cwr, or bc` \
--model.config.embed_size 60 \
--eval.n_workers [N_CPU] `# e.g., 16`
Training a multi-clip policy on the locomotion subset of the MoCapAct dataset:
source scripts/get_locomotion_clips.sh [PATH_TO_DATASET]
python -m mocapact.distillation.train \
--train_dataset_paths $train \
--dataset_metrics_path $metrics \
--extra_clips $clips \
--output_root multi_clip/locomotion \
--gpus 0 `# indices of GPUs` \
$(cat cfg/multi_clip/train.txt) \
$(cat cfg/multi_clip/rwr.txt) `# rwr can be replaced with awr, cwr, or bc` \
--model.config.embed_size 20 \
--eval.n_workers [N_CPU] `# e.g., 16`
Evaluating a multi-clip policy on all the snippets within the MoCapAct dataset (numerical evaluation and visual evaluation):
source scripts/get_all_clips.sh [PATH_TO_DATASET]
python -m mocapact.distillation.evaluate \
--policy_path [POLICY_PATH] `# e.g., multi_clip/all/eval/train_rsi/best_model.ckpt` \
--clip_snippets $snippets \
--n_workers [N_CPU] `# e.g., 8` \
--device [DEVICE] `# e.g., cuda` \
--n_eval_episodes 1000 `# set to 0 to just run the visualizer` \
$(cat cfg/multi_clip/evaluate.txt)
The multi-clip policy can be loaded using PyTorch Lightning's checkpoint functionality and then used to interact with the environment:
from mocapact.distillation import model
model_path = "multi_clip/all/eval/train_rsi/best_model.ckpt"
policy = model.NpmpPolicy.load_from_checkpoint(model_path, map_location="cpu")
from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import cmu_subsets
dataset = cmu_subsets.ALL
ref_steps = (1, 2, 3, 4, 5)
env = tracking.MocapTrackingGymEnv(dataset, ref_steps)
obs, done = env.reset(), False
embed = policy.initial_state(deterministic=False)
while not done:
    action, embed = policy.predict(obs, state=embed, deterministic=False)
    obs, rew, done, _ = env.step(action)
    print(rew)
RL transfer tasks
Training a task policy (here, with a pre-defined low-level policy):
python -m mocapact.transfer.train \
--log_root [LOG_ROOT] `# e.g., transfer/go_to_target` \
$(cat cfg/transfer/train.txt) \
$(cat cfg/transfer/go_to_target.txt) `# set to cfg/transfer/velocity_control.txt for velocity control` \
$(cat cfg/transfer/with_low_level.txt) `# set to cfg/transfer/no_low_level.txt for no low-level policy`
Evaluating a task policy:
python -m mocapact.transfer.evaluate \
--model_root [MODEL_ROOT] `# e.g., transfer/go_to_target/0/eval/model` \
--task [TASK] `# e.g., mocapact/transfer/config.py:go_to_target or velocity_control`
Motion completion
Training a GPT policy on the entire MoCapAct dataset:
source scripts/get_all_clips.sh [PATH_TO_DATASET]
python -m mocapact.distillation.train \
--train_dataset_paths $train \
--val_dataset_paths $val \
--dataset_metrics_path $metrics \
--output_root motion_completion \
$(cat cfg/motion_completion/train.txt)
Performing motion completion with a trained GPT policy:
python -m mocapact.distillation.motion_completion \
--policy_path [POLICY_PATH] `# e.g., motion_completion/model/last.ckpt` \
--expert_root [EXPERT_ROOT] `# e.g., experts` \
--clip_snippet [CLIP_SNIPPET] `# e.g., CMU_016_22` \
--n_workers [N_CPU] `# e.g., 8` \
--device [DEVICE] `# e.g., cuda` \
--n_eval_episodes 100 `# Set to 0 to just run the visualizer` \
$(cat cfg/motion_completion/evaluate.txt)
To generate a prompt, we also input a path to the directory of snippet experts. Alternatively, you can pass a path to a multi-clip policy through --distillation_path, though it will likely produce lower-quality prompts than the snippet experts.
We provide two dataset classes in this repo. The ExpertDataset is used for imitation learning, e.g., to train a multi-clip tracking policy or a GPT policy for motion completion. The D4RLDataset is used for offline reinforcement learning. For instantiations small enough to fit into memory, the user can call D4RLDataset.get_in_memory_rollouts to load a batch of transitions into memory. For instantiations that do not fit into memory (e.g., the entire MoCapAct dataset), the dataset can be used as a PyTorch Dataset, iterating over transitions via __getitem__ (see the sketch below).
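As a minimal sketch of the out-of-memory pattern (the import path and constructor argument below are assumptions, not the documented D4RLDataset API), the dataset can be wrapped in a standard PyTorch DataLoader:
from torch.utils.data import DataLoader
from mocapact.offline_rl.d4rl_dataset import D4RLDataset  # assumed import path

# Hypothetical constructor argument; consult the class definition for the real signature.
dataset = D4RLDataset(hdf5_fnames=["data/small_dataset/CMU_009_12.hdf5"])
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=4)
for batch in loader:  # each batch is assembled from transitions returned by __getitem__
    pass  # feed the batch to an offline RL algorithm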
Several works build on MoCapAct, including:
- H-GAP: Humanoid Control with a Generalist Planner, by Zhengyao Jiang, Yingchen Xu, et al. (Paper, Website, Code)
- Leveraging Demonstrations with Latent Space Priors, by Jonas Gehring et al. (Paper, Website, Code)
- Hierarchical World Models as Visual Whole-Body Humanoid Controllers, by Nicklas Hansen et al. (Paper, Website, Code)
- Body Transformer: Leveraging Robot Embodiment for Policy Learning, by Carmelo Sferrazza et al. (Paper, Website, Code)
If you reference or use MoCapAct in your research, please cite:
@inproceedings{wagener2022mocapact,
  title={{MoCapAct}: A Multi-Task Dataset for Simulated Humanoid Control},
  author={Wagener, Nolan and Kolobov, Andrey and Frujeri, Felipe Vieira and Loynd, Ricky and Cheng, Ching-An and Hausknecht, Matthew},
  booktitle={Advances in Neural Information Processing Systems},
  volume={35},
  pages={35418--35431},
  year={2022}
}
The code is licensed under the MIT License. The dataset is licensed under the CDLA Permissive v2 License.