- `Model.reset()` and `Model.sample()` signatures have changed. They no longer receive `TransitionBatch` objects; instead, both return a dictionary mapping strings to tensors that represents a model state, which should be passed back to `sample()` to simulate transitions. This dictionary can contain things like previous actions, predicted observations, latent states, beliefs, and any other quantity that the model needs to maintain in order to simulate trajectories when using `ModelEnv`.
- The `Ensemble` class and its subclasses are assumed to operate on 1-D models.
- The checkpointing format used by `save()` and `load()` in classes `GaussianMLP` and `OneDTransitionRewardModel` has changed, making old checkpoints incompatible with the new version.
- The `use_silu` argument to `GaussianMLP` has been replaced by `activation_fn_cfg`, an `omegaconf.DictConfig` specifying the class to use for the activation functions, which gives more flexibility.
- Removed unnecessary nesting inside the `dynamics_model` Hydra configuration.
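The new `reset()`/`sample()` convention can be sketched as follows. This is an illustrative, dependency-free toy only — `DummyModel`, its scalar "state", and the exact return shapes are stand-ins for the pattern described above, not mbrl-lib's actual implementation:

```python
import random


class DummyModel:
    """Toy 1-D dynamics model following the reset()/sample() state-dict pattern."""

    def reset(self, obs):
        # Return a dict of named state quantities (tensors in the real library).
        return {"obs": obs, "prev_action": 0.0}

    def sample(self, action, model_state):
        # Simulate one transition; return the next observation, a reward,
        # and the updated state dict to thread into the next call.
        next_obs = model_state["obs"] + action + random.gauss(0.0, 0.01)
        reward = -abs(next_obs)
        new_state = {"obs": next_obs, "prev_action": action}
        return next_obs, reward, new_state


model = DummyModel()
state = model.reset(obs=1.0)
for _ in range(5):  # roll out a short trajectory, passing the state through
    obs, reward, state = model.sample(action=0.1, model_state=state)
```

The key point is that the caller, not the model, carries the state dictionary between calls, so any quantity the model needs (latents, beliefs, previous actions) can ride along in it.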
- Added functions to `mbrl.util.models` to easily create convolutional encoders/decoders with a desired configuration.
- `mbrl.util.common.rollout_agent_trajectories` now allows rolling out a pixel-based environment using a policy trained on its corresponding non-pixel environment version.
- `ModelTrainer` can be given `eps` for the `Adam` optimizer. It now also includes a progress bar using `tqdm` (which can be turned off).
- The CEM optimizer can now be toggled between using a clipped normal distribution or a truncated normal distribution.
- `mbrl.util.mujoco.make_env` can now create an environment specified via an `omegaconf` configuration and `hydra.utils.instantiate`, which takes precedence over the old mechanism if both are present.
- Added MPPI optimizer.
- Added iCEM optimizer.
- `control_env.py` now works with CEM, iCEM, and MPPI.
- Changed the algorithm configuration so that the action optimizer is passed as a separate config file.
- Added an option to quantize pixel observations in the gym MuJoCo and dm_control environment wrappers.
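Both `activation_fn_cfg` and the configuration-based `make_env` mechanism rely on Hydra-style instantiation, where a config's `_target_` key names the class to construct. The helper below is a simplified stand-in for `hydra.utils.instantiate`, written here only to illustrate that pattern (it is not the library's code):

```python
import importlib


def instantiate(cfg):
    """Simplified stand-in for hydra.utils.instantiate: builds the object
    named by `_target_`, passing the remaining keys as keyword arguments."""
    module_name, _, cls_name = cfg["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), cls_name)
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    return cls(**kwargs)


# An activation config such as {"_target_": "torch.nn.SiLU"} would build a
# SiLU module; demonstrated with a stdlib class to stay dependency-free:
frac = instantiate({"_target_": "fractions.Fraction",
                    "numerator": 3, "denominator": 4})
```

Passing the class path as configuration is what gives `activation_fn_cfg` its flexibility: any activation class (with any constructor arguments) can be selected without code changes.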
- Added a sequence iterator, `SequenceTransitionSampler`, that always returns a fixed number of random batches.
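The idea behind a fixed-count sequence sampler can be sketched in a few lines. This toy version (not `SequenceTransitionSampler`'s actual implementation) draws random start indices and yields contiguous sequences, always producing exactly `num_batches` batches regardless of dataset size:

```python
import random


class FixedCountSequenceSampler:
    """Toy sampler: yields `num_batches` batches, each containing
    `batch_size` random contiguous sequences of length `sequence_length`."""

    def __init__(self, data, sequence_length, batch_size, num_batches, rng=None):
        self.data = data
        self.sequence_length = sequence_length
        self.batch_size = batch_size
        self.num_batches = num_batches
        self.rng = rng or random.Random()

    def __len__(self):
        return self.num_batches  # fixed, independent of len(self.data)

    def __iter__(self):
        max_start = len(self.data) - self.sequence_length
        for _ in range(self.num_batches):
            starts = [self.rng.randint(0, max_start)
                      for _ in range(self.batch_size)]
            yield [self.data[s:s + self.sequence_length] for s in starts]
```

A fixed batch count makes one "epoch" of model training take the same number of gradient steps no matter how large the replay buffer has grown.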
- Methods `loss`, `eval_score`, and `update` of the `Model` class now return a tuple of loss/score and metadata. The old return format is still supported, but it will be deprecated in v0.2.0.
- `ModelTrainer` now accepts a callback that is called after every batch, both during training and evaluation.
- `Normalizer` in `util.math` can now operate using double precision. Utilities now allow specifying whether the replay buffer and normalizer should use double or float via the Hydra config.
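Since both return conventions coexist until the old one is deprecated, calling code can normalize them with a small shim. `unpack_loss` below is a hypothetical helper written for this sketch, not part of the library:

```python
def unpack_loss(ret):
    """Normalize a model method's return value to (value, metadata):
    the new style returns (loss, metadata dict); the old style returns
    a bare loss value."""
    if isinstance(ret, tuple):
        value, meta = ret
        return value, meta
    return ret, {}


# e.g. loss, meta = unpack_loss(model.loss(batch))  # works with either style
```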
- Multiple bug fixes
- Added a training browser to compare results of multiple runs
- Deprecated `ReplayBuffer.get_iterators()` and replaced it with `mbrl.util.common.get_basic_iterators()`
- Added an iterator that returns batches of sequences of transitions of a given length
- Multiple bug fixes
- Added `third_party` folder for `pytorch_sac` and `dmc2gym`
- Library now available on PyPI
- Moved example configurations to package `mbrl.examples`, which can now be run as `python -m mbrl.examples.main` after `pip` installation
Initial release