Release 21.09 (September 30th 2021)
Frontend Support
- PyBind11 is now the default Python frontend in FlexFlow.
Control Replication
- FlexFlow now enables Legion's dynamic control replication by default.
Distributed training
- FlexFlow now uses NCCL AllReduce for gradient synchronization by default. To switch to a distributed parameter server, set FF_USE_NCCL=OFF in CMake.
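To make the difference between the two synchronization schemes concrete, here is a minimal pure-Python sketch. It is a toy illustration only: it does not call NCCL or FlexFlow, and the function names `ring_allreduce_average` and `parameter_server_average` are hypothetical. Ring AllReduce is used as one common AllReduce algorithm; the note does not say which algorithm NCCL selects in a given setup.

```python
# Toy comparison of two gradient synchronization schemes.
# Illustrative only: no NCCL or FlexFlow calls; all names are hypothetical.

def ring_allreduce_average(worker_grads):
    """Average gradients with a ring reduce-scatter followed by an all-gather."""
    n = len(worker_grads)
    size = len(worker_grads[0])
    bufs = [list(g) for g in worker_grads]          # one private buffer per worker
    chunks = [range(c, size, n) for c in range(n)]  # chunk c = indices c, c+n, ...

    # Reduce-scatter: at each step, worker r sends chunk (r - step) % n to
    # worker (r + 1) % n, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [((r - step) % n, r) for r in range(n)]
        payloads = [[bufs[r][i] for i in chunks[c]] for c, r in sends]
        for (c, r), data in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(chunks[c], data):
                bufs[dst][i] += v

    # All-gather: worker r now owns the fully reduced chunk (r + 1) % n and
    # passes completed chunks around the ring, overwriting stale copies.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, r) for r in range(n)]
        payloads = [[bufs[r][i] for i in chunks[c]] for c, r in sends]
        for (c, r), data in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(chunks[c], data):
                bufs[dst][i] = v

    return [[v / n for v in buf] for buf in bufs]


def parameter_server_average(worker_grads):
    """Average gradients on a central server, then broadcast to every worker."""
    n = len(worker_grads)
    summed = [sum(vals) for vals in zip(*worker_grads)]  # server-side reduction
    averaged = [v / n for v in summed]
    return [list(averaged) for _ in range(n)]            # broadcast copies


grads = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0, 6.0],
    [5.0, 6.0, 7.0, 8.0],
]
allreduce_result = ring_allreduce_average(grads)
server_result = parameter_server_average(grads)
```

Both schemes produce the same averaged gradient on every worker. They differ in communication pattern: the ring spreads traffic evenly across neighbors, while the parameter server concentrates it on one node.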
Distributed inference
- Passing comp_mode = CompMode::INFERENCE as an additional argument to model.compile will run a DNN model in inference mode.
- Various bug fixes and performance improvements for distributed inference in FlexFlow.
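The general idea of choosing a computation mode at compile time can be sketched with a small stand-in model. The classes below (`CompMode`, `TinyLinearModel`) and the `comp_mode` keyword are hypothetical and written only for illustration; they are not FlexFlow's API, and FlexFlow's actual `compile` signature should be checked against its documentation.

```python
from enum import Enum


class CompMode(Enum):
    """Hypothetical stand-in for a computation-mode enum."""
    TRAINING = "training"
    INFERENCE = "inference"


class TinyLinearModel:
    """Hypothetical one-weight model y = w * x, used only to show the mode switch."""

    def __init__(self, w=0.0):
        self.w = w
        self.comp_mode = None
        self.learning_rate = None

    def compile(self, learning_rate=0.1, comp_mode=CompMode.TRAINING):
        # In inference mode no optimizer state is kept, since no
        # backward pass or weight update will ever run.
        self.comp_mode = comp_mode
        self.learning_rate = learning_rate if comp_mode is CompMode.TRAINING else None

    def forward(self, x):
        return self.w * x

    def train_step(self, x, target):
        if self.comp_mode is not CompMode.TRAINING:
            raise RuntimeError("model was compiled for inference; training is unavailable")
        prediction = self.forward(x)
        grad = 2.0 * (prediction - target) * x  # d/dw of the squared error
        self.w -= self.learning_rate * grad
        return prediction


# Inference-only usage: forward passes work, training steps are rejected.
inference_model = TinyLinearModel(w=2.0)
inference_model.compile(comp_mode=CompMode.INFERENCE)
prediction = inference_model.forward(3.0)
```

In this sketch the mode is fixed once at compile time, so the inference path never allocates optimizer state or runs gradient code.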
Operators
- New operators include AggregateSpec and Multi-Head Attention.
Machine Model
- FlexFlow now supports a new machine model that models network topology more precisely and simulates traffic at the granularity of individual packets.
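As a rough illustration of why per-packet simulation can give different estimates than treating a message as one unit, the sketch below models a transfer over a path of links. It is an assumption-laden toy, not FlexFlow's simulator: the function `packet_level_time`, the store-and-forward behavior, and the link description format are all hypothetical.

```python
def packet_level_time(message_bytes, packet_bytes, links):
    """Estimate the transfer time of a message sent as packets over a path.

    links is a list of (bandwidth_bytes_per_s, latency_s) tuples, one per hop.
    Each hop forwards a packet only after receiving it in full
    (store-and-forward), and serves packets in order.
    """
    # Split the message into packets; the last one may be shorter.
    sizes = []
    remaining = message_bytes
    while remaining > 0:
        sizes.append(min(packet_bytes, remaining))
        remaining -= sizes[-1]

    link_free = [0.0] * len(links)  # time at which each link becomes idle
    finish = 0.0
    for size in sizes:
        t = 0.0  # every packet is ready at the source at time 0
        for i, (bandwidth, latency) in enumerate(links):
            start = max(t, link_free[i])             # wait for the link to be idle
            link_free[i] = start + size / bandwidth  # link is busy while sending
            t = link_free[i] + latency               # packet arrives at the next hop
        finish = max(finish, t)
    return finish


# Two hops at 1 byte/s with zero latency, sending a 4-byte message.
path = [(1.0, 0.0), (1.0, 0.0)]
pipelined = packet_level_time(4, 1, path)    # four 1-byte packets overlap across hops
monolithic = packet_level_time(4, 4, path)   # one 4-byte unit crosses hop by hop
```

With small packets the two hops work in parallel, so the packet-level estimate is shorter than the whole-message estimate. A coarser model that ignores packetization cannot capture this kind of effect.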