Added support for using multiple GPUs on the training server when training the model #568
Added code to train the model on more than one GPU
When using a single GPU, the code is basically unchanged: tf.Variable() is replaced with tf.get_variable() so the variables can be reused, and model loading is changed so it can load models previously trained with multiple GPUs.
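For illustration, here is a minimal sketch of the variable-sharing pattern that tf.get_variable() enables across GPU towers. The layer shapes, placeholder names, and scope names are made up for this example and are not the actual model code:

```python
import tensorflow as tf

num_gpus = 2
# Illustrative inputs; the real model's placeholders and shapes differ.
inputs = tf.placeholder(tf.float32, [128, 256])
labels = tf.placeholder(tf.int64, [128])
input_splits = tf.split(inputs, num_gpus)
label_splits = tf.split(labels, num_gpus)

def build_tower(x, y):
    # tf.get_variable() returns the existing variable once reuse is enabled,
    # so every tower shares one set of weights instead of creating copies.
    w = tf.get_variable("w", shape=[256, 10],
                        initializer=tf.truncated_normal_initializer(stddev=0.1))
    b = tf.get_variable("b", shape=[10], initializer=tf.zeros_initializer())
    logits = tf.matmul(x, w) + b
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

tower_losses = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i), tf.name_scope("tower_%d" % i):
            tower_losses.append(build_tower(input_splits[i], label_splits[i]))
        # After the first tower has created the variables, reuse them.
        tf.get_variable_scope().reuse_variables()
```

With num_gpus set to 1 this collapses to a single tower, which is why the single-GPU graph stays essentially the same.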
In my own tests, a model trained with the same batch size runs in around 65% of the time on 2 GPUs compared to 1 GPU. Doubling the batch size makes training take about 30% longer than a single GPU with the unchanged batch size.
Added a configurable param for which device collects and applies the gradients from all the GPUs. In my tests this was about 10% faster on the CPU than on one of the GPUs, but this may be different on other systems depending on the interconnects between the GPUs.
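A minimal sketch of what that gradient collection looks like; the param name `gradient_device`, the toy per-tower loss, and the optimizer choice are placeholders to keep the example self-contained, not the actual code in this PR:

```python
import tensorflow as tf

num_gpus = 2
gradient_device = "/cpu:0"  # hypothetical name for the new param; "/gpu:0" may win on other systems

def average_gradients(tower_grads):
    """Average each variable's gradients across all towers."""
    averaged = []
    for grads_and_var in zip(*tower_grads):
        grads = [g for g, _ in grads_and_var]
        var = grads_and_var[0][1]
        averaged.append((tf.reduce_mean(tf.stack(grads), axis=0), var))
    return averaged

optimizer = tf.train.GradientDescentOptimizer(0.1)
tower_grads = []
with tf.variable_scope("model"):
    for i in range(num_gpus):
        with tf.device("/gpu:%d" % i):
            # Toy per-tower loss over a shared weight, just to produce gradients.
            w = tf.get_variable("w", shape=[10], initializer=tf.ones_initializer())
            loss = tf.reduce_sum(tf.square(w - float(i)))
            tower_grads.append(optimizer.compute_gradients(loss, var_list=[w]))
        tf.get_variable_scope().reuse_variables()

# Collect, average, and apply the gradients on the configured device.
with tf.device(gradient_device):
    train_op = optimizer.apply_gradients(average_gradients(tower_grads))
```

Pinning the averaging/apply step to the CPU avoids routing every tower's gradients through one GPU, which is presumably why it came out faster here; on machines with fast GPU-to-GPU links the opposite could hold.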