Parallelize Pretraining #8842
-
I'd love to have this, but it's non-trivial. I'm not even sure what the best algorithm would be -- I guess asynchronous SGD?
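For reference, a minimal, purely illustrative sketch of what asynchronous (Hogwild-style) SGD looks like: several processes update one shared parameter vector without locking. The toy regression problem, worker count, and learning rate are all made up for the example; none of this is spaCy/Thinc code.

```python
# Toy Hogwild-style asynchronous SGD: workers update a shared parameter
# vector in place, without any locking, on a synthetic regression problem.
import multiprocessing as mp
import numpy as np

N_FEATURES = 10
N_WORKERS = 4
STEPS_PER_WORKER = 2000
LR = 0.01


def worker(shared_w, seed):
    rng = np.random.default_rng(seed)
    w = np.frombuffer(shared_w)               # writable view on shared memory
    true_w = np.arange(N_FEATURES, dtype="d")
    for _ in range(STEPS_PER_WORKER):
        x = rng.normal(size=N_FEATURES)
        y = x @ true_w                        # synthetic target
        grad = (w @ x - y) * x                # grad of 0.5 * (w.x - y)^2
        w -= LR * grad                        # possibly racing with other workers


if __name__ == "__main__":
    shared_w = mp.RawArray("d", N_FEATURES)   # lock-free shared parameters
    procs = [mp.Process(target=worker, args=(shared_w, i))
             for i in range(N_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(np.frombuffer(shared_w))            # should approach [0, 1, ..., 9]
```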
-
I've been reading about asynchronous SGD: the main issue is delayed (stale) gradients. Several alternatives have been proposed to deal with this, e.g. https://arxiv.org/abs/1609.08326, https://arxiv.org/abs/1601.04033, or http://aclweb.org/anthology/D18-1332, among others, each with its own trade-offs. Whatever method best suits Thinc's architecture, I definitely think it would be great to have parallelization, and maybe also for other routines like training NER, Textcat, etc. That could be very helpful for scaling up experiments on clusters, cloud providers, etc.
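For what it's worth, here's a toy, single-process illustration of the delayed-gradient problem and of one common mitigation from the staleness-aware ASGD line of work: scale the learning rate down by how stale each gradient is when it's applied. The fixed delay and the toy regression setup are invented for the example and aren't taken from any of the papers above.

```python
# Simulate delayed gradients: each gradient is computed against the current
# weights but only applied a few steps later, with the learning rate scaled
# down by the gradient's staleness at the time it is finally applied.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 1.0])
w = np.zeros(3)
base_lr = 0.05
delay = 4
inbox = []                                   # (step_sent, gradient)

for step in range(5000):
    x = rng.normal(size=3)
    y = x @ true_w
    inbox.append((step, (w @ x - y) * x))    # gradient against *current* w

    if len(inbox) > delay:                   # gradient "arrives" late
        sent_at, stale_grad = inbox.pop(0)
        staleness = step - sent_at + 1
        w -= (base_lr / staleness) * stale_grad   # staleness-aware step size

print(w)                                     # should end up near [2, -3, 1]
```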
-
Thanks for the links! I think the most relevant literature is probably the papers on really massive multi-GPU training. For multi-CPU training we'll have a fair amount of latency from inter-process communication, so techniques designed to work despite network latency would be good. This is a new library with interesting results: https://ai.googleblog.com/2019/03/introducing-gpipe-open-source-library.html . Another thing to look at is what Chainer is doing; they've been publishing pretty good benchmarks for large-scale training.
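One latency-tolerant option in that spirit (just a sketch, not tied to any spaCy/Thinc API) is periodic parameter averaging, sometimes called local SGD: each process trains its own replica for a number of steps, and the replicas are only synchronised occasionally, so inter-process communication stays cheap.

```python
# Toy local SGD / periodic parameter averaging with a process pool: each
# worker trains its own copy of the weights for LOCAL_STEPS, and the copies
# are averaged once per round. The map/average per round is the only
# communication between processes.
import multiprocessing as mp
import numpy as np

N_WORKERS = 4
ROUNDS = 20
LOCAL_STEPS = 50
LR = 0.05
TRUE_W = np.array([1.0, -2.0, 0.5, 3.0])


def local_train(args):
    w, seed = args
    w = w.copy()
    rng = np.random.default_rng(seed)
    for _ in range(LOCAL_STEPS):
        x = rng.normal(size=TRUE_W.size)
        y = x @ TRUE_W                        # synthetic regression target
        w -= LR * (w @ x - y) * x
    return w


if __name__ == "__main__":
    w = np.zeros(TRUE_W.size)
    with mp.Pool(N_WORKERS) as pool:
        for r in range(ROUNDS):
            jobs = [(w, r * N_WORKERS + i) for i in range(N_WORKERS)]
            replicas = pool.map(local_train, jobs)
            w = np.mean(replicas, axis=0)     # the only synchronisation point
    print(w)                                  # should approach TRUE_W
```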
-
Hello, great feature! I'm currently running some experiments on my specific use cases.
However, I've noticed that pretraining is considerably slow: one epoch took almost two days on a 1B-word corpus, at an average of 4800 w/s.
When I checked the task's resource usage, it was only using one core at a time, even though I'm training on a single machine with dual 12-core Xeon CPUs (24 cores total) and no GPU.
Would it be possible to add an option for the desired number of workers on this task? Then we could use all available cores to parallelize the pretraining and speed up processing.
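Purely as an illustration of the kind of multi-worker setup I mean (none of these functions are existing spaCy/Thinc APIs): shard the corpus, run one pretraining process per shard, then combine the per-shard weights afterwards, e.g. by averaging.

```python
# Hypothetical sketch: split a one-text-per-line corpus into shards and fan
# the shards out over a pool of worker processes. `pretrain_on_shard` is a
# stand-in for the real per-shard pretraining loop; here it only counts
# tokens so the script runs end to end. "corpus.txt" is a placeholder path.
import multiprocessing as mp
from pathlib import Path


def shard_corpus(path, n_shards, out_dir):
    """Write the corpus into n_shards round-robin shard files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shard_paths = [out_dir / f"shard{i}.txt" for i in range(n_shards)]
    outs = [p.open("w", encoding="utf8") for p in shard_paths]
    with open(path, encoding="utf8") as f:
        for i, line in enumerate(f):
            outs[i % n_shards].write(line)
    for out in outs:
        out.close()
    return shard_paths


def pretrain_on_shard(shard_path):
    """Stand-in for running the single-core pretraining loop on one shard."""
    with open(shard_path, encoding="utf8") as f:
        return sum(len(line.split()) for line in f)


if __name__ == "__main__":
    shards = shard_corpus("corpus.txt", n_shards=24, out_dir="shards")
    with mp.Pool(24) as pool:
        results = pool.map(pretrain_on_shard, shards)
    # In a real setup, the per-shard weights would be combined here.
    print(sum(results), "tokens processed across", len(results), "workers")
```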
Best regards