parameter batch_size vs max_length vs batcher.size #8600
-
I'm trying to train a classifier using CamemBERT and I'm getting a CUDA out-of-memory error. To resolve this I read that I should decrease the batch size, but I'm confused about which of these parameters I should change:
I tried to understand the difference between each parameter:

- `[nlp] batch_size` — "Default batch size for `pipe` and `evaluate`. Defaults to 1000." Are those functions used in the training/evaluation process?
- `[components.transformer] max_batch_items` — "Maximum size of a padded batch. Defaults to 4096." According to the warning message "Token indices sequence length is longer than the specified maximum sequence length for this model (556 > 512). Running this sequence through the model will result in indexing errors", explained here (#6939), the CamemBERT model has a specified maximum sequence length of 512. Is the parameter `max_batch_items` overloaded to this value? Should I change the value to 512?
- `[corpora.train or dev] max_length` — In my understanding, this value should be equal to or lower than the maximum sequence length. In the quickstart widget this value is set to 500 for the training set and 0 for the dev set. If set to 0, will it be overloaded to the maximum sequence length of the transformer model?
- `[training.batcher] size` for `spacy.batch_by_padded.v1` — "The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. `compounding`." If I don't use compounding, how is this parameter different from `max_length`?

Here are some parts of my config file.
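For reference, the relevant sections as generated by the transformer quickstart look roughly like this (the values below are the quickstart defaults, so treat this as a sketch of my setup rather than my exact file):

```ini
[nlp]
batch_size = 128

[components.transformer]
max_batch_items = 4096

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[corpora.train]
max_length = 500

[corpora.dev]
max_length = 0

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
```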
I'm really sorry if these questions are basic, I don't have a lot of experience in machine learning. I'd also like to thank you for the framework and for all your responses to others, which have helped me more than once.
-
For OOM errors, the main settings to adjust are `nlp.batch_size` and `training.batcher.size`.

`nlp.batch_size` affects the default batch size during the evaluation steps (and also the default batch size during future use of the pipeline in general with `nlp.pipe`). It will be faster if it's higher, but you can run out of memory, usually a lot sooner on GPU. The right setting here always depends on how much memory you have and the document lengths.

`training.batcher` controls the batch size during the training steps. Here you want to lower `size`, but for other batchers you should look up the details based on the function: https://spacy.io/api/top-level#batchers

If it only shows up rarely, you can ignore the warning about "Token indices sequence length". The default transformer config uses overlapping strided spans to be able to process documents that are longer than the model max length, and if a span still ends up too long, it is truncated internally before it's passed to the model (if you passed it to the model directly, it would crash). If you do see this warning a lot, you can consider lowering the `window` and `stride` settings of the span getter: https://spacy.io/api/transformer#span_getters

The behavior of `max_length` in the corpus readers is documented here: https://spacy.io/api/top-level#corpus-readers. It is unrelated to the model's maximum sequence length: a value of 0 means no limit, and documents longer than `max_length` are split into sentences if sentence boundaries are available, otherwise skipped.

It can make sense for you to adjust `max_length` for your data, but I would recommend starting with the rest of the settings as defined in the default config and seeing how the training goes. The GPU defaults are set up to work relatively well for a GPU with 12-16 GB RAM with base-sized transformer models.
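As a concrete illustration of those knobs, lowering them in the config might look like this (a sketch only; the specific numbers are placeholders, and the right values depend on your GPU memory and document lengths):

```ini
[nlp]
# Default batch size for nlp.pipe / nlp.evaluate; lower it if the
# evaluation steps run out of GPU memory.
batch_size = 64

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
# Largest padded batch size during training; lower it if the training
# steps run out of GPU memory.
size = 1000
buffer = 256

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# Smaller window/stride values produce shorter spans, so fewer spans
# exceed the model's maximum length after wordpiece tokenization.
window = 96
stride = 64
```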
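And since `nlp.batch_size` also becomes the default for `nlp.pipe` at inference time, you can override it per call without retraining (a minimal sketch; the model path and texts are illustrative):

```python
import spacy

# Illustrative path to a trained pipeline; replace with your own.
nlp = spacy.load("./output/model-best")

texts = [
    "Ceci est un premier document.",
    "Ceci est un second document, un peu plus long.",
]

# nlp.pipe uses [nlp] batch_size from the config by default; passing
# batch_size here overrides it for this call, which is an easy way to
# work around OOM errors at inference time.
for doc in nlp.pipe(texts, batch_size=32):
    print(doc.cats)
```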