parameter batch_size vs max_length vs batcher.size #8600
-
I'm trying to train a classifier using CamemBERT and I'm getting a CUDA out-of-memory error. To resolve this I read that I should decrease the batch size, but I'm confused about which of these parameters I should change:
I tried to understand the difference between each parameter:

- `[nlp] batch_size` — "Default batch size for `pipe` and `evaluate`. Defaults to 1000." Are those functions used in the training/evaluation process?
- `[components.transformer] max_batch_items` — "Maximum size of a padded batch. Defaults to 4096." According to the warning message "Token indices sequence length is longer than the specified maximum sequence length for this model (556 > 512). Running this sequence through the model will result in indexing errors", explained here (#6939), the CamemBERT model has a specified maximum sequence length of 512. Is the parameter `max_batch_items` overloaded to this value? Should I change the value to 512?
- `[corpora.train or dev] max_length` — In my understanding, this value should be equal to or lower than the maximum sequence length. In the quickstart widget this value is set to 500 for the training set and 0 for the dev set. If set to 0, will it be overloaded to the maximum sequence length of the transformer model?
- `[training.batcher] size` for `spacy.batch_by_padded.v1` — "The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. `compounding`." If I don't use compounding, how is this parameter different from `max_length`?

Here are some parts of my config file.
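For reference, the relevant sections as generated by the transformer quickstart look roughly like this (the values below are the quickstart defaults, so treat this as a sketch of my setup rather than my exact file):

```ini
[nlp]
batch_size = 128

[components.transformer]
max_batch_items = 4096

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[corpora.train]
max_length = 500

[corpora.dev]
max_length = 0

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
```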
I'm really sorry if these questions are basic, I don't have a lot of experience in machine learning. I'd also like to thank you for the framework and for all your responses to others, which have helped me more than once.
-
For OOM errors, the main settings to adjust are `nlp.batch_size` and `training.batcher.size`.

`nlp.batch_size` affects the default batch size during the evaluation steps (and also the default batch size during future use of the pipeline in general with `nlp.pipe`). It will be faster if it's higher, but you can run out of memory, usually a lot sooner on GPU. The right setting here always depends on how much memory you have and the document lengths.

`training.batcher` controls the batch size during the training steps. Here you want to lower `size`, but for other batchers you should look up the details based on the function: https://spacy.io/api/top-level#batchers

If it only shows up rarely, you can ignore the warning about "Token indices sequence length". The default transformer config uses overlapping strided spans to be able to process documents that are longer than the model max length, and if a span still ends up too long, it is truncated internally before it's passed to the model (if you passed it to the model directly, it would crash). If you do see this warning a lot, you can consider lowering the `window` and `stride` settings of the span getter: https://spacy.io/api/transformer#span_getters

The behavior of `max_length` in the corpus readers is documented here: https://spacy.io/api/top-level#corpus-readers. It is unrelated to the model's maximum sequence length: a value of 0 means no limit, and documents longer than `max_length` are split into sentences if sentence boundaries are available, otherwise skipped.

It can make sense for you to adjust `max_length` for your data, but I would recommend starting with the rest of the settings as defined in the default config and seeing how the training goes. The GPU defaults are set up to work relatively well for a GPU with 12-16 GB RAM with base-sized transformer models.
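As a concrete illustration of those knobs, lowering them in the config might look like this (a sketch only; the specific numbers are placeholders, and the right values depend on your GPU memory and document lengths):

```ini
[nlp]
# Default batch size for nlp.pipe / nlp.evaluate; lower it if the
# evaluation steps run out of GPU memory.
batch_size = 64

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
# Largest padded batch size during training; lower it if the training
# steps run out of GPU memory.
size = 1000
buffer = 256

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# Smaller window/stride values produce shorter spans, so fewer spans
# exceed the model's maximum length after wordpiece tokenization.
window = 96
stride = 64
```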
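And since `nlp.batch_size` also becomes the default for `nlp.pipe` at inference time, you can override it per call without retraining (a minimal sketch; the model path and texts are illustrative):

```python
import spacy

# Illustrative path to a trained pipeline; replace with your own.
nlp = spacy.load("./output/model-best")

texts = [
    "Ceci est un premier document.",
    "Ceci est un second document, un peu plus long.",
]

# nlp.pipe uses [nlp] batch_size from the config by default; passing
# batch_size here overrides it for this call, which is an easy way to
# work around OOM errors at inference time.
for doc in nlp.pipe(texts, batch_size=32):
    print(doc.cats)
```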