
'POST' data size, is there a limit? #11

Open
emanokaro opened this issue Oct 4, 2021 · 2 comments

Comments

@emanokaro

How is it possible to increase the input data size? Would it be via the batch size?

@emanokaro
Author

@daniel-ziegler Thanks for your great work. I noticed your work on summarising books and the hierarchical approach.
My question is: what is the maximum input length for the model such that it is still able to attend to the very first words/sentences? And how do I change that limit? I already tried batch_size and d_model in transformer.py. Thanks.

@UntotaufUrlaub

Hi @emanokaro,
I don't know for sure, but my educated guess is that you can't change the input size. Most standard Transformer architectures only allow one fixed input size without retraining. The reason is that each position is associated with a learned position embedding, so if you add words beyond the trained length, the corresponding position embeddings simply don't exist. In principle this could be avoided, but that is not the standard at the moment, as far as I know.
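To make this concrete, here is a minimal PyTorch sketch (not this repo's actual code; `MAX_LEN` and `d_model` are illustrative assumptions) of why a fixed table of learned position embeddings caps the input length:

```python
# Minimal sketch, assuming a learned position-embedding table as in the paper.
# MAX_LEN and d_model are illustrative, not taken from this repo.
import torch
import torch.nn as nn

MAX_LEN = 2048   # number of learned position embeddings (per the paper)
d_model = 512    # illustrative model width

pos_emb = nn.Embedding(MAX_LEN, d_model)  # one learned vector per position

def add_positions(token_emb: torch.Tensor) -> torch.Tensor:
    # token_emb: (batch, seq_len, d_model)
    seq_len = token_emb.size(1)
    # Positions >= MAX_LEN have no trained embedding, so longer inputs
    # cannot be encoded without retraining the embedding table.
    assert seq_len <= MAX_LEN, f"sequence length {seq_len} exceeds {MAX_LEN}"
    positions = torch.arange(seq_len)
    return token_emb + pos_emb(positions)
```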
Hints from the paper are:
"All models follow the standard Transformer architecture, with 2048 learned position embeddings." (page 17)
"The batch size ramped up throughout training to some maximum, with each input having 2048 tokens." (page 17)
"Our model always receives a byte-pair encoded string of a fixed size. When the input is too small, we
pad from the beginning of the input with a padding token, and if the input is too long we truncate the
post/article field at newlines to stay under the limit." (page 18)
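A rough sketch of the preprocessing that last quote describes, i.e. front-padding short inputs and truncating long ones at a newline boundary (the `PAD` and `newline_id` token ids are assumptions on my side, not the repo's real BPE setup):

```python
# Sketch only: front-pad to a fixed size, truncate at newlines if too long.
PAD = 0
MAX_TOKENS = 2048

def pad_or_truncate(tokens: list[int], newline_id: int) -> list[int]:
    if len(tokens) > MAX_TOKENS:
        # Truncate at the last newline token that keeps us under the limit.
        cut = MAX_TOKENS
        while cut > 0 and tokens[cut - 1] != newline_id:
            cut -= 1
        tokens = tokens[:cut] if cut > 0 else tokens[:MAX_TOKENS]
    # Pad from the *beginning*, so the text ends right where generation starts.
    return [PAD] * (MAX_TOKENS - len(tokens)) + tokens
```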

(Batch size doesn't sound promising, by the way. It is the number of texts processed in parallel in one training forward pass. It is connected to the input size (the length of each individual text) only through memory: the memory requirement scales with both input length and batch size. So the need for a bigger batch size during training is a constraint on the input size of the model, not a way to increase it.)
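For intuition on that memory coupling, a back-of-the-envelope estimate (all numbers are illustrative assumptions, and this ignores optimizer state and per-head bookkeeping):

```python
# Rough activation-memory estimate: hidden states grow with batch * seq_len,
# and attention scores grow with batch * seq_len^2, so a larger batch leaves
# less room for longer inputs on the same hardware.
def activation_bytes(batch_size, seq_len, d_model, n_layers, bytes_per_val=2):
    hidden = batch_size * seq_len * d_model    # hidden states per layer
    attn = batch_size * seq_len * seq_len      # attention score matrix
    return n_layers * (hidden + attn) * bytes_per_val

print(activation_bytes(8, 2048, 512, 12) / 2**30, "GiB")   # ~0.94 GiB
print(activation_bytes(16, 2048, 512, 12) / 2**30, "GiB")  # doubles with batch
```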
