Output each char individually, avoid cost of appending to static string #138
This loop in `LanguageModel.lua` is a performance killer for strings longer than about 100,000 characters. As in many languages, strings in Lua are immutable, so appending to one requires allocating a new chunk of memory, copying the entire previous string into it, and adding the new character at the end. They try to mitigate this by doubling the size of the string buffer on each reallocation, but while that performs well in theory, in practice it still kills the run time. Here are some perf numbers for various string sizes running the original code (model_type: lstm, rnn_size: 256, layers: 3, temp: 0.9):
For some reason it runs out of memory at 1,000,000 characters, so the 15.6x slowdown relative to 100,000 characters is probably an underestimate of what it would have been if the run had actually finished.
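The difference between the two approaches can be sketched as below. This is a simplified illustration, not torch-rnn's actual sampling code: `next_char` is a hypothetical stand-in for sampling one character from the model.

```lua
-- Hypothetical stand-in for torch-rnn's per-step sampling.
local chars = {'a', 'b', 'c'}
local i = 0
local function next_char()
  i = i + 1
  return chars[(i - 1) % #chars + 1]
end

local N = 100000

-- Original pattern: appending to an immutable Lua string.
-- Every `..` allocates a fresh string and copies the old contents,
-- so generating N characters does O(N^2) copying in total.
local out = ''
for _ = 1, N do
  out = out .. next_char()
end
io.write(out)

-- This pull request's approach: write each character as soon as it
-- is sampled, so the cost per character stays constant.
for _ = 1, N do
  io.write(next_char())
end
io.flush()
```

Both loops produce the same output; only the allocation behavior differs.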
Here are the perf numbers for this pull request, which outputs each character as soon as it is generated, avoiding repeated appends to a large string:
With smaller strings the start-up time dominates and there isn't much difference. At around 100,000 characters the new code starts to show real improvements. At 800,000 characters the old code is 11.3x as slow as generating 100,000 characters, and it only gets worse as the numbers get bigger. The new code is 7.7x as slow, less than the expected 8x slowdown.
There are ways to keep the string in memory and only output it at the end efficiently. But outputting the characters immediately is helpful when first learning torch-rnn/torch, as well as when tuning the `temperature` parameter on a newly trained RNN.
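For reference, the standard Lua idiom for building a large string efficiently in memory (one of the "ways" mentioned above, not code from this pull request) is to collect the pieces in a table and join them once with `table.concat`:

```lua
-- Collect pieces in a table, then join once at the end.
-- table.concat builds the result with a single pass instead of
-- one reallocation-and-copy per append.
local pieces = {}
for i = 1, 5 do
  -- stand-in for a sampled character
  pieces[#pieces + 1] = string.char(96 + i)
end
local out = table.concat(pieces)
print(out)  -- abcde
```

This keeps the full sample available for post-processing while avoiding the quadratic copying cost of repeated `..` appends.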