Error when batch_size different from 1 in NeuralNetworkRegressor #225
When I pass batch_size as a parameter to NeuralNetworkRegressor(), the model can't be fitted: training fails with a DimensionMismatch error. I suspect this is caused by the absence of a Flux.reset!() call after each batch update inside the training loop. The code I wrote is along the following lines:
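A minimal sketch of the setup described (the builder, data, and hyperparameter values here are illustrative assumptions, not the exact original snippet):

```julia
using MLJ, MLJFlux, Flux

# Hypothetical recurrent builder; n_in and n_out are bound by @builder.
builder = MLJFlux.@builder Chain(RNN(n_in => 32), Dense(32 => n_out))

model = NeuralNetworkRegressor(
    builder = builder,
    batch_size = 32,   # any value other than 1 triggers the error
    epochs = 10,
)

X, y = make_regression(1_000, 5)   # toy dataset
mach = machine(model, X, y)
fit!(mach)   # fails with a DimensionMismatch
```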
Thanks @MathNog for reporting. I've not tried to reproduce, but your analysis sounds reasonable. (Current tests do include changing batch size for some non-recurrent networks.) It's been a while since I looked at RNNs, but I would have thought calling Flux.reset! each time the batch is updated would address this.
Thanks for the comment, @ablaom, and I believe your suggestion is correct. I have altered both MLJFlux.fit! and MLJFlux.train! within the scope of my own project, adding the Flux.reset! call exactly as you said. However, in order to add that line I also had to restructure the code a little, while making sure the final result stays the same; the change is roughly as sketched below.
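A minimal sketch of the modified training loop, assuming a Flux-style per-batch update (MLJFlux's actual train! differs in detail; names here are illustrative):

```julia
using Flux

# Sketch only: shows where the Flux.reset! call goes relative to the
# per-batch updates; not MLJFlux's actual source.
function train!(loss, chain, optimiser, X, y)
    opt_state = Flux.setup(optimiser, chain)
    for (xb, yb) in zip(X, y)      # X and y are vectors of batches
        Flux.reset!(chain)         # clear recurrent state before each batch, so
                                   # state sized for the previous batch cannot
                                   # cause a DimensionMismatch
        grads = Flux.gradient(m -> loss(m(xb), yb), chain)
        Flux.update!(opt_state, chain, grads[1])
    end
    return chain
end
```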
I have also noticed that, in order for everything to run smoothly, the function MLJModelInterface.predict in src/regressor.jl should also be modified by adding the reset! call, and I have made it work roughly as follows.
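A sketch of the modified predict, assuming the trained chain is the first entry of the fitresult (the exact layout in MLJFlux may differ):

```julia
import MLJModelInterface
using Flux

# Sketch only: the fitresult layout and data reshaping are assumptions.
function MLJModelInterface.predict(model::NeuralNetworkRegressor,
                                   fitresult, Xnew)
    chain = fitresult[1]       # trained Flux chain (assumed position)
    Flux.reset!(chain)         # clear stale recurrent state before predicting
    Xmatrix = MLJModelInterface.matrix(Xnew)'   # observations as columns
    return vec(chain(Xmatrix))
end
```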
With all those changes, I could train and predict a NeuralNetworkRegressor with a batch size different from 1 with no issues.
Thanks for that, but I think I was not clear enough. My understanding is that a Flux RNN must be trained on batches that are all the same size, so calling Flux.reset! would only be papering over the problem. I'm not an expert on RNNs, so I may have this wrong; perhaps @ToucheSir can comment. If I'm right, then the more appropriate remedy is to ensure all batches have the same size. When the batch size does not divide the number of observations, the last batch is smaller than the others; for example, we could simply ignore that last batch. To justify this, we would need to ensure we are also shuffling observations between epochs, which is not implemented, if I remember correctly.
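A sketch of that remedy (a proposal, not MLJFlux source), assuming a features-as-columns matrix X: shuffle the observations each epoch and drop the ragged final batch so every batch has the same size.

```julia
using Random

# Illustrative helper: returns same-size batches, discarding any remainder.
function equal_size_batches(X, y, batch_size; rng = Random.default_rng())
    n = length(y)
    idx = randperm(rng, n)   # a fresh shuffle each epoch justifies the discard
    starts = 1:batch_size:(n - batch_size + 1)   # remainder is ignored
    return [(X[:, idx[s:s+batch_size-1]], y[idx[s:s+batch_size-1]])
            for s in starts]
end
```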
With the caveat that I have not read through the entire thread, it's perfectly fine to have different batch sizes while training an RNN.
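For illustration, a small example of what that looks like with the Flux 0.13/0.14-style recurrent API (the state just needs to be reset between batches of different sizes):

```julia
using Flux

rnn = Chain(RNN(3 => 5), Dense(5 => 1))
x1 = rand(Float32, 3, 32)   # batch of 32 observations
x2 = rand(Float32, 3, 20)   # batch of 20 observations
rnn(x1)
Flux.reset!(rnn)   # clear the state sized for the previous batch
rnn(x2)            # fine; without the reset this throws a DimensionMismatch
```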