Probably shuffling without breaking batches is fine. I suggest adding an `rng` hyperparameter as well, which is interpreted as `MersenneTwister(rng)` if `rng` is an integer (as elsewhere in MLJ), and falls back to `Random.GLOBAL_RNG` otherwise. This could ultimately be passed to the chain initializers, although Flux does not currently make this easy (FluxML/Flux.jl#1335).
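For concreteness, here is a minimal sketch of how such an `rng` hyperparameter could be stored and coerced. The `NeuralNetworkClassifier` struct and the `true_rng` helper are hypothetical names for illustration, not existing MLJFlux code:

```julia
using Random

mutable struct NeuralNetworkClassifier   # hypothetical model struct, fields trimmed
    epochs::Int
    rng::Union{Integer,AbstractRNG}
end

# keyword constructor with the proposed fallback to the global RNG:
NeuralNetworkClassifier(; epochs=10, rng=Random.GLOBAL_RNG) =
    NeuralNetworkClassifier(epochs, rng)

# integers are interpreted as seeds, as elsewhere in MLJ:
true_rng(model) =
    model.rng isa Integer ? MersenneTwister(model.rng) : model.rng
```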
I also suggest the following RNG handling here (and more generally) to help with reproducibility: the `fit` method creates a deep copy of the RNG, which then gets mutated as various `rand(rng, ...)` calls are made. The final state is then output to `cache` so that `update` can carry on with the mutated RNG in the case of a warm restart (see the sketch after the list below). In this way,
(i) multiple warm-restarts behave the same as training all in one go (modulo the chain initialisation problem), even if the original RNG gets used somewhere else in between restarts; and
(ii) by specifying a concrete RNG at model construction time, cold restarts (with, e.g., `fit!(mach, force=true)`) give the same behaviour every time.
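A sketch of the proposed `fit`/`update` bookkeeping, reusing the hypothetical struct and `true_rng` helper above. Here `build_chain` and `train!` are toy stand-ins for the real Flux chain builder and training loop, not MLJFlux's API:

```julia
# toy stand-ins, just to make the RNG bookkeeping runnable:
build_chain(model, rng) = randn(rng, 3)
function train!(chain, X, y, rng; epochs=1)
    for _ in 1:epochs
        chain .+= 0.01 .* randn(rng, length(chain))   # stands in for an SGD epoch
    end
    return chain
end

function fit(model, verbosity, X, y)
    rng = deepcopy(true_rng(model))      # copy, so the user's RNG is not mutated
    chain = build_chain(model, rng)
    train!(chain, X, y, rng; epochs=model.epochs)
    cache = (deepcopy(model), rng)       # mutated RNG saved for warm restarts
    return chain, cache, NamedTuple()
end

function update(model, verbosity, old_fitresult, old_cache, X, y)
    old_model, rng = old_cache
    # warm restart only if training was merely extended; otherwise refit:
    model.epochs >= old_model.epochs || return fit(model, verbosity, X, y)
    chain = train!(old_fitresult, X, y, rng;
                   epochs=model.epochs - old_model.epochs)
    return chain, (deepcopy(model), rng), NamedTuple()
end
```

With a concrete RNG in the model, a single `fit` over N epochs and a `fit` plus warm `update` totalling N epochs should then agree, modulo the chain-initialization caveat above.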