
Example of how to use with Optimisers.jl #34

Closed
theabhirath opened this issue May 31, 2022 · 13 comments · Fixed by #55
Comments

@theabhirath
Member

Hi, I've been trying to use this package with Optimisers.jl (specifically, I've been trying to use a Step schedule with a Scheduler), but I'm getting errors suggesting that this setup works with the Flux optimisers and not with Optimisers.jl for now. Is there a way to write code that works with Optimisers.jl?

@darsnack
Member

Constructing optimizers from Optimisers.jl is cheap and simple, since the state is de-coupled. Something like this would work:

using Optimisers, ParameterSchedulers

# Wrap a schedule together with a function that builds the rule for a given LR
struct Scheduler{T, F}
    constructor::F
    schedule::T
end

# Build a fresh rule with the learning rate the schedule gives at iteration t
_get_opt(scheduler::Scheduler, t) = scheduler.constructor(scheduler.schedule(t))

Optimisers.init(o::Scheduler, x::AbstractArray) =
    (t = 1, opt = Optimisers.init(_get_opt(o, 1), x))

function Optimisers.apply!(o::Scheduler, state, x, dx)
    opt = _get_opt(o, state.t)
    new_state, new_dx = Optimisers.apply!(opt, state.opt, x, dx)

    return (t = state.t + 1, opt = new_state), new_dx
end

# Step(start, decay, step_sizes): decay the LR by `decay` every `step_size` iterations
opt = Scheduler(Step(init_lr, decay, step_size)) do lr
    Momentum(lr)
end
st = Optimisers.setup(opt, model)
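
For context, a minimal sketch of how that state could be driven in a training loop (hypothetical model, data, and loss; Flux.gradient and Optimisers.update! are the usual explicit-style calls):

using Flux, Optimisers

for (x, y) in data
    # gradient w.r.t. the model; update! also advances the Scheduler's iteration counter
    grads = Flux.gradient(m -> loss(m(x), y), model)
    st, model = Optimisers.update!(st, model, grads[1])
end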

@theabhirath
Member Author

I tried this and my model stopped training 😬 It's stuck after one epoch

@darsnack
Member

darsnack commented May 31, 2022

That's weird. Which rule are you using? And can you post the value of st after setup for a small model? I would also print opt inside the definition for apply!.

@darsnack
Member

I would also confirm that training doesn't stall with a reasonable fixed LR first.

@theabhirath
Member Author

theabhirath commented Jun 1, 2022

Whoops, nevermind, figured it out. I set step_sizes = 25 for Step, but naturally since Step is being called every step and not every epoch step_sizes has to be (25 * dataset_size) / batch_size. Reducing it by 0.1 every step meant that the learning rate was on the order of 10^-40 before even starting the second epoch 😅 It's training now, thank you!
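
For concreteness, that conversion looks something like this (hypothetical dataset_size and batch_size values):

nbatches   = cld(dataset_size, batch_size)  # mini-batch iterations per epoch
step_iters = 25 * nbatches                  # "every 25 epochs" expressed in iterations
schedule   = Step(init_lr, 0.1, step_iters)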

@darsnack
Member

darsnack commented Jun 1, 2022

Ah, good to know. You should check out Interpolator and the docs for it (under complex schedules). It's just a slightly less error-prone/cleaner way of specifying schedules in epochs that will be iterated per mini-batch.
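
For reference, a minimal sketch of that approach, assuming ParameterSchedulers.Interpolator(schedule, rate) rescales the iteration count by rate as described in those docs (hypothetical dataset_size and batch_size):

using ParameterSchedulers

nbatches       = cld(dataset_size, batch_size)  # iterations per epoch
epoch_schedule = Step(init_lr, 0.1, 25)         # written in epochs: decay every 25 epochs
iter_schedule  = ParameterSchedulers.Interpolator(epoch_schedule, nbatches)
opt            = Scheduler(lr -> Momentum(lr), iter_schedule)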

@ToucheSir
Member

Now that we have Optimisers.adjust!, should Scheduler be modernized and adopted as the recommended (easy) way to schedule parameters with Flux + Optimisers.jl?
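
For illustration, a minimal sketch of what that could look like, assuming Optimisers.adjust!(state, lr) is used to set the learning rate once per epoch (hypothetical model, nepochs, and loop body):

using Optimisers, ParameterSchedulers

sched = Step(1e-2, 0.1, 25)                    # decay every 25 epochs
state = Optimisers.setup(Momentum(1e-2), model)
for epoch in 1:nepochs
    Optimisers.adjust!(state, sched(epoch))    # overwrite eta for every rule in the state tree
    # ... run one epoch of Optimisers.update! steps with `state` ...
end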

@Qiyu-Zh

Qiyu-Zh commented Jan 2, 2024

I'm running into a problem where the scheduler cannot be set up. Could you share more details?

@darsnack
Member

darsnack commented Jan 2, 2024

The details for calling setup are in the original code above. Can you share the error?

@Qiyu-Zh

Qiyu-Zh commented Jan 2, 2024

[screenshot of the setup error]

@darsnack
Member

darsnack commented Jan 2, 2024

Modify the original code so that Scheduler subtypes Optimisers.AbstractRule:

struct Scheduler{T, F} <: Optimisers.AbstractRule
    constructor::F
    schedule::T
end
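
With that change, Optimisers.setup(opt, model) should recognise the wrapper as an optimisation rule and recurse through the model, attaching the (t, opt) state defined in Optimisers.init above to every trainable array.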

@Qiyu-Zh

Qiyu-Zh commented Jan 2, 2024

Great, thanks!

@Qiyu-Zh

Qiyu-Zh commented Feb 1, 2024

Why do I get an error saying that Exp is not compatible with float?
[screenshot of the error]
