RFC: add Functors-aware structural gradient #129

mcabbott · 2022-08-21T16:29:04Z

It's surprisingly simple to add a function which does this:

julia> model = Chain(Dense(2 => 1, tanh), Dense(1 => 1, bias=false));

julia> withgradient(model, rand(Float32, 2)) do m, x
         sum(abs2, m(x))
       end
(val = 0.035716165f0, grad = ((layers = ((weight = Float32[-0.4241869 -0.16741231], bias = Float32[-0.5529184], σ = nothing), (weight = Float32[-0.04804218;;], bias = nothing, σ = nothing)),), Float32[0.12706584, -0.08858479]))

At the moment this is done without touching existing functions, although perhaps gradient could be upgraded to do this without breaking anything.

The reason to do so is that this would make Tracker.jl usable again with Flux.jl via Optimisers.jl, and perhaps as a drop-in option for something like FluxML/Flux.jl#2029.

The reason not to do this is that Tracker.jl is old and perhaps spreading maintenance effort more thinly is undesirable.

ToucheSir

I'm pleasantly surprised by how few LOC are needed for this. To make the Optimisers dependency go down better, perhaps it should be thrown under a @require? It's unlikely Tracker will drop Requires.jl any time soon, so we might as well make use of that.

ToucheSir · 2022-08-21T16:56:15Z

src/back.jl

+```
+"""
+function withgradient(f, xs...)
+    pxs = fmap(param, xs; exclude = isnumeric)  # would ideally apply params only to trainable


Some variation of trainable_walk from FluxML/Optimisers.jl#35 (comment) could work here.

Thanks, I get lost every time I try to remember how walks work... that looks like ~~the right one.~~ one place I thought I understood...

I guess another option would be not to depend on Optimisers at all, just Functors. Although not tracking non-trainable arrays in Flux probably increases the chances of this just working.

Am not keen on Requires here, seems like a hassle for one tiny package. This already depends on 19 others: https://juliahub.com/ui/Packages/Tracker/cI3wW/0.2.20?page=1

We still need a way to not track non-trainable arrays while still tracking arrays that should be moved on/off GPU in Flux.

I suppose the dep issue isn't a major one in practice, but it may raise a couple of eyebrows. If import times stay mostly the same though, no objections here.

Depending on trainable does mean raising the floor to Julia 1.6. I think that's fine: anyone on 1.3 has surely accepted freezing all downstream packages by now, and we aren't planning to backport bugfixes.

Zygote of course tries to compute gradients with non-trainable too. But would be nice not to do so.
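
For readers following the thread, here is a minimal sketch of the idea behind the quoted `fmap(param, xs; exclude = isnumeric)` line. It is not the code merged in this PR: `sketch_withgradient` and `is_numeric_array` are hypothetical names, the leaf test is an assumed stand-in for `isnumeric`, and it tracks every numeric array rather than only trainable ones, which is exactly the limitation discussed above.

```julia
using Functors, Tracker

# Hypothetical leaf test, standing in for the `isnumeric` used in the diff.
is_numeric_array(x) = x isa AbstractArray{<:Number}

# Hypothetical helper: a structural gradient built from Functors.fmap and Tracker.
function sketch_withgradient(f, xs...)
    # Wrap every numeric array in the (possibly nested) arguments as a tracked parameter.
    pxs = fmap(Tracker.param, xs; exclude = is_numeric_array)
    l = f(pxs...)
    # Backpropagate from the scalar loss into the tracked arrays.
    Tracker.back!(l)
    # Rebuild the same nested structure, holding plain gradient arrays.
    grads = fmap(Tracker.grad, pxs; exclude = is_numeric_array)
    return (val = Tracker.data(l), grad = grads)
end

# A NamedTuple stands in for a model here, so that Flux is not needed.
model = (W = [1.0 2.0; 3.0 4.0], b = [0.5, -0.5])
x = [1.0, -1.0]
out = sketch_withgradient((m, x) -> sum(abs2, m.W * x .+ m.b), model, x)
```

The result mirrors the shape of the inputs, so `out.grad[1]` is a NamedTuple with fields `W` and `b`, and `out.grad[2]` is the gradient with respect to `x`. Restricting the first `fmap` to trainable fields would need a custom walk, as suggested earlier in this thread.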

mcabbott · 2022-08-21T20:22:39Z

Project.toml

 DiffRules = "1.4"
+Functors = "0.3.0"


BTW [email protected] requires Metalhead#master right now. With that, the example from https://fluxml.ai/Optimisers.jl/dev/#Usage-with-Flux.jl runs, and has half the TTFG (time to first gradient) of Zygote:

julia> let
         Random.seed!(1)
         model = Metalhead.ResNet(18) |> gpu  # define a model to train
         image = rand(Float32, 224, 224, 3, 1) |> gpu;  # dummy data
         @show sum(model(image));  # dummy loss function

         rule = Optimisers.Adam()  # use the Adam optimiser with its default settings
         state = Optimisers.setup(rule, model);  # initialise this optimiser's momentum etc.

         @time _, (∇model, _) = Tracker.withgradient(model, image) do m, x  # calculate the gradients
           sum(m(x))
         end;

         state, model = Optimisers.update(state, model, ∇model);
         @show sum(model(image));
         Base.summarysize(∇model)
       end
sum(model(image)) = 1.2527118f0
 19.638126 seconds (39.40 M allocations: 3.444 GiB, 44.46% gc time, 87.70% compilation time)
sum(model(image)) = -4792.643f0
46767520

compared to, for Zygote, this:

sum(model(image)) = 1.2527118f0
 47.450042 seconds (73.94 M allocations: 5.419 GiB, 36.40% gc time, 93.23% compilation time)
sum(model(image)) = -19.776657f0
46765720

But something is wrong, as the final loss differs.

Looking on the bright side, I guess with this it would be fairly easy to add checks to Flux's tests, comparing what Zygote thinks about each layer to what Tracker thinks. Any which disagree are cause for concern.

Yes, and I can already see that being helpful for Metalhead since we see the occasional odd gradient anomaly.

Diffractor, with JuliaDiff/Diffractor.jl#89:

sum(model(image)) = 1.2527118f0
 15.313321 seconds (35.07 M allocations: 2.822 GiB, 1.97% gc time, 98.18% compilation time)
sum(model(image)) = -19.776482f0
19384064

mcabbott · 2022-08-21T20:41:46Z

Is it too weird to leave gradient returning tracked arrays and not recursing, unlike withgradient?

It would be easy to upgrade it, of course, but I am a bit scared of breaking things people rely on, half of which I'm quite sure are untested. Might be fairly safe, if it still returns tracked arrays?

And possibly withgradient should too, for consistency? Or would that require dozens of second derivative tests?

ToucheSir · 2022-08-21T21:00:48Z

If the worry is about things with similar names not behaving similarly, perhaps the name could be changed? e.g. steal explicit_withgradient from that Flux PR, or (uglier but unambiguous) withgradient_unwrap. Customizing unwrapping via a boolean flag could work too. Last but not least, gating unwrapping behind a completely distinct interface like https://github.com/JuliaDiff/AbstractDifferentiation.jl/ is an option and lines up with potential longer-term goals of unifying AD interfaces.

mcabbott · 2022-08-21T23:53:25Z

Thanks for thoughts. Maybe it's OK, using this package is clearly a bit of an adventure... you cannot blindly hope arbitrary 2nd derivatives will work.

coveralls · 2024-11-05T13:37:56Z

Pull Request Test Coverage Report for Build 2899434134

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

22 of 25 (88.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.8%) to 72.141%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/back.jl	22	25	88.0%

Totals
Change from base Build 2899141545:	0.8%
Covered Lines:	492
Relevant Lines:	682

💛 - Coveralls

withgradient

f59821c

ToucheSir reviewed Aug 21, 2022

mcabbott added 3 commits August 21, 2022 13:42

drop tests on 1.3

a901a47

use _trainable_walk

5ad5a74

wtf

9f48288

ToucheSir approved these changes Aug 21, 2022

mcabbott merged commit 7ab871f into FluxML:master Aug 21, 2022

mcabbott deleted the tree branch August 21, 2022 23:53
