Gradients for Flux etc. -- WIP #59

Closed
mcabbott wants to merge 4 commits into Jutho:master from mcabbott:gradients

Conversation

@mcabbott commented Jan 9, 2019

I believe I found the bug in my gradient calculation today, and so I tidied it up a little for this PR. (See also #47.) The code specific to Flux is loaded via Requires, while the gradients themselves are in another file.

  • This does not attempt to handle gradients of the scalars α, β, only the arrays. I would like α::TrackedReal to be an error, which I think will require doing slightly less promotion, but I don't yet see precisely where.
  • Everything to do with the cache could use a close look. Since trace∇A calls contract!, it might be helpful if trace! were handed some syms by the macro, as contract! is; they could be ignored elsewhere.
  • I copied & pasted the formulae for Zygote, and the example commented out in the tests does actually run for me (once I have IRTools#master), but others give errors.
  • It should be easy to support Knet similarly, again with the same gradients.
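
For readers unfamiliar with the setup, here is a minimal, hypothetical sketch of the kind of Requires-based hook described above, written for a binary contraction reshaped to a matrix product. The module and function names (FluxHookSketch, contract) and the exact signatures are illustrative assumptions, not the PR's actual code; only Requires, Flux's UUID, and the Tracker API (track, data, @grad), which was Flux's AD at the time, are real.

```julia
module FluxHookSketch  # hypothetical module name, for illustration only

using Requires

# Toy "contraction": the general index contraction, reshaped to a matrix product.
contract(A::AbstractMatrix, B::AbstractMatrix) = A * B

function __init__()
    # Flux's registered UUID; this block only runs once Flux is loaded.
    @require Flux="587475ba-b771-5e3f-ad9e-33799f191a9c" begin
        import .Flux.Tracker: TrackedMatrix, track, data, @grad

        # Record the call on Tracker's tape instead of evaluating eagerly.
        contract(A::TrackedMatrix, B::TrackedMatrix) = track(contract, A, B)

        # Reverse rule: for C = A*B, the adjoints are ΔA = ΔC*B' and ΔB = A'*ΔC.
        @grad contract(A, B) = data(A) * data(B), Δ -> (Δ * data(B)', data(A)' * Δ)
    end
end

end # module
```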

@coveralls commented Jan 9, 2019

Coverage Status

Coverage decreased (-1.7%) to 73.717% when pulling 51e59f6 on mcabbott:gradients into 0388ba2 on Jutho:master.

@codecov bot commented Jan 9, 2019

Codecov Report

Merging #59 into master will decrease coverage by 1.73%.
The diff coverage is 53.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #59      +/-   ##
==========================================
- Coverage   77.15%   75.41%   -1.74%     
==========================================
  Files          13       16       +3     
  Lines        1383     1509     +126     
==========================================
+ Hits         1067     1138      +71     
- Misses        316      371      +55
Impacted Files                       Coverage           Δ
src/gradients/zygote.jl              0% <0%>            (ø)
src/TensorOperations.jl              92.98% <100%>      (+0.25%) ⬆️
src/gradients/backwards.jl           59.61% <59.61%>    (ø)
src/gradients/flux.jl                72.34% <72.34%>    (ø)
src/implementation/stridedarray.jl   72.51% <0%>        (+1.89%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1d32e9e...ce18832.

@codecov bot commented Jan 9, 2019

Codecov Report

Merging #59 into master will decrease coverage by 1.49%.
The diff coverage is 54.47%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master      #59     +/-   ##
=========================================
- Coverage   75.52%   74.02%   -1.5%     
=========================================
  Files          13       16      +3     
  Lines        1344     1467    +123     
=========================================
+ Hits         1015     1086     +71     
- Misses        329      381     +52
Impacted Files                       Coverage           Δ
src/gradients/zygote.jl              0% <0%>            (ø)
src/TensorOperations.jl              77.77% <100%>      (+2.77%) ⬆️
src/gradients/backwards.jl           59.61% <59.61%>    (ø)
src/gradients/flux.jl                72.34% <72.34%>    (ø)
src/implementation/stridedarray.jl   72.51% <0%>        (+1.89%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0388ba2...9f1b50b.

@Jutho (Owner) commented Jan 9, 2019

While this looks like great work you are doing, I would like to inform/warn you that I don't have a lot of time to review this in the next few weeks, especially since I am also pretty new to automatic differentiation.

I have a keen interest in this topic, and I am open to supporting it in TensorOperations.jl, but I would have hoped that some of the more specific compatibility code could live in the corresponding packages, not here. Having it here also means a much greater responsibility on my part to keep it up and running, and might make future innovations that I would like to implement a lot harder.

For example, I am not convinced that the way I currently lower the @tensor macro into an explicit expression is the way to go. Maybe an actual computation graph, similar to how broadcast works, could be maintained more easily. It would also allow for certain optimizations, e.g. when you contract a tensor network, you can recycle many of the intermediates when taking the derivative with respect to a certain tensor (see https://arxiv.org/pdf/1310.8023.pdf).

@mcabbott (Author) commented
I dropped the ball here, sorry... but at least you know there is no rush! Maybe RFC would have been a better title for this. It was basically a challenge to see if I could put this together, and now it works, more or less. I learned quite a bit, including from your code.

My suggestion in the PR that it live here was, I guess, because it has to depend on lots of internal details of your functions, not just what's “public”. But I hear you about the maintenance load, and I don't mean to impose on you. It also necessarily depends on many things in Flux etc., which look pretty different from how they did a year ago; it's a bit awkward that it must straddle both, and maybe there will eventually be a nicer way.

Indeed there may be smarter ways of going about this whole thing. I haven't thought very hard about it; it could be that gradients should be defined at either a higher or a lower level than these functions. This was just what seemed within reach: five minutes on a piece of paper for these three basic operations (sketched below).
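
For concreteness, the paper-and-pencil rules in question look roughly like this in the matrix-shaped special case (a sketch only; the ∇ names are illustrative, not the PR's functions):

```julia
using LinearAlgebra

# Reverse rules for the three primitives, in their matrix-shaped special cases:
#   contract:  C = A*B    ⇒  ΔA = ΔC*B',  ΔB = A'*ΔC
#   trace:     y = tr(A)  ⇒  ΔA = Δy*I
#   add:       C = α*A'   ⇒  ΔA = α*ΔC'   (with ' standing in for the permutation)
∇contract(ΔC, A, B) = (ΔC * B', A' * ΔC)
∇trace(Δy, A)       = Δy * Matrix(I, size(A)...)
∇add(ΔC, α)         = α * ΔC'
```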

For now perhaps I leave it here & we can see? If it's useful to anyone perhaps they could chime in too.

@GiggleLiu (Contributor) commented Feb 28, 2019

@mcabbott Thanks for your PR; it solved my problem, and so far everything works fine!
I will provide more “stress tests” for this PR in the near future.

@GiggleLiu mentioned this pull request Mar 26, 2019
@mcabbott (Author) commented
This code is now at https://github.com/mcabbott/TensorTrack.jl instead, and a higher-level approach to the same problem is at https://github.com/mcabbott/TensorGrad.jl.

@mcabbott closed this Aug 10, 2019
@mcabbott deleted the gradients branch Aug 10, 2019
@Jutho (Owner) commented Aug 10, 2019

I know that reference and have wanted to implement this in TensorOperations.jl for a long time already. In a sense, I consider that reference a first step towards AD for tensor networks 'avant la lettre'. In fact, I started working on an implementation last week and have thought a bit about the interface. I was thinking of something like

@tensor X[a,b,c](A,B) = A[a,d,e]*B[b,d,f] * C[c,e,f]

which would create a callable object X of a type that could be named, say, TensorNetwork. Upon calling, the tensors A and B are passed as arguments, but C would be determined in the scope where the macro was called and stored in the object's data structure using named tuples, in such a way that you can still evaluate the same contraction with a different C once, by saying X(A, B; C = ...), or permanently change C via X.C = ....

This would work perfectly with the gradients from your TensorGrad.jl, so I was wondering if you were interested in joining forces on this, @mcabbott? I have yet to figure out how to store and recycle intermediate objects, but that should not be too hard.
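
A very rough sketch of what such a callable object might look like (everything here is hypothetical: the type name TensorNetwork is only the suggestion above, and the field layout is a guess):

```julia
# Hypothetical sketch of the proposed interface; nothing here exists in
# TensorOperations.jl. The open arguments A and B are supplied per call,
# while C is captured from the macro's scope into a NamedTuple.
struct TensorNetwork{F,NT<:NamedTuple}
    contraction::F   # closure that performs the actual contraction
    captured::NT     # captured tensors, e.g. (C = ...,)
end

function (X::TensorNetwork)(A, B; kwargs...)
    # Keyword arguments override captured tensors for this call only.
    captured = merge(X.captured, values(kwargs))
    return X.contraction(A, B, captured)
end

# X(A, B)            # uses the captured C
# X(A, B; C = C2)    # same contraction, different C, for this call only
```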

@MacKenzieHnC commented
What's the status on this? This would save me a lot of hell if it worked. I'm happy to devote any amount of time to getting this working.

@mcabbott (Author) commented
The gradient definitions from TensorGrad.jl (linked above) are presently bolted onto Tullio.jl. It's perhaps a slightly awkward combination but useful for now.

BTW, the design there is that there's a callable struct Eval to which the Zygote gradient is attached. This contains a forward and a reverse function, but it's not returned; it's simply called once, behind the usual syntax. Gradients are calculated for all arrays unless you say otherwise with a keyword argument, like @tullio A[a,d,e] * B[b,d,f] * C[c,e,f] nograd=(A,C).
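
For anyone arriving here later, a hedged usage sketch of the above (assuming Tullio.jl and Zygote.jl; the array sizes and the helper f are arbitrary choices for this example):

```julia
using Tullio, Zygote

A, B, C = rand(2,3,4), rand(5,3,6), rand(7,4,6)

# Contract over the repeated indices d, e, f; A and C are excluded from
# differentiation via the nograd keyword, so only B's gradient is computed.
f(A, B, C) = sum(@tullio D[a,b,c] := A[a,d,e] * B[b,d,f] * C[c,e,f] nograd=(A,C))

gB = Zygote.gradient(B -> f(A, B, C), B)[1]   # array of size(B)
```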

@maartenvd (Collaborator) commented
@mcabbott is there a reason you abandoned defining gradients for contract! and instead went with an @Grad macro?

@GiggleLiu (Contributor) commented Nov 28, 2021

UPDATES:

@Jutho, it is probably much easier to define the backward rules on binary tensor contraction operations. Because the n-ary tensor operations call the binary contraction function, the contraction order in the backward pass can be determined automatically by the AD engine. The backward pass then always has constant overhead relative to the forward pass.

FYI: it is much easier to support AD now, because the AD community has switched to ChainRules; you just add a very small dependency, ChainRulesCore. (However, we still do not have a reliable AD engine in Julia 😞.)
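
As a minimal illustration of that suggestion (a sketch only; matcontract is a made-up stand-in for a binary contraction reshaped to a matrix product, not a TensorOperations function):

```julia
using ChainRulesCore

# Toy binary contraction: contract the second index of A with the first of B.
matcontract(A::AbstractMatrix, B::AbstractMatrix) = A * B

function ChainRulesCore.rrule(::typeof(matcontract), A, B)
    C = matcontract(A, B)
    # Pullback: for C = A*B, the adjoints are ΔA = ΔC*B' and ΔB = A'*ΔC.
    function matcontract_pullback(ΔC)
        ΔC = unthunk(ΔC)
        return NoTangent(), ΔC * B', A' * ΔC
    end
    return C, matcontract_pullback
end
```

Any ChainRules-aware engine (e.g. Zygote) then picks such a rule up without further glue code.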
