provide examples of back-propagation #1137
Comments
@vograno haven't tried. The internal
Adding to what @skrawcz said: Heh, fascinating idea. I've had a similar idea but never really explored it. Some thoughts (feel free to disagree on any of these points!):
As @skrawcz said -- it would also require rewalking the graph, at least backwards. What I'd do, if you're interested, is first build a simple 2-node graph out of PyTorch objects, which already carry the forward/backward relationship. Then you can POC it out: compute gradients individually, then expand to more ways of specifying nodes/gradients. I'd also consider other optimization routines if the goal is flexibility!
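A minimal sketch of that 2-node PyTorch POC (the `node_a` / `node_b` names and shapes are just illustrative): run the forward functions, then pull the gradient of each reversed edge individually with `torch.autograd.grad` and chain them.

```python
import torch

# Two "nodes" built from PyTorch objects; x is the only external input.
x = torch.tensor([1.0, 2.0], requires_grad=True)

def node_a(x: torch.Tensor) -> torch.Tensor:
    return 3.0 * x

def node_b(a: torch.Tensor) -> torch.Tensor:
    return (a ** 2).sum()

a = node_a(x)  # forward edge x -> a
b = node_b(a)  # forward edge a -> b

# Gradients computed individually, one reversed edge at a time.
(db_da,) = torch.autograd.grad(b, a, retain_graph=True)
(db_dx,) = torch.autograd.grad(a, x, grad_outputs=db_da)  # chain rule through node_a
print(db_da, db_dx)
```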
I can walk the graph backward all right, but I also need to create new nodes in the back-flow graph along the way, and this is where I'm not sure. I can see two options, but first note:
Option 1 - using a temp module (a sketch of what this could look like follows below).
Option 2.
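For what it's worth, a hypothetical sketch of what option 1 could look like, assuming Hamilton's `ad_hoc_utils.create_temporary_module` behaves as it does in the Hamilton examples. The `grad_*` functions are made-up stand-ins for back-flow nodes that would really be generated while walking the forward graph backwards:

```python
from hamilton import ad_hoc_utils, driver

# Hand-written back-flow nodes for a tiny forward graph a -> b = a ** 2.
# In a real implementation these would be generated, not written by hand.

def grad_b() -> float:
    """Seed of the back pass: d(output)/d(b) for the terminal node."""
    return 1.0

def grad_a(grad_b: float, a: float) -> float:
    """Back-flow node for a: chain rule through b = a ** 2."""
    return grad_b * 2.0 * a

# Bundle the functions into a temporary module and build the back-flow driver.
backward_module = ad_hoc_utils.create_temporary_module(
    grad_b, grad_a, module_name="back_flow"
)
bw_dr = driver.Builder().with_modules(backward_module).build()
print(bw_dr.execute(["grad_a"], inputs={"a": 3.0}))  # {'grad_a': 6.0}
```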
Let me propose a less fascinating but simpler example to work with.
The goal is to compute the back-flow driver given the forward-flow one.
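To make that concrete, a conceptual sketch (not a Hamilton API) of the reversal itself: however the forward dependency structure is extracted from the forward driver, flipping every edge yields the back-flow structure, and each fan-out is where the merging function comes in.

```python
from collections import defaultdict

# Forward edges: node -> the nodes it depends on. How to extract this from an
# actual Hamilton driver is the open question; this dict is a stand-in.
forward_deps = {
    "a": ["x"],
    "b": ["x", "a"],
    "loss": ["b"],
}

# Back-flow edges: node -> its forward consumers, i.e. every edge reversed.
backward_deps = defaultdict(list)
for node, upstreams in forward_deps.items():
    for up in upstreams:
        backward_deps[up].append(node)

print(dict(backward_deps))  # {'x': ['a', 'b'], 'a': ['b'], 'b': ['loss']}
# The back-flow node for 'x' receives one gradient from 'a' and one from 'b'
# and must merge them - with sum, in the gradient-descent case.
```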
@vograno sounds cool. Unfortunately, we're a little bandwidth-constrained at the moment and can't be as helpful as we'd like, so I just wanted to mention that. Do continue to add context / questions here - we'll try to help when we can.
A DAG defines a forward flow of information. At the same time, it implies a back-flow of information: reverse each edge of the DAG and associate a back-flow node function with each node. We should also provide a merging function that merges the back-flow inputs connecting to a common forward output, but this is a technicality.
Gradient descent on a feed-forward neural network is an example of such a back-flow. Namely, the forward pass computes node outputs and gradients given the model parameters, while the backward pass updates the model parameters according to the computed gradients. I think the merging function is `sum` in this case.
The question is then whether Hamilton is an appropriate framework for inferring the back-flow DAG from the forward one. Here, inferring means computing the back-flow driver given the forward-flow one.
Use gradient descent as a case study.
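As a quick illustration of the "merging function is sum" point: in PyTorch, the gradient of a value that fans out to two consumers is the sum of the gradients flowing back along the two reversed edges.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y1 = 3.0 * x   # first forward consumer of x
y2 = x ** 2    # second forward consumer of x
loss = y1 + y2
loss.backward()

# d(y1)/dx = 3 and d(y2)/dx = 2 * x = 4; the back-flow merges them by summing.
print(x.grad)  # tensor(7.)
```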