Documentation #172

Open
willtebbutt opened this issue May 28, 2024 · 6 comments
Labels
documentation Improvements or additions to documentation


@willtebbutt
Member

willtebbutt commented May 28, 2024

The purpose of this issue is to plan out the first iteration of documentation in Tapir, and to figure out what order to do things in.

Structure plan:

Developer Docs

This is probably the most important documentation to do well, because it will make it possible to onboard other people. My basic plan is to start simple and build up.

All of these pages will be built around diagrams; these will probably be hand-drawn to minimise the amount of time I have to spend messing around with annoying printing.

Explain the rrule!! abstraction (#175 dealt with this)

This page will explain in detail what rrule!!s, and the rules output by build_rrule, must do, with a couple of worked examples. A non-exhaustive list of points to discuss includes:

  1. why data with addresses and data without addresses require different treatment
  2. what can go wrong if we don't require that each memory address has a unique tangent memory address (i.e. all of the stuff around aliasing) -- provide a simple example
  3. a simple mathematical model for the computation that rrule!!s perform

Differentiating a Function Comprising a Single Block

With the rrule abstraction introduced, we can discuss

  1. how in principle rrules can be chained together to differentiate a composition of functions for which we have rules, and
  2. how this is done in practice.
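As a minimal sketch of point 1, here is the generic reverse-mode chain rule for a composition f = f_2 ∘ f_1, with y = f_1(x), z = f_2(y), and z̄ the cotangent of the output. The notation (D f[x] for the derivative of f at x, a superscript * for its adjoint) is assumed here for illustration:

```math
D f[x] = D f_2[y] \circ D f_1[x],
\qquad
D f[x]^* = D f_1[x]^* \circ D f_2[y]^*,
\qquad
\bar{x} = D f_1[x]^* \big( D f_2[y]^* \, \bar{z} \big).
```

In principle, then, a forwards pass runs f_1 and then f_2, keeping whatever the adjoints will need, and the reverse pass applies the adjoints in the opposite order.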

At the end of this section the reader should have a solid understanding of what's going on with AD in Tapir.jl. In particular, we'll be able to discuss the transformations which happen to each primal line without needing to think about control flow, and will be able to discuss the basic mechanisms by which we share information between the forwards and reverse passes, how we generate and use OpaqueClosures, etc.

Handling Dynamic Control Flow and CFG Transformations

With the above done, we can turn our attention to handling dynamic control flow. The basic idea is to convince the reader that each possible trajectory through the CFG corresponds to a different linearisation of the control

At this point, we'll need to explain how control flow is handled in SSA (phi nodes, goto / goto-if-not nodes etc), and then jointly explain how

  1. each basic block gets transformed,
  2. the overall CFG gets transformed.

These two steps are necessarily somewhat coupled.

BBCode Explainer

TODO: plan this

Tutorials

TODO: plan these

Interface Documentation

TODO: plan these

Misc:

  1. Expand on the argument that uniquely typed tangents give conditional type stability
  2. Worked hand-written example with control flow

@willtebbutt added the documentation label (Improvements or additions to documentation) on Jun 5, 2024
@yebai pinned this issue on Jul 3, 2024
@yebai
Contributor

yebai commented Jul 22, 2024

The following paragraph is too brief for users to define new tangent types on their own. Let's provide a complete example illustrating all (or at least most) of the necessary functions.

The point here is that you can manually resolve the circular dependency using a data structure which mimics the primal type. You will, however, need to implement similar methods for zero_tangent, randn_tangent, etc, and presumably need to implement additional getfield and setfield rules which are specific to this type.

https://compintell.github.io/Tapir.jl/stable/known_limitations/#Tangent-Types

@willtebbutt
Member Author

@yebai and I discussed today that a side-by-side comparison of Tapir.rrule!! with ChainRules.rrule should be included somewhere in the docs.

@RoyCCWang

Great work!

The section on Algorithmic Differentiation in the documentation is a long page, with only first-level headings in the navigation panel. It might be helpful to allow second or third-level headings in the navigation panel.


There might be a typo in the current documentation (v0.4.50).

In the first example, search for the sentence

We can read off the adjoint operator from the first argument to the inner product:

The line after that sentence reads YX' + XY.
I think it should be XY' + XY. It might be helpful to the reader to add a sentence reviewing the cyclic property of the trace, trace(ABC) = trace(BCA) = trace(CAB).
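For reference, here is a sketch of that derivation using only tr(A) = tr(Aᵀ) and the cyclic property, with ⟨A, B⟩ = tr(AᵀB). It assumes, as the LHS of the script that follows does, that the derivative being adjointed is VᵀX + XᵀV (the derivative of X ↦ XᵀX at X in direction V):

```math
\begin{aligned}
\langle Y, V^\top X + X^\top V \rangle
&= \operatorname{tr}(Y^\top V^\top X) + \operatorname{tr}(Y^\top X^\top V) \\
&= \operatorname{tr}(X^\top V Y) + \operatorname{tr}(Y^\top X^\top V) && \text{since } \operatorname{tr}(A) = \operatorname{tr}(A^\top) \\
&= \operatorname{tr}(Y X^\top V) + \operatorname{tr}(Y^\top X^\top V) && \text{cyclic property} \\
&= \operatorname{tr}\big((X Y^\top)^\top V\big) + \operatorname{tr}\big((X Y)^\top V\big) \\
&= \langle X Y^\top + X Y, V \rangle .
\end{aligned}
```

Reading off the first argument of the final inner product gives XY' + XY.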

See the following script, which compares RHS (what I think is correct) with RHS_in_docs (what is currently there).

```julia
using LinearAlgebra, Random

rng = Random.Xoshiro(0)  # note: not passed to randn below, so the draws come from the global RNG
T = Float64

D = 3

Y = randn(T, D, D) # a tangent vector in the tangent space of the destination manifold.
X = randn(T, D, D) # a point on the source matrix manifold
V = randn(T, D, D) # a tangent vector in the tangent space of the source manifold.

ip = (AA,BB)->tr(AA'*BB)  # Frobenius inner product <A, B> = tr(A'B)
LHS = ip(Y, V'*X + X'*V)
RHS = ip(X*Y' + X*Y, V)

RHS_in_docs = ip(Y*X' + X*Y, V)

@show norm(LHS-RHS)/norm(LHS), norm(LHS-RHS_in_docs)/norm(LHS)
```

I get this in the REPL on my machine:

```
(norm(LHS - RHS) / norm(LHS), norm(LHS - RHS_in_docs) / norm(LHS)) = (2.533751899901873e-16, 1.6039683080608973)
```

Please let me know if I got this example wrong.

@willtebbutt
Member Author

willtebbutt commented Nov 26, 2024

> The section on Algorithmic Differentiation in the documentation is a long page, with only first-level headings in the navigation panel. It might be helpful to allow second or third-level headings in the navigation panel.

Thanks for the suggestion -- I wasn't aware that this was possible! I'll look into it.

> I think it should be XY' + XY.

Judging by the dimensions of X and Y alone, I think you must be right that I have a typo: if X is an N x D matrix, then Y must be a D x D matrix, so YX' is a D x N matrix while XY is an N x D matrix, and YX' + XY makes no sense at all (unless N = D). I think I made the mistake on the third line of the derivation, when I apply the trace's cyclic property and transpose the first trace term.
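The dimension argument can be checked concretely by repeating the adjoint test with a non-square X, which the square D x D script above cannot distinguish. This is a minimal sketch assuming, as that script's LHS implies, that the function being differentiated is f(X) = X'X; the sizes N and D are arbitrary illustrative choices.

```julia
using LinearAlgebra, Random

rng = Random.Xoshiro(0)
N, D = 4, 3  # deliberately non-square X, so the two candidate adjoints differ in shape

X = randn(rng, N, D)  # input to the assumed function f(X) = X' * X (N×D)
V = randn(rng, N, D)  # tangent of the input, same shape as X
Y = randn(rng, D, D)  # cotangent of the output f(X), which is D×D

ip(A, B) = tr(A' * B)  # Frobenius inner product <A, B> = tr(A'B)

# Adjoint check: <Y, V'X + X'V> should equal <XY' + XY, V>.
lhs = ip(Y, V' * X + X' * V)
rhs = ip(X * Y' + X * Y, V)
@show abs(lhs - rhs) / abs(lhs)

# The expression in the docs adds a D×N term to an N×D term.
@show size(Y * X'), size(X * Y)
docs_expr_fails = try
    Y * X' + X * Y
    false
catch err
    err isa DimensionMismatch
end
@show docs_expr_fails
```

With N ≠ D the corrected adjoint still passes the inner-product check, while the expression in the docs fails before any numbers are compared.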

> It might be helpful to the reader to add a sentence reviewing the cyclic property of the trace, trace(ABC) = trace(BCA) = trace(CAB).

Great idea.

For the typo and adding the cyclic trace property reminder, would you be interested in opening a PR, or shall I go ahead and make the fix?

@RoyCCWang

Hi Will, I can open a PR, but it will have to wait until this weekend or later. Please go ahead and open a PR without me if you have an upcoming release and would like to incorporate this change.

If you can wait until this weekend, would you be open to me adding some other minor style or grammar updates in the PR? You're welcome to keep only the edits that make sense to you when reviewing the PR.

@willtebbutt
Member Author

I'm happy to wait until the weekend :)

> If you can wait until this weekend, would you be open to me adding some other minor style or grammar updates in the PR? You're welcome to keep only the edits that make sense to you when reviewing the PR.

Please do, I look forward to your PR!
