
DifferentiationInterface supports Tapir! #107

Closed
gdalle opened this issue Mar 31, 2024 · 10 comments

@gdalle
Collaborator

gdalle commented Mar 31, 2024

Now that Tapir is registered, I have added it to the list of officially supported backends:

https://github.com/gdalle/DifferentiationInterface.jl

The associated code is in this file:

https://github.com/gdalle/DifferentiationInterface.jl/blob/main/ext/DifferentiationInterfaceTapirExt/DifferentiationInterfaceTapirExt.jl

A few questions for you:

  • For my test cases, I cannot escape some unfortunate conversions when the tangent has a different type than the primal value (typically when I backpropagate basis vectors during a Jacobian computation). Is there a better way to deal with that?
  • Since we never differentiate wrt the function f itself, should I use zero_codual in the out-of-place value_and_pullback as well? I think I tried it and Tapir was unhappy.

Missing features:

  • second order tests (I don't know if it works or not)
  • support for mutating functions f!(y, x)

Do you want to advertise DifferentiationInterface.jl as a user-friendly interface to Tapir for the time being? It is not registered yet, but I think it will be soon.

cc @adrhill

@yebai
Contributor

yebai commented Mar 31, 2024

Do you want to advertise DifferentiationInterface.jl as a user-friendly interface to Tapir for the time being? It is not registered yet, but I think it will be soon.

I don't see a reason why not. DI can be used together with LogDensityProblemsAD, which is geared towards differentiating through probabilistic density functions.

I'll leave other questions to @willtebbutt

@willtebbutt
Member

For my test cases, I cannot escape some unfortunate conversions when the tangent has a different type than the primal value (typically when I backpropagate basis vectors during a Jacobian computation). Is there a better way to deal with that?

Ahh interesting. Yes, Tapir is very precise about the types it uses to represent (co)tangents. Do you have a list of types that you are interested in supporting, or some general conventions for (co)tangent types in DifferentiationInterface? That might be a good place to start if we're thinking about how to interface the two.

Since we never differentiate wrt the function f itself, should I use zero_codual in the out-of-place value_and_pullback as well? I think I tried it and Tapir was unhappy.

You should just pass the function I think. The kind of thing I'm imagining you mean is something like

Tapir.value_and_gradient!!(rrule!!, sin, 5.0)

for which you'll get

(-0.9589242746631385, (NoTangent(), 0.28366218546322625))

If it fits with your interface, I would recommend just returning the element of the gradient tuple onwards.

Is this the kind of example that you have in mind?
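Concretely, extracting the input gradient from that return value could look like this (a minimal sketch building on the example above; `dx` is an illustrative name):

```julia
# value_and_gradient!! returns (value, (tangent_of_f, tangent_of_args...));
# for a plain function like `sin`, the first entry is NoTangent() and can be dropped.
value, grads = Tapir.value_and_gradient!!(rrule!!, sin, 5.0)
dx = grads[2]   # ≈ 0.28366, i.e. cos(5.0)
```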

second order tests (I don't know if it works or not)

We don't presently support second-order AD, and adding support is also not yet on our roadmap :) I should probably make that more clear in the readme.

suppot for mutating functions f!(y, x)

Agreed -- we should definitely have support for this in the interface, as support for mutating functions is one of Tapir's interesting features. Can I ask -- what is currently preventing Tapir from handling mutating functions when used from DI?

@gdalle
Collaborator Author

gdalle commented Apr 1, 2024

Do you have a list of types that you are interested in supporting, or some general conventions for (co)tangent types in DifferentiationInterface?

No, we try to be as agnostic as possible in that regard. But the typical case where it fails is the computation of a Jacobian, where we backpropagate basis arrays of tangent type FillArrays.OneElement regardless of the primal type.
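For concreteness, such a basis vector looks like this (a sketch assuming FillArrays' `OneElement(val, ind, n)` constructor):

```julia
using FillArrays

# Lazy basis vector e₂ ∈ ℝ⁴: only the value, index, and length are stored.
e2 = OneElement(1.0, 2, 4)   # behaves like [0.0, 1.0, 0.0, 0.0]
v  = collect(e2)             # materialize into a plain Vector{Float64}
```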

Is this the kind of example that you have in mind?

Yes, this currently works, but I'm wondering if it saves time to mark f as non-differentiable.

We don't presently support second-order AD, and adding support is also not yet on our roadmap :) I should probably make that more clear in the readme.

That's the thing though, we do, and it is entirely built from first-order operators. So the real question is whether Tapir is able to differentiate over its own differentials.

Can I ask -- what is currently preventing Tapir from handling mutating functions when used from DI?

Mostly the fact that I haven't tried it; I'll let you know how it goes.

@willtebbutt
Member

No, we try to be as agnostic as possible in that regard.

Understood.

But the typical case where it fails is the computation of a Jacobian, where we backpropagate basis arrays of tangent type FillArrays.OneElement regardless of the primal type

Ahhh I see. Yeah, we'll definitely have to set up custom conversions for this.

For context: Tapir insists that each primal type (say, Vector{Float64}) has a unique (co)tangent type (in this case, also Vector{Float64}) for a couple of reasons (mainly promises about composition and type-stability -- I'll get into these properly in the docs when I write them). Moreover, this mapping is specified by the function Tapir.tangent_type.

My initial thoughts are that something along the following lines might make sense:

# default definition
function convert_to_tapir_tangent(::Type{P}, tangent::T) where {P, T}
    if Tapir.tangent_type(P) == T
        # the tangent type is already what Tapir requires
        return tangent
    else
        # the tangent type isn't what Tapir needs
        error("An informative error message")
    end
end

# specific conversions
function convert_to_tapir_tangent(::Type{Vector{Float64}}, tangent::FillArrays.OneElement{Float64})
    return collect(tangent)
end

I'm not sure if this is entirely what you need -- I guess this is kind of vaguely similar to your zero_sametype!! function -- but it makes use of tangent_type, which is what's needed to ensure that something works with Tapir.

Note that tangent_type should always be performant (it's often an @generated method), so if performance is a concern, it should be fine.
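To illustrate the mapping (a sketch; the specific return values follow the convention described above):

```julia
# Each primal type maps to exactly one tangent type:
Tapir.tangent_type(Vector{Float64})  # -> Vector{Float64}
Tapir.tangent_type(Float64)          # -> Float64
```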

Yes, this currently works, but I'm wondering if it saves time to mark f as non-differentiable.

Ahh I see. At the minute we don't support activity analysis, except insofar as primals which are non-differentiable always have NoTangent tangents, which is basically the same thing. This is something that's on my todo list, but it's a ways off yet.

So I think my original answer stands. If a user is using a callable struct as the function argument, they will wind up differentiating w.r.t. its fields. This might hit performance, but it shouldn't hit correctness.

That's the thing though, we do, and it is entirely built from first-order operators. So the real question is whether Tapir is able to differentiate over its own differentials.

Ahhh okay. Currently no, because OpaqueClosures are a little bit awkward in this regard. Unfortunately, I would treat this as a thing which is unlikely to work any time soon. Thanks for asking about this though!

Mostly the fact that I haven't tried it; I'll let you know how it goes.

Cool -- I look forward to seeing how this winds up looking!

@gdalle
Collaborator Author

gdalle commented Apr 1, 2024

If a user is using a callable struct as the function argument, they will wind up differentiating w.r.t. its fields.

But what if I mark this callable struct as zero_codual, or whatever the Tapir equivalent of Enzyme.Const is?

Currently no, because OpaqueClosures are a little bit awkward in this regard. Unfortunately, I would treat this as a thing which is unlikely to work any time soon.

What's cool about DI is that we can mix backends for second-order though. The most efficient way to compute a Hessian is forward-over-reverse, so we're free to test any forward outer backend with Tapir as inner backend, and if the forward backend can differentiate through this OpaqueClosure it should work! Maybe Enzyme can pull it off?
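A rough sketch of what that mixing could look like through DI's `SecondOrder` combinator (`f`, `x`, and the backend constructors here are illustrative, and exact names may vary between DI versions):

```julia
using DifferentiationInterface
import ForwardDiff, Tapir

f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

# Forward-over-reverse: ForwardDiff (outer) differentiates through
# the pullback produced by Tapir (inner).
backend = SecondOrder(AutoForwardDiff(), AutoTapir())
H = hessian(f, backend, x)   # ≈ 2I, if Tapir's closures are ForwardDiff-compatible
```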

@willtebbutt
Member

But what if I mark this callable struct as zero_codual, or whatever the Tapir equivalent of Enzyme.Const is?

For a primal of type P, zero_codual just produces something of type CoDual{P, tangent_type(P)}. So, for example,

zero_codual(ones(5))

will produce something like

CoDual(ones(5), zeros(5))

It's just saying "initialise the tangent bit of the codual to zero", as opposed to "this thing will always be zero".

Tapir does not presently have an equivalent of Enzyme.Const.

What's cool about DI is that we can mix backends for second-order though. The most efficient way to compute a Hessian is forward-over-reverse, so we're free to test any forward outer backend with Tapir as inner backend, and if the forward backend can differentiate through this OpaqueClosure it should work! Maybe Enzyme can pull it off?

Perhaps -- would be interesting to know.

@gdalle
Collaborator Author

gdalle commented Apr 1, 2024

Would you mind taking a look at JuliaDiff/DifferentiationInterface.jl#126? This is my first shot at mutation with Tapir, and I don't understand what fails.

@yebai
Contributor

yebai commented Apr 1, 2024

What's cool about DI is that we can mix backends for second-order though. The most efficient way to compute a Hessian is forward-over-reverse, so we're free to test any forward outer backend with Tapir as inner backend, and if the forward backend can differentiate through this OpaqueClosure it should work! Maybe Enzyme can pull it off?

I'd be curious to see whether DI can make it work. One important use case for packages like DI is to facilitate interoperability between ADs. Of course, DI's benchmarking and testing also help new AD tools like Tapir mature faster.

IIUC, Zygote's second-order derivative also uses ForwardDiff over Zygote's reverse diff.

@yebai
Contributor

yebai commented Apr 1, 2024

@gdalle, do you have any plan to test GPU-compatibility for AD backends?

@gdalle
Collaborator Author

gdalle commented Apr 1, 2024

One important use case for packages like DI is to facilitate interoperability between ADs.

Indeed, and this is especially true for higher-order where you need to combine forward with reverse. The right place for this kind of stuff seems to be outside of first-order AD backends, for instance in DI.

IIUC, Zygote's second-order derivative also uses ForwardDiff over Zygote's reverse diff.

True, that is the default setting of Zygote.hessian, which is reasonable in most cases. You can also choose reverse over reverse with Zygote.hessian_reverse.
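For reference, the two options look like this (a minimal sketch; for this quadratic the Hessian is 2I):

```julia
using Zygote

f(x) = sum(abs2, x)
Zygote.hessian(f, [1.0, 2.0])          # ForwardDiff over Zygote (forward-over-reverse)
Zygote.hessian_reverse(f, [1.0, 2.0])  # Zygote over Zygote (reverse-over-reverse)
# both should return [2.0 0.0; 0.0 2.0]
```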

@gdalle, do you have any plan to test GPU-compatibility for AD backends?

I am not at all familiar with GPU computations, and I don't think GitHub offers free GPU servers to run our CI, so it's hard to test the real thing.
For now, what we do is offer a list of scenarios involving GPU-like arrays from JLArrays.jl.
We also test them on some backends, but not all.
The testing code looks like this; you can do the same with Tapir.jl:

https://github.com/gdalle/DifferentiationInterface.jl/blob/3bfcd1f2dc005936c8defe7ca335953f20401713/test/weird_arrays.jl#L14-L19
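As a rough sketch of what such a GPU-like scenario involves (illustrative only, not DI's actual test harness):

```julia
using JLArrays

# JLArray is a CPU-backed array type that enforces GPU-style restrictions
# (e.g. it disallows scalar indexing), so AD backends can be smoke-tested
# without real GPU hardware.
x = JLArray(randn(Float32, 10))
f(x) = sum(abs2, x)   # reductions avoid scalar indexing, so this works
f(x)
```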

Like type-stability, support for weird arrays is a "bonus", in the sense that not all backends provide it. Hence I'm unsure how best to test it in the long run, because if we test it for every backend:

  • most tests will fail
  • it will take ages

Any suggestions are welcome!

@yebai yebai closed this as completed Apr 5, 2024