#481 possibly broke ishermitian
#606
Maybe #481 should have been put in a breaking release after all. Maybe it would be safer to revert it and then prepare a breaking release with it?
I'm sorry it broke things, but I think this is the intention.
I'm with you on this, but at the same time it seems quite detrimental that we can no longer compute the gradient through such methods. EDIT: I am aware that there is a workaround.
I see there have been quite extensive discussions about this change, so I'll stop complaining about some of the resulting inconvenience 👍 But do we have an idea of the best approach to deal with all of the downstream breaking code?
I'm with you regarding this argument. But I think it is problematic that 1) such a major change was released in a non-breaking release and breaks many downstream packages and applications, and 2) it seems inconsistent with e.g. ChainRules, which allows computing these derivatives.
That's a lot of plurals, but so far one link. Is the problem only this one function? Are there any examples where it matters?

julia> x = [1 0.5; 0.5 1]; # as above

julia> ForwardDiff.jacobian(x -> x \ [1,-2], x) # without constraining x
2×4 Matrix{Float64}:
 -3.55556   1.77778   4.44444  -2.22222
  1.77778  -3.55556  -2.22222   4.44444

julia> x \ [1,-2] ≈ cholesky(x) \ [1,-2]
true

julia> ForwardDiff.jacobian(x -> cholesky(x) \ [1,-2], x)
2×4 Matrix{Float64}:  # before, implicit constraint
 -3.55556  0.0   6.22222  -2.22222
  1.77778  0.0  -5.77778   4.44444
# after, an error.

julia> ForwardDiff.jacobian(x -> Hermitian(x) \ [1,-2], x) # with explicit constraint
2×4 Matrix{Float64}:
 -3.55556  0.0   6.22222  -2.22222
  1.77778  0.0  -5.77778   4.44444

julia> ForwardDiff.jacobian(x -> cholesky(Hermitian(x)) \ [1,-2], x) # with explicit constraint
2×4 Matrix{Float64}:  # still works after PR
 -3.55556  0.0   6.22222  -2.22222
  1.77778  0.0  -5.77778   4.44444

A nearby example of checks working as intended is:
I think it's fair to assume that this one particular test is not the only case where people are using ForwardDiff + cholesky, even if you exclude all the Turing models.
I'm a bit uncertain what you're asking here, to be honest. But I think it all comes down to the following question: do you not think we should be able to differentiate through this? If yes, then we need a fix.
I went looking and found many checks of what algorithm to use based on ishermitian.

I ask this to get an idea of the lay of the land, since you probably know this better. I have never wanted to have anything to do with it myself.
Maybe? I have not formed a very clear opinion.
Not obviously. We could also have a method like that instead. Overloading it is another option.
Yes. I think the question is what should happen if this matrix (which you didn't construct yourself) is not positive definite. Do you want an error? Or is there another path in your function?
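The error-vs-another-path choice can be made explicit in user code. A minimal sketch using the stdlib's `check = false` keyword and `issuccess` (both standard LinearAlgebra API); `solve_or_fallback` is a hypothetical name:

```julia
using LinearAlgebra

# Sketch: choose a path explicitly instead of letting cholesky throw.
# `check = false` suppresses the PosDefException; `issuccess` reports
# whether the factorization actually succeeded.
function solve_or_fallback(A, b)
    F = cholesky(Hermitian(A); check = false)
    if issuccess(F)
        return F \ b      # fast path: A was positive definite
    else
        return qr(A) \ b  # fallback path: generic solve
    end
end

solve_or_fallback([1 0.5; 0.5 1], [1.0, -2.0])  # takes the Cholesky path
solve_or_fallback([1 2; 2 1], [1.0, -2.0])      # indefinite: falls back to QR
```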
The awkward thing is that I don't think it's easy for ForwardDiff to distinguish these situations. Is this "if" statement choosing an algorithm, or checking that the input is legal? The one clean way seems to be to push this to the type level. Don't check at runtime; dispatch on the wrapper type instead.
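A sketch of the type-level idea (`solve_posdef` is a hypothetical name): the caller asserts the structural constraint by wrapping, so dispatch, rather than a value-level ishermitian test on Dual entries, selects the algorithm:

```julia
using LinearAlgebra

# The Hermitian wrapper carries the assumption in the type, so no
# runtime symmetry check on the (possibly Dual-valued) entries is needed.
solve_posdef(A::Hermitian, b) = cholesky(A) \ b
solve_posdef(A::AbstractMatrix, b) = solve_posdef(Hermitian(A), b)  # explicit opt-in

x = [1 0.5; 0.5 1]
solve_posdef(x, [1.0, -2.0])  # works without any ishermitian check
```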
Maybe the cleanest and safest way would be to just fix and implement it properly. Not being able to use it is quite limiting.

ChainRules (and hence also Zygote) already decided how to handle these cases. A comparison of ForwardDiff 0.10.32, 0.10.33, and ChainRules/Zygote:

ForwardDiff 0.10.32:

julia> using ForwardDiff, LinearAlgebra
julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
1.0 0.5
0.5 1.0
julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
1.0 0.5
⋅ 0.866025
julia> f(x) = sum(cholesky(x).U);
julia> ForwardDiff.gradient(f, x)
2×2 Matrix{Float64}:
0.394338 0.42265
0.0 0.57735
julia> ForwardDiff.gradient(f, Symmetric(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix
...
julia> ForwardDiff.gradient(f, Hermitian(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a Hermitian matrix
...
julia> g(x) = sum(cholesky(Hermitian(x)).L);
julia> ForwardDiff.gradient(g, x)
2×2 Matrix{Float64}:
0.394338 0.42265
0.0 0.57735
julia> ForwardDiff.gradient(g, Symmetric(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix
...
julia> ForwardDiff.gradient(g, Hermitian(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a Hermitian matrix
...

ForwardDiff 0.10.33:

julia> using ForwardDiff, LinearAlgebra
julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
1.0 0.5
0.5 1.0
julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
1.0 0.5
⋅ 0.866025
julia> f(x) = sum(cholesky(x).U);
julia> ForwardDiff.gradient(f, x)
ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.
...
julia> ForwardDiff.gradient(f, Symmetric(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix
...
julia> ForwardDiff.gradient(f, Hermitian(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a Hermitian matrix
...
julia> g(x) = sum(cholesky(Hermitian(x)).L);
julia> ForwardDiff.gradient(g, x)
2×2 Matrix{Float64}:
0.394338 0.42265
0.0 0.57735
julia> ForwardDiff.gradient(g, Symmetric(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix
...
julia> ForwardDiff.gradient(g, Hermitian(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a Hermitian matrix
...

ChainRules 1.45.0:

julia> using ChainRules, LinearAlgebra
julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
1.0 0.5
0.5 1.0
julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
1.0 0.5
⋅ 0.866025
julia> ChainRules.rrule(cholesky, x, NoPivot())[1]
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
1.0 0.5
⋅ 0.866025
julia> ChainRules.rrule(cholesky, Symmetric(x), NoPivot())[1]
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
1.0 0.5
⋅ 0.866025
julia> ChainRules.rrule(cholesky, Hermitian(x), NoPivot())[1]
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
1.0 0.5
⋅ 0.866025
julia> Δ = Cholesky(UpperTriangular(ones(2, 2)))
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
1.0 1.0
⋅ 1.0
julia> ChainRules.rrule(cholesky, x, NoPivot())[2](Δ)[2]
2×2 UpperTriangular{Float64, Matrix{Float64}}:
0.394338 0.42265
⋅ 0.57735
julia> ChainRules.rrule(cholesky, Symmetric(x), NoPivot())[2](Δ)[2]
2×2 Symmetric{Float64, Matrix{Float64}}:
0.394338 0.211325
0.211325 0.57735
julia> ChainRules.rrule(cholesky, Hermitian(x), NoPivot())[2](Δ)[2]
2×2 Hermitian{Float64, Matrix{Float64}}:
0.394338 0.211325
 0.211325  0.57735

Zygote 0.6.49:

julia> using Zygote, LinearAlgebra
julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
1.0 0.5
0.5 1.0
julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
1.0 0.5
⋅ 0.866025
julia> f(x) = sum(cholesky(x).U);
julia> only(Zygote.gradient(f, x))
2×2 UpperTriangular{Float64, Matrix{Float64}}:
0.394338 0.42265
⋅ 0.57735
julia> only(Zygote.gradient(f, Symmetric(x)))
2×2 Symmetric{Float64, Matrix{Float64}}:
0.394338 0.211325
0.211325 0.57735
julia> only(Zygote.gradient(f, Hermitian(x)))
2×2 Hermitian{Float64, Matrix{Float64}}:
0.394338 0.211325
0.211325 0.57735
julia> g(x) = sum(cholesky(Hermitian(x)).U);
julia> only(Zygote.gradient(g, x))
2×2 UpperTriangular{Float64, Matrix{Float64}}:
0.394338 0.42265
⋅ 0.57735
julia> only(Zygote.gradient(g, Symmetric(x)))
2×2 Symmetric{Float64, Matrix{Float64}}:
0.394338 0.211325
0.211325 0.57735
julia> only(Zygote.gradient(g, Hermitian(x)))
2×2 Hermitian{Float64, Matrix{Float64}}:
0.394338 0.211325
 0.211325  0.57735

Interestingly, FiniteDifferences is broken for all examples (it errors or returns incorrect gradients).

julia> using FiniteDifferences, LinearAlgebra
julia> x = [1 0.5; 0.5 1]
2×2 Matrix{Float64}:
1.0 0.5
0.5 1.0
julia> cholesky(x)
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
1.0 0.5
⋅ 0.866025
julia> fdm = FiniteDifferences.central_fdm(5, 1);
julia> f(x) = sum(cholesky(x).U);
julia> only(FiniteDifferences.grad(fdm, f, x))
ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.
...
julia> only(FiniteDifferences.grad(fdm, f, Symmetric(x)))
2×2 Symmetric{Float64, Matrix{Float64}}:
0.398523 0.41868
0.41868 0.579496
julia> only(FiniteDifferences.grad(fdm, f, Hermitian(x)))
2×2 Hermitian{Float64, Matrix{Float64}}:
0.398523 0.41868
0.41868 0.579496
julia> g(x) = sum(cholesky(Hermitian(x)).U);
julia> only(FiniteDifferences.grad(fdm, g, x))
2×2 Matrix{Float64}:
0.398523 0.41868
6.01822e-15 0.579496
julia> only(FiniteDifferences.grad(fdm, g, Symmetric(x)))
2×2 Symmetric{Float64, Matrix{Float64}}:
0.398523 0.41868
0.41868 0.579496
julia> only(FiniteDifferences.grad(fdm, g, Hermitian(x)))
2×2 Hermitian{Float64, Matrix{Float64}}:
0.398523 0.41868
 0.41868   0.579496

There is an open issue that discusses problems with positive semi-definite matrices (JuliaDiff/FiniteDifferences.jl#52).
One idea is to overload it.

To try this out: the example in the top message doesn't work on any version, as the function doesn't produce a scalar. Here's a version of it, and a comparison to finite differences:

julia> y = [1 0.3; 0.3 0.7]; ishermitian(y)
true
# cholesky example
# on v0.10.33, PosDefException: matrix is not Hermitian
# with depwarn, same result as v0.10.32
julia> ForwardDiff.gradient(x -> sum(sin, cholesky(x).U), y)
┌ Warning: ishermitian(A) was true for this matrix, but will become false due to Dual numbers.
│ caller = cholesky!(A::Matrix{ForwardDiff.Dual{ForwardDiff.Tag{var"#35#36", Float64}, Float64, 4}}, ::NoPivot; check::Bool) at cholesky.jl:296
└ @ LinearAlgebra ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/cholesky.jl:296
2×2 Matrix{Float64}:
0.16777 0.682544
0.0 0.454654
julia> ForwardDiff.gradient(x -> sum(sin, cholesky(Hermitian(x)).U), y) # silences warning
2×2 Matrix{Float64}:
0.16777 0.682544
0.0 0.454654
julia> Zygote.gradient(x -> sum(sin, cholesky(x).U), y)[1] # agrees
2×2 UpperTriangular{Float64, Matrix{Float64}}:
0.16777 0.682544
⋅ 0.454654
julia> FiniteDifferences.grad(central_fdm(7, 1), x -> sum(sin, cholesky(x).U), y)[1]
ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.
julia> FiniteDifferences.grad(central_fdm(7, 1), x -> sum(sin, cholesky(Hermitian(x)).U), y)[1]
2×2 Matrix{Float64}:
0.16777 0.682544
 1.9345e-16  0.454654

Looking for what else depends on this check, notice also the following:

# cos example
# on v0.10.33, MethodError: no method matching exp!(::Matrix{Complex{ForwardDiff.Dual
# with depwarn, same result as v0.10.32
julia> g2 = ForwardDiff.gradient(x -> sum(cos(x)), y)
┌ Warning: ishermitian(A) was true for this matrix, but will become false due to Dual numbers.
│ caller = issymmetric at generic.jl:1177 [inlined]
└ @ Core ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/generic.jl:1177
2×2 Matrix{Float64}:
-0.989728 -1.81858
0.0 -0.81771
julia> ForwardDiff.gradient(x -> sum(cos(Hermitian(x))), y) # doesn't work, also fails on v0.10.32
ERROR: MethodError: no method matching eigen!(::Hermitian{ForwardDiff.Dual
julia> FiniteDifferences.grad(central_fdm(7, 1), x -> sum(cos(x)), y)[1]
2×2 Matrix{Float64}:
-0.989728 -0.90929
-0.90929 -0.81771
julia> (g2 + g2')./2
2×2 Matrix{Float64}:
-0.989728 -0.90929
-0.90929 -0.81771
julia> FiniteDifferences.grad(central_fdm(7, 1), x -> sum(cos(Hermitian(x))), y)[1]
2×2 Matrix{Float64}:
-0.989728 -1.81858
 -8.01163e-16  -0.81771
That's a lot of text, note that your
No, I believe it is broken when it has
At least this is wrong for reverse-mode AD (cotangents). I am less than 100% sure it's wrong for forward mode (tangents).
Does this work? I am a little concerned it's going to introduce ambiguities. Also not sure it covers all cases:

@eval ForwardDiff begin
using LinearAlgebra: NoPivot, checksquare, checkpositivedefinite, Cholesky, BlasInt, Hermitian
function LinearAlgebra.cholesky!(A::AbstractMatrix{<:Dual}, ::NoPivot = NoPivot(); check::Bool = true)
checksquare(A)
if !ishermitian(value.(A)) # this check is quite expensive, could be done better?
@info "path 1" check
check && checkpositivedefinite(-1)
return Cholesky(A, 'U', convert(BlasInt, -1))
else
@info "path 2"
return cholesky!(Hermitian(A), NoPivot(); check = check)
end
end
end
The reason I didn't mention this approach is that I was thinking the same :/ It seems doomed to introduce a bunch of ambiguities, given the number of methods involved.
Ah gotcha!
Not a very careful check, but with the above code, I wasn't sure whether Diagonal{Dual} would be ambiguous. But it seems not to be:

julia> methods(cholesky!)
# 19 methods for generic function "cholesky!" from LinearAlgebra:
...
[12] cholesky!(A::Diagonal)
@ ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/diagonal.jl:818
[13] cholesky!(A::Diagonal, ::NoPivot; check)
@ ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/diagonal.jl:818
[14] cholesky!(A::Diagonal, ::Val{false}; check)
@ deprecated.jl:103
[15] cholesky!(A::AbstractMatrix{<:Dual})
@ ForwardDiff REPL[19]:3
...
[18] cholesky!(A::AbstractMatrix)
@ ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/cholesky.jl:294
julia> cholesky!(Diagonal(rand(3) .+ Dual(0,1)))
Cholesky{Dual{Nothing, Float64, 1}, Diagonal{Dual{Nothing, Float64, 1}, Vector{Dual{Nothing, Float64, 1}}}}
U factor:
3×3 Diagonal{Dual{Nothing, Float64, 1}, Vector{Dual{Nothing, Float64, 1}}}:
Dual{Nothing}(0.546063,0.915645) … ⋅
⋅ ⋅
 ⋅                                   Dual{Nothing}(0.909924,0.549496)
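A more systematic way to look for clashes (assuming the `@eval` block above has been evaluated) might be the Test stdlib's `detect_ambiguities`, which returns pairs of ambiguous methods across the given modules:

```julia
using Test, LinearAlgebra, ForwardDiff

# detect_ambiguities returns a vector of (Method, Method) pairs with
# ambiguous signatures; an empty result for cholesky! suggests the new
# method does not clash with the stdlib ones.
ambs = Test.detect_ambiguities(LinearAlgebra, ForwardDiff)
filter(t -> any(m -> m.name === :cholesky!, t), ambs)
```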
After #481, ishermitian is now broken (maybe other methods too?) when working with ForwardDiff.Dual. This is because now the A[i,j] != adjoint(A[j,i]) check performed in ishermitian returns true.
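A minimal sketch of the failing check, with partials seeded by hand to mimic what forward-mode differentiation does (exact comparison behavior depends on the ForwardDiff version, so no outputs are asserted here):

```julia
using LinearAlgebra
using ForwardDiff: Dual

# Hand-seed a symmetric value matrix the way forward-mode differentiation
# would: each entry carries its own distinct partial, so A[1,2] and A[2,1]
# have equal values but different partials.
a = Dual(0.5, 1.0, 0.0)   # value 0.5, partials (1, 0)
b = Dual(0.5, 0.0, 1.0)   # value 0.5, partials (0, 1)
A = [Dual(1.0, 0.0, 0.0) a; b Dual(1.0, 0.0, 0.0)]

# On versions affected by #481, the comparison reportedly takes partials
# into account, so the A[i,j] != adjoint(A[j,i]) test inside ishermitian trips.
@show a == b
@show ishermitian(A)
```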