
Wishart priors resulting in PosDefException: matrix is not positive definite; Cholesky factorization failed #2188

Open
ipozdeev opened this issue Apr 10, 2024 · 7 comments

@ipozdeev

julia v1.10.2
Turing v0.30.7
Distributions v0.25.107

Using a Wishart prior causes several functions to throw the error in the title, which does not make sense to me. For example, in this MWE:

using Turing, MCMCChains
using Statistics, LinearAlgebra, PDMats
using Optim

# parameter of the Wishart prior
A = Matrix{Float64}(I, 3, 3);
isposdef(A)  # true
ishermitian(A)  # true

@model function demo(x)
    _A ~ Wishart(5, A);
    _x_mu = sum(_A);
    return x ~ Normal(_x_mu, 1);
end

# condition model on single obs
demo_model = demo(1.0);

map_estimate = optimize(demo_model, MAP());  # error
chain = sample(demo_model, HMC(0.05, 10), 1000);  # error

chain = sample(demo_model, MH(), 1000);  # no error

MAP() throws an error, as does sampling with HMC and NUTS, but not with MH.
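For what it's worth, the `PosDefException` itself is easy to reproduce outside Turing: a symmetric matrix on (or numerically near) the boundary of positive definiteness makes `cholesky` fail with exactly this error. A minimal illustration (not Turing internals):

```julia
using LinearAlgebra

# Symmetric but only positive *semi*-definite: eigenvalues are {0, 2},
# so the zero pivot makes the Cholesky factorization fail.
B = ones(2, 2)
issymmetric(B)  # true
isposdef(B)     # false -- isposdef itself attempts a Cholesky factorization
# cholesky(B) throws PosDefException; with check=false the failure is
# recorded in the `info` field instead of being thrown.
cholesky(Symmetric(B); check=false).info != 0  # true
```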

@yebai
Member

yebai commented May 30, 2024

Likely related to TuringLang/Bijectors.jl#313

@yebai
Member

yebai commented May 30, 2024

cc @sethaxen @torfjelde

@yebai
Member

yebai commented Jun 5, 2024

The above example now works with the most recent Bijectors.jl release. A small step size is useful for stable optimisation or HMC sampling:

julia> chain = sample(demo_model, HMC(0.01, 10), 10000); 
Sampling 100%|████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:00

I am not sure how to specify step size for optimisation. @mhauru is that possible?

@yebai
Member

yebai commented Jun 5, 2024

These numerical errors can be explicitly caught and used to tell the inference backend to reject the proposal:

julia> @model function demo(x)
           try
               _A ~ Wishart(5, A)
               _x_mu = sum(_A)
               x ~ Normal(_x_mu, 1)
           catch e
               if e isa PosDefException
                   # assign zero probability so the proposal is rejected
                   Turing.@addlogprob! -Inf
               else
                   rethrow(e)
               end
           end
       end


julia> chain = sample(demo(1), NUTS(), 1000)
┌ Info: Found initial step size
└   ϵ = 0.0125
Sampling 100%|████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:00
Chains MCMC chain (1000×21×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 0.19 seconds
Compute duration  = 0.19 seconds
parameters        = _A[1, 1], _A[2, 1], _A[3, 1], _A[1, 2], _A[2, 2], _A[3, 2], _A[1, 3], _A[2, 3], _A[3, 3]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ess_per_sec
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64       Float64

    _A[1, 1]    3.6504    2.3821    0.0935   546.6195   372.3050    1.0003     2907.5507
    _A[2, 1]   -1.4789    1.7435    0.0766   565.4885   588.1340    1.0066     3007.9174
    _A[3, 1]   -1.5504    1.7307    0.1053   254.4582   458.9476    1.0133     1353.5010
    _A[1, 2]   -1.4789    1.7435    0.0766   565.4885   588.1340    1.0066     3007.9174
    _A[2, 2]    3.5547    2.2049    0.0896   549.1068   560.1294    1.0012     2920.7810
    _A[3, 2]   -1.4334    1.6267    0.0781   466.9718   495.1435    0.9991     2483.8925
    _A[1, 3]   -1.5504    1.7307    0.1053   254.4582   458.9476    1.0133     1353.5010
    _A[2, 3]   -1.4334    1.6267    0.0781   466.9718   495.1435    0.9991     2483.8925
    _A[3, 3]    3.5786    2.2005    0.1018   556.3600   493.4034    1.0026     2959.3615

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

    _A[1, 1]    0.5643    1.8723    3.1353    4.8230    9.5504
    _A[2, 1]   -5.6961   -2.3031   -1.2180   -0.3144    1.1457
    _A[3, 1]   -5.8425   -2.3965   -1.2686   -0.4025    1.0603
    _A[1, 2]   -5.6961   -2.3031   -1.2180   -0.3144    1.1457
    _A[2, 2]    0.5728    1.9093    3.1736    4.6654    8.8305
    _A[3, 2]   -5.1991   -2.3551   -1.2366   -0.3020    1.2692
    _A[1, 3]   -5.8425   -2.3965   -1.2686   -0.4025    1.0603
    _A[2, 3]   -5.1991   -2.3551   -1.2366   -0.3020    1.2692
    _A[3, 3]    0.7432    1.9730    3.1545    4.6346    9.2545

@yebai yebai added the doc label Jun 5, 2024
@mhauru
Collaborator

mhauru commented Jun 5, 2024

I am not sure how to specify step size for optimisation. @mhauru is that possible?

It depends on the optimisation algorithm, but if the algorithm has a notion of a step size, then usually yes. The default algorithm is LBFGS, which first finds a direction to go in and then does a line search along that direction to figure out how far to go, so there isn't a fixed step size; you can, however, set an initial guess for the step size like this:

optimize(demo_model, MAP(), Optim.LBFGS(;alphaguess=0.01));

That seems to help avoid the loss-of-positivity errors in this case.

@sethaxen
Member

sethaxen commented Jun 5, 2024

@yebai I don't think anything I did fixing the correlation bijectors would have fixed this. I'm not at a computer right now, but I imagine the problem is similar: the Jacobian computation goes through a Cholesky decomposition, which is wasteful and can intermittently fail due to floating-point errors. The solution is to make the same fix for the covariance bijector.

The other place a PosDefException would be raised randomly is if one used the Wishart matrix as the covariance of an MvNormal. The solution there is the same as LKJ: add a WishartCholesky to Distributions.jl and a VecCovBijector to Bijectors.jl. Same goes for InverseWishart.
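The idea behind such a Cholesky parameterisation can be sketched in a few lines (`WishartCholesky` and `VecCovBijector` are the proposed additions, not an existing API; `lower_from_vec` below is a hypothetical helper): map an unconstrained vector to a lower-triangular factor with a positive diagonal and form Σ = L·L′, which is positive definite by construction, so nothing downstream needs to factorize it and nothing can throw.

```julia
using LinearAlgebra

# Hypothetical helper: map an unconstrained vector of length d(d+1)/2 to a
# lower-triangular matrix whose diagonal is kept positive via exp.
function lower_from_vec(v::AbstractVector, d::Int)
    L = zeros(eltype(v), d, d)
    k = 1
    for j in 1:d, i in j:d
        L[i, j] = (i == j) ? exp(v[k]) : v[k]
        k += 1
    end
    return L
end

L = lower_from_vec(randn(6), 3)
Σ = L * L'   # positive definite by construction
isposdef(Σ)  # true
```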

@yebai
Member

yebai commented Jun 5, 2024

Thanks, @sethaxen, for the clarification. For now, users can be referred to #2188 (comment) until numerically more stable alternatives are implemented. We should also update the docs to include a guide on using a try-catch block to handle numerical exceptions.
