Segmentation fault in CVAfindIndex #31

Open
aseyboldt opened this issue Feb 26, 2020 · 2 comments

Comments

@aseyboldt

aseyboldt commented Feb 26, 2020

I am using the adjoint sensitivity analysis functionality of sunodes, and sporadically I get segmentation faults during the backward pass. I have not been able to reproduce this with a small example so far, but the coredump seems to indicate that CVAfindIndex tries to access checkpoints that do not exist, relatively near t0 of the forward problem, in a region where the solver is making (ridiculously?) small steps.
[Image: Screenshot from 2020-02-26 11-18-52]
It seems that CVAfindIndex is trying to find a checkpoint for t = 161.33623519238427 but the largest of the 600 entries in ca_mem->dt_mem has only t = 161.33623519238293.
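To make the failure mode concrete, here is a minimal Python sketch of a lookup that asks for a time past the last stored entry. This is a toy model, not the CVODES implementation: `find_index` and the generated `stored_times` are hypothetical stand-ins, and only the two `t` values are taken from the coredump. A bounds-checked lookup can report the miss, whereas an unchecked C loop would read past the end of the array.

```python
from bisect import bisect_left


def find_index(stored_times, t):
    """Return the index of the first stored time >= t, or None if there is none."""
    i = bisect_left(stored_times, t)
    return i if i < len(stored_times) else None


# Largest stored time reported in the coredump.
t_last = 161.33623519238293
# Hypothetical stand-in for the 600 stored entries (sorted, ending at t_last).
stored_times = [t_last - (599 - k) * 1e-12 for k in range(600)]

# Time the lookup was asked for, slightly past the last stored entry.
t_requested = 161.33623519238427

idx = find_index(stored_times, t_requested)
if idx is None:
    print("no stored step covers t =", t_requested)
```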

The details of how I'm using SUNDIALS are somewhat hidden in a Python wrapper and pymc3 (I'm sampling the parameter space with a Hamiltonian sampler), but here is a rough outline of what I'm doing:

  • Initialize forward and backward solvers with polynomial interpolation and a checkpoint every 600 steps
  • Repeat (a lot):
    • Change user_data
    • Call CVodeReInit and CVodeAdjReInit
    • Run forward solver
    • Call CVodeReInitB, CVodeQuadReInitB and CVodeB repeatedly, as the adjoint rhs is not continuous.

The t of the segfault is nowhere near the discontinuities of the rhs, the first one of those is at t ~ 12000.

The source for the solver calls is here: https://github.com/aseyboldt/sunode/blob/master/sunode/solver.py#L365

I can also provide the coredump if that is helpful.

@aseyboldt

I think I figured out what the problem is:
Let's assume there is only one backward problem.
At the beginning of the loop that advances all the backward problems (here), ck_mem is initialized so that ck_mem->ck_t0 < cvB_mem->cv_mem->cv_tn < ck_mem->ck_t1.

The solver sets ck_mem->ck_t0 as the stop time (here) and advances the backward problem. If the solver reaches that stop time (so cvB_mem->cv_tout == ck_mem->ck_t0), then cvB_mem->cv_mem->cv_tn will still be larger than the stop time by a small amount, since it (incorrectly in this case) assumes it cannot compute the rhs at the stop time itself.
In the next iteration, after advancing the checkpoint, the invariant from above no longer holds, and CVStep continues at cv_tn, so that CVAfindIndex accesses out-of-bounds memory (here) when looking for a step with t >= ck_mem->ck_t1.
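The sequence can be traced with a small toy model in Python. Everything here is an assumption made for illustration: the checkpoint times, the size of `eps`, and the helper `invariant_holds` are hypothetical, and the variables only borrow the names of the CVODES fields discussed above.

```python
def invariant_holds(ck_t0, tn, ck_t1):
    """The condition assumed at the top of the backward loop."""
    return ck_t0 < tn < ck_t1


# Hypothetical checkpoint section [100, 200]; the backward solver is inside it.
ck_t0, ck_t1 = 100.0, 200.0
tn = 150.0
invariant_before = invariant_holds(ck_t0, tn, ck_t1)

# The backward solver integrates down to the stop time ck_t0, but its
# internal time tn ends up slightly above it (eps is a made-up value).
eps = 1e-9
tn = ck_t0 + eps

# Move on to the previous section [0, 100]; its ck_t1 is the old ck_t0.
ck_t0, ck_t1 = 0.0, 100.0
invariant_after = invariant_holds(ck_t0, tn, ck_t1)

# tn now lies past ck_t1, so no stored step in this section covers it.
print(invariant_before, invariant_after, tn > ck_t1)
```

In this toy model the invariant holds before the section switch and fails after it, because `tn` sits just above the new `ck_t1`.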

Wouldn't it be better to compute a few more points when re-integrating the forward problem so that the checkpoint data sections overlap slightly? Then the solver would not have to integrate right up to the stop time in all but the last checkpoint sections. That might also lower interpolation errors somewhat I guess.
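A rough Python sketch of what I mean by overlapping sections, under the same toy assumptions as before: `section_times`, `covers`, the evenly spaced `all_times` and the `overlap` parameter are all hypothetical and do not correspond to anything in CVODES.

```python
from bisect import bisect_left


def section_times(all_times, t0, t1, overlap=0):
    """Step times stored for the section [t0, t1], plus `overlap` extra steps past t1."""
    lo = bisect_left(all_times, t0)
    hi = bisect_left(all_times, t1) + 1 + overlap
    return all_times[lo:min(hi, len(all_times))]


def covers(times, t):
    """True if t lies within the range of the stored step times."""
    return times[0] <= t <= times[-1]


# Hypothetical forward step times 0, 10, ..., 200.
all_times = [float(k) for k in range(0, 201, 10)]

# Backward solver time that ended up slightly above the checkpoint at t = 100.
tn = 100.0 + 1e-9

plain = section_times(all_times, 0.0, 100.0)
overlapped = section_times(all_times, 0.0, 100.0, overlap=2)

print(covers(plain, tn), covers(overlapped, tn))
```

With two extra stored steps, the lookup for `tn` stays inside the stored data even though `tn` is slightly past the section boundary.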

@aseyboldt

@balos1 Not sure who to ping, I hope this is alright.
I just ran into an example where I think this bug leads to silently incorrect results. I'd really appreciate it if someone who knows the code could have a look.
