
double free or corruption (out) error #341

Closed
LukasBarner opened this issue Nov 21, 2022 · 20 comments

Comments

@LukasBarner
Contributor

LukasBarner commented Nov 21, 2022

Hi everyone,
I am solving a series of MPECs, where a relaxation of the complementary slackness conditions is tightened in every iteration. The model code works as expected most of the time; however, I do sometimes get strange memory-related errors with Ipopt and MA97. They are likely not related to the Julia interface, but I think it might be best to start here and make my way downstream. I have attached a log of the last unsuccessful iteration below.
The error is likely related to some numerically odd constellation, since the 20 successful runs before it used the same code and only had different values.
Do you think there is a better way to dig into this than just by creating a more detailed log?
If not, I will try to come up with a more detailed report, but this might take some time as the model needs to run all previous iterations again...
Any help is much appreciated, thanks in advance :D

This is Ipopt version 3.14.4, running with linear solver ma97.


Detected 9770 linearly dependent equality constraints; taking those out.

Number of nonzeros in equality constraint Jacobian...:  2816768
Number of nonzeros in inequality constraint Jacobian.:   516442
Number of nonzeros in Lagrangian Hessian.............:   897470

Total number of variables............................:   687301
                     variables with only lower bounds:   643410
                variables with lower and upper bounds:      327
                     variables with only upper bounds:     4894
Total number of equality constraints.................:   460287
Total number of inequality constraints...............:   219364
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:   219364

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  4.3068916e-08 1.31e-06 5.01e-04  -9.0 0.00e+00    -  0.00e+00 0.00e+00   0 
double free or corruption (out)

signal (6): Aborted
in expression starting at /net/work/barner/GGM/GGM_main_calib_toy_small.jl:107
@odow
Member

odow commented Nov 21, 2022

Do you have a reproducible example? How are you calling Ipopt?

@LukasBarner
Contributor Author

The code that produces the error comes from a longer procedure. It involves a little data processing, running a few models to get approx. solutions / warmstarts to the lower level optimization problem, and then all of the iterations in the iterative solution procedure. The bilevel code in principle is similar to the following PR: joaquimg/BilevelJuMP.jl#184 (see https://github.com/joaquimg/BilevelJuMP.jl/blob/b0160b788bbe0e22dd7ce21b113cbb596f79e06d/docs/src/examples/Iterative_example1.jl for a simple example). Ipopt is also called like this (MOI), and the first iterations (sometimes all) run through perfectly fine.
I could share the full error producing repo if this helps, but setting up a MWE is not possible as the error is highly input specific. For example, if I specify different iter_eps values (basically a different RHS to the complementary slackness conditions, but still in the same order of magnitude), the whole procedure runs through without any errors.
Getting to the error will take a few hours/days of computation on a cluster computer though, which makes the situation even more impractical...
I could try to produce a full log file, but this will surely be a mess :D

@odow
Member

odow commented Nov 21, 2022

It's going to be hard, if not impossible, to debug this without a reliable reproducible example.

@LukasBarner
Contributor Author

Is there a good way to create a single reproducible model run? Maybe something like writing the MathOptInterface.Bridges.LazyBridgeOptimizer{Ipopt.Optimizer} to disk for every iteration before calling MOI.optimize!(). Following a run on the cluster computer, I could share the iteration that produces an error. I was thinking something like serialization or jld, but not sure if that could potentially work?

@odow
Member

odow commented Nov 21, 2022

In theory, MOI.write_to_file(model, "model.nl"). But there could be any number of reasons why that wouldn't work, mainly because the model we create by reading the file is not bit-for-bit identical to the one you have.

Does it happen with a solver other than MA97?
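
A minimal sketch of the snapshot idea discussed above, assuming the model is held in (or backed by) an in-memory MOI model so that it can be copied. The helper name `snapshot_model`, the file naming scheme, and the tiny stand-in model are hypothetical. MOF is used here only because the format is inferred from the `.mof.json` extension; an `.nl` path should follow the same pattern.

```julia
import MathOptInterface as MOI

# Hypothetical helper: write a snapshot of `src` into `dir` before a solve.
# The file format is inferred from the extension (".mof.json" here).
function snapshot_model(src::MOI.ModelLike, dir::AbstractString, iter::Integer)
    path = joinpath(dir, "iter_$(lpad(iter, 3, '0')).mof.json")
    dest = MOI.FileFormats.Model(; filename = path)
    MOI.copy_to(dest, src)
    MOI.write_to_file(dest, path)
    return path
end

# Tiny stand-in model: min x  s.t.  x >= 1.
src = MOI.Utilities.Model{Float64}()
x = MOI.add_variable(src)
MOI.set(src, MOI.VariableName(), x, "x")
MOI.add_constraint(src, x, MOI.GreaterThan(1.0))
MOI.set(src, MOI.ObjectiveSense(), MOI.MIN_SENSE)
f = MOI.ScalarAffineFunction([MOI.ScalarAffineTerm(1.0, x)], 0.0)
MOI.set(src, MOI.ObjectiveFunction{typeof(f)}(), f)

path = snapshot_model(src, mktempdir(), 21)
```

Calling such a helper right before `MOI.optimize!` in every iteration would leave one file per solve, so only the failing iteration would need to be shared.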

@LukasBarner
Contributor Author

I think .nl files order variables (see https://www.ampl.com/wp-content/uploads/Hooking-Your-Solver-to-AMPL-by-David-M.-Gay.pdf). Since the problem seems related to some numerical situation, this will likely make a difference (?). Nevertheless, I will try...
It seems like MOI.copy_to() does not work for MathOptInterface.Bridges.LazyBridgeOptimizer{Ipopt.Optimizer} because:

ERROR: MathOptInterface.GetAttributeNotAllowed{MathOptInterface.ListOfModelAttributesSet}: Getting attribute MathOptInterface.ListOfModelAttributesSet() cannot be performed: Ipopt.Optimizer does not support getting the attribute MathOptInterface.ListOfModelAttributesSet(). You may want to use a `CachingOptimizer` in `AUTOMATIC` mode or you may need to call `reset_optimizer` before doing this operation if the `CachingOptimizer` is in `MANUAL` mode.

Is this intended (and is there another way except copying to MOI.FileFormats.Model())?
I have figured out a way around this, but I'm not 100% sure it won't mix things up a bit...

I did also send out another run with MA86. Usually different linear solvers produce different solutions/trajectories, so if it comes back without problems we cannot infer the error is related to MA97. But if we're lucky it will also produce an error. Then we could at least exclude the linear solvers from the list of likely candidates (they could still both have a similar issue, but this seems less likely...). Unfortunately, we will have to wait a bit for the results...
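
For reference, switching the linear solver between runs is a one-option change. A sketch, assuming Ipopt.jl is installed and that the requested HSL routine is available to Ipopt at solve time; `make_solver` is a hypothetical helper, not part of Ipopt.jl.

```julia
import Ipopt
import MathOptInterface as MOI

# Hypothetical helper: build a fresh bridged Ipopt optimizer that will use
# the given linear solver ("ma97", "ma86", ...) once optimize! is called.
function make_solver(linear_solver::String)
    solver = MOI.instantiate(Ipopt.Optimizer; with_bridge_type = Float64)
    MOI.set(solver, MOI.RawOptimizerAttribute("linear_solver"), linear_solver)
    return solver
end

solver_ma97 = make_solver("ma97")
solver_ma86 = make_solver("ma86")
```

Only the option is stored at this point; the linear solver library itself is not touched until the model is actually solved.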

@odow
Member

odow commented Nov 22, 2022

Ah. You probably need to use MOI.instantiate(Ipopt.Optimizer; with_bridge_type=Float64) as the solver so that it is built with a cache.

@LukasBarner
Contributor Author

In my case, this is how MathOptInterface.Bridges.LazyBridgeOptimizer{Ipopt.Optimizer} is instantiated (see https://github.com/joaquimg/BilevelJuMP.jl/blob/565c0ef6d5fd07ae7ff558bbf2466b87e815caf9/src/jump.jl#L829-L854). Do you have another idea?
My way of working around this was to copy a MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.AbstractOptimizer, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}} to MathOptInterface.Bridges.LazyBridgeOptimizer{Ipopt.Optimizer} in every iteration instead of directly reusing it (which arguably might make some difference...).

@LukasBarner
Contributor Author

I can then write the CachingOptimizer to a file

@odow
Member

odow commented Nov 22, 2022

I don't understand. Where did MathOptInterface.Bridges.LazyBridgeOptimizer{Ipopt.Optimizer} come from?

Are you using BilevelJuMP or just MOI and Ipopt?

If you're using MOI, then use model = MOI.instantiate(Ipopt.Optimizer; with_bridge_type=Float64) as your model.

@LukasBarner
Contributor Author

LukasBarner commented Nov 22, 2022

Sorry, that was a bit confusing. I'm using an extended version of BilevelJuMP (that was the PR I had linked here: #341 (comment)). There, the solver is instantiated like this: optimizer=MOI.instantiate(optimizer_constructor; with_bridge_type = Float64) and trying to copy it does not work.
The following MWE also produces the same error on my machine:

using MathOptInterface
using Ipopt
const MOI = MathOptInterface
optimizer = MOI.instantiate(Ipopt.Optimizer; with_bridge_type = Float64)
dest = MOI.FileFormats.Model(; filename = joinpath(pwd(),"tst_logs","tst.nl"))
MOI.copy_to(dest, optimizer)
MOI.write_to_file(dest, joinpath(pwd(),"tst_logs","_model.nl"))

The error message is:

ERROR: MathOptInterface.GetAttributeNotAllowed{MathOptInterface.ListOfModelAttributesSet}: Getting attribute MathOptInterface.ListOfModelAttributesSet() cannot be performed: Ipopt.Optimizer does not support getting the attribute MathOptInterface.ListOfModelAttributesSet(). You may want to use a `CachingOptimizer` in `AUTOMATIC` mode or you may need to call `reset_optimizer` before doing this operation if the `CachingOptimizer` is in `MANUAL` mode.
Stacktrace:
 [1] get_fallback(model::Ipopt.Optimizer, attr::MathOptInterface.ListOfModelAttributesSet)
   @ MathOptInterface ~/.julia/packages/MathOptInterface/Ht8hE/src/attributes.jl:406
 [2] get(::Ipopt.Optimizer, ::MathOptInterface.ListOfModelAttributesSet)
   @ MathOptInterface ~/.julia/packages/MathOptInterface/Ht8hE/src/attributes.jl:390
 [3] get(b::MathOptInterface.Bridges.LazyBridgeOptimizer{Ipopt.Optimizer}, attr::MathOptInterface.ListOfModelAttributesSet)
   @ MathOptInterface.Bridges ~/.julia/packages/MathOptInterface/Ht8hE/src/Bridges/bridge_optimizer.jl:790
 [4] copy_to(dest::MathOptInterface.FileFormats.NL.Model, model::MathOptInterface.Bridges.LazyBridgeOptimizer{Ipopt.Optimizer})
   @ MathOptInterface.FileFormats.NL ~/.julia/packages/MathOptInterface/Ht8hE/src/FileFormats/NL/NL.jl:260
 [5] top-level scope
   @ Untitled-1:5

with:

MathOptInterface v1.10.0
Ipopt v1.1.0

Edit: forgot the const MOI line...

@LukasBarner
Contributor Author

@odow Should the code above work, or am I approaching this from the wrong side?

@odow
Member

odow commented Nov 23, 2022

Try:

optimizer = MOI.Utilities.CachingOptimizer(
    MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}()),
    MOI.instantiate(Ipopt.Optimizer; with_bridge_type = Float64),
)
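
A small check of why the cache helps, sketched with MathOptInterface alone. A `CachingOptimizer` with no solver attached stands in here for the Ipopt-backed one above; that substitution is an assumption made so the example runs without Ipopt. The point is that `ListOfModelAttributesSet`, which `Ipopt.Optimizer` could not answer, is served by the cache, so `MOI.copy_to` into a file-format model goes through.

```julia
import MathOptInterface as MOI

# Cache-only CachingOptimizer (no solver attached): an assumed stand-in
# for the cache + Ipopt combination.
cache = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
model = MOI.Utilities.CachingOptimizer(cache, MOI.Utilities.AUTOMATIC)

x = MOI.add_variable(model)
MOI.add_constraint(model, x, MOI.GreaterThan(0.0))
MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE)

# This query is answered by the cache, not by a solver.
attrs = MOI.get(model, MOI.ListOfModelAttributesSet())

# Copying into a file-format model now works.
dest = MOI.FileFormats.Model(; format = MOI.FileFormats.FORMAT_MOF)
MOI.copy_to(dest, model)
```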

@LukasBarner
Contributor Author

Ok, so this is probably the best I can do...

> My way of working around this was to copy a MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.AbstractOptimizer, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}} to MathOptInterface.Bridges.LazyBridgeOptimizer{Ipopt.Optimizer} in every iteration instead of directly reusing it (which arguably might make some difference...).

Will set this up and hopefully get back with a reproducible example...

@LukasBarner
Contributor Author

Ok, I did a bit of testing on this and there might be a problem with MOI and .nl files.
The attached script works fine for primal variables, but ignores dual starts. I did also take a look at the MOI code for .nl files and could not find anything about ConstraintDualStart() there. Did I miss something?

using MathOptInterface
using Ipopt
const MOI = MathOptInterface

src = MOI.FileFormats.Model(format = MOI.FileFormats.FORMAT_NL)

MOI.read_from_file(src, joinpath(pwd(),"_model_storage", "model.nl"))

solver = MOI.instantiate(Ipopt.Optimizer; with_bridge_type = Float64)

MOI.copy_to(solver, src)
MOI.set(solver, MOI.RawOptimizerAttribute("warm_start_init_point"), "yes")
MOI.set(solver, MOI.RawOptimizerAttribute("warm_start_bound_push"), 1e-12)
MOI.set(solver, MOI.RawOptimizerAttribute("warm_start_bound_frac"), 1e-12)
MOI.set(solver, MOI.RawOptimizerAttribute("warm_start_slack_bound_frac"), 1e-12)
MOI.set(solver, MOI.RawOptimizerAttribute("warm_start_slack_bound_push"), 1e-12)
MOI.set(solver, MOI.RawOptimizerAttribute("warm_start_mult_bound_push"), 1e-12)
MOI.set(solver, MOI.RawOptimizerAttribute("mu_init"), 1e-12)
MOI.set(solver, MOI.RawOptimizerAttribute("print_level"), 5)

MOI.optimize!(solver)

@odow
Member

odow commented Nov 24, 2022

> I did also take a look at the MOI code for .nl files and could not find anything about ConstraintDualStart() there

I don't think we support dual starts in the NL files yet.
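
For context, primal and dual starts are ordinary MOI attributes on the in-memory model; whether they survive a trip through a file depends on the file format. A minimal sketch of setting and reading them back, using a one-variable stand-in model whose numbers are made up for illustration:

```julia
import MathOptInterface as MOI

# In-memory model that can store arbitrary attributes, including starts.
model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())

x = MOI.add_variable(model)
f = MOI.ScalarAffineFunction([MOI.ScalarAffineTerm(1.0, x)], 0.0)
c = MOI.add_constraint(model, f, MOI.EqualTo(2.0))  # x == 2

# Starting values (illustrative numbers only).
MOI.set(model, MOI.VariablePrimalStart(), x, 2.0)
MOI.set(model, MOI.ConstraintDualStart(), c, 0.5)

primal_start = MOI.get(model, MOI.VariablePrimalStart(), x)
dual_start = MOI.get(model, MOI.ConstraintDualStart(), c)
```

A warm-started run can only be reproduced from a file if the format carries both of these values, which is why the missing dual start matters here.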

@LukasBarner
Contributor Author

I can try to write this up the next few days.

@LukasBarner
Contributor Author

> I can try to write this up the next few days.

Think I was a bit optimistic here...
.nl files are pretty messy to me and dual starts even more so. For example, I have no clue how to manage things like duals to variable bounds...

But instead, I managed to write an extension of the MOF format that correctly stores primal and dual starts.
On smaller test cases, the Ipopt runs appear to be reproducible...

If leaving out starts was not a design choice for MOF, I could also do a PR with the amendments to MOI.

@odow
Member

odow commented Nov 26, 2022

> Think I was a bit optimistic here... .nl files are pretty messy to me and dual starts even more so.

😆 I'm not surprised. NL files are pretty cryptic!

> If leaving out starts was not a design choice for MOF, I could also do a PR with the amendments to MOI.

Not a design choice. Just something I didn't get around to. Please open a PR.

We'll also have to make changes to the schema: https://github.com/jump-dev/MathOptFormat

The place to add is somewhere:
https://github.com/jump-dev/MathOptFormat/blob/67e65785623330af60f7bbf2eab7f48d4580f322/schemas/mof.1.1.schema.json#L87-L107
but if you open a PR with your suggestion in MOI, I can show you how to change the schema

@LukasBarner
Contributor Author

Closing this, I believe it is related to dlopen() when handling linear solvers.
Recently also got a segfault when using MA97 that was actually caused by the pardiso shared library...
