Suggestions for improving the API #4

Closed
jannschu opened this issue Apr 5, 2020 · 14 comments

Comments

@jannschu
Contributor

jannschu commented Apr 5, 2020

Hi,

I have some suggestions to improve the API.

Motivation

  1. The residuals and jacobians closures are likely to share some computational steps. Currently it is hard to perform those common computations once for both; they are done redundantly.
  2. Because jacobians returns an iterator without a lifetime, the input given as a reference most likely has to be copied (and then moved).
  3. The many arguments to optimize make the code hard to read.
  4. Failing in residuals or jacobians is not possible.
  5. In addition to "Return statistics" (#3), the return value should also provide an easy way to decide whether the optimization failed.

Suggestion

  • Merge Config and optimize into one struct LevenbergMarquardt. This simplifies usage: one argument fewer and one import less.
  • Instead of handing in closures we hand in a trait object, for example OptimizationTarget (see the sketch after this list). It has methods for
    • setting the current guess (which can perform the shared computations),
    • normalizing,
    • getting the residuals, with an Option return type,
    • getting the Jacobians, with a lifetime bound:
      fn jacobians<'a>(&'a self, params: &'a VectorN<..>) -> Box<dyn Iterator<Item=Option<Matrix<...>>> + 'a>;
  • The optimize method is changed to return a Result with an error trait.
  • The current API could be preserved in a new function, but I would suggest not to.
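
Roughly, such a trait could look like this. This is only a sketch: it uses nalgebra's dynamic types instead of generic dimensions, and the method names other than jacobians (set_params, normalize, residuals) are placeholders, not a final design.

use nalgebra::{DMatrix, DVector};

pub trait OptimizationTarget {
    /// Set the current guess; computations shared between residuals and
    /// Jacobians can be performed and cached here.
    fn set_params(&mut self, params: &DVector<f64>);

    /// Optional normalization of the parameters.
    fn normalize(&self, params: &mut DVector<f64>);

    /// Residuals for the current guess; `None` signals a failure.
    fn residuals(&self) -> Option<DVector<f64>>;

    /// Jacobians for the current guess, borrowed for the lifetime of `self`;
    /// a `None` item signals a failure.
    fn jacobians<'a>(&'a self, params: &'a DVector<f64>) -> Box<dyn Iterator<Item = Option<DMatrix<f64>>> + 'a>;
}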

Shortcomings

  • The jacobians trait method cannot easily return an iterator because of current limitations on impl Trait in Rust. Alternatively, but also not perfect, we could give it an index and let it return only one Jacobian at a time (a possible signature is sketched below).
  • To use the API you would need to implement a trait and potentially define a struct. On the other hand, this would allow using the same optimization problem for different algorithms in a consistent way.
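
The index-based alternative could look roughly like this (again only a sketch; the trait and method names are placeholders):

use nalgebra::DMatrix;

pub trait OptimizationTargetIndexed {
    /// Number of Jacobian blocks.
    fn num_jacobians(&self) -> usize;

    /// The Jacobian block with the given index, or `None` on failure.
    fn jacobian(&self, index: usize) -> Option<DMatrix<f64>>;
}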

I am willing to create a pull request for this.

@jannschu
Contributor Author

jannschu commented Apr 7, 2020

My implementation of the API change is more or less finished. I will add some documentation.
I also added math rendering to the docs.

⚠️ I noticed something strange, though. After removing the LM call in tests/basic.rs the test still passes (on master). The initial point is already good enough. This is probably a mistake. In the new API I added a doctest using the Himmelblau function and this one fails...

I read a bit through some well-known implementations and a text book. There are some things to consider changing in the current implementation anyway. I will open an issue for further discussion.

edit: Here is the Himmelblau example with SciPy's LM implementation, which works. The crate's implementation fails even for initial values very close to one of the minimizers. There seems to be a serious bug in the crate's implementation.

from scipy.optimize import root
import numpy as np

def f(x_k):
    [x, y] = x_k
    return np.array([x ** 2 + y - 11, x + y ** 2 - 7])

def jac(x_k):
    [x, y] = x_k
    return np.array([[2 * x, 1], [1, 2 * y]])

print(root(f, x0=[0.0,0.0], jac=jac, method='lm'))

@jannschu
Contributor Author

jannschu commented Apr 8, 2020

Alright, I did implement some changes to the API. The progress can be found here: https://github.com/jannschu/levenberg-marquardt/tree/rework-api.

What is currently still missing is letting the residual and Jacobian computations fail.

I also extended the documentation. You can build it locally with RUSTDOCFLAGS="--html-in-header katex-header.html" cargo doc --no-deps --open.

I changed the Jacobian to be non-transposed and the residuals to be a one-dimensional vector. For the following reasons:

  1. Computing J^T * J is faster when we have J in column-major storage. The same applies to J^T * x.
  2. Having the residuals as a vector simplifies the API in my opinion. The vector case can be covered by providing an example. Also, residuals of different sizes are now naturally possible.
  3. I will rewrite more or less everything in order to obtain the same implementation as in lmfit, which is a C port of the "classic" MINPACK implementation. It also seems to be identical to the one described in the book "Numerical Optimization" by Nocedal and Wright. Benefits (the linear system in question is written out after this list):
    • Faster, because J^T * J + lambda I is not inverted for every new choice of lambda.
    • lambda is picked in a less heuristic way.
    • There will also be stopping criteria.
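
For reference, the linear system that has to be re-solved for every new choice of lambda is the standard damped step (D is a scaling matrix, often the identity or the diagonal of J^T * J; r is the residual vector; sign conventions vary):

(J^\top J + \lambda D^\top D) \, \delta = -J^\top r, \qquad x_{k+1} = x_k + \delta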

That's why I haven't created a pull request yet. I will finish the implementation and then either create a pull request (if you are interested) or publish it as a new crate.

@vadixidav reopened this Apr 8, 2020
@vadixidav
Member

My bad, accidentally hit the close button.

@vadixidav
Member

Just like the PR, I will prioritize looking at this once I am not busy. I think these suggestions seem good, but I will give a more detailed look later today or tomorrow.

@vadixidav
Member

I think these ideas and changes are all great. Thanks for looking into the test as well. I merged the other PR you put in, but these changes sound pretty big. I will take a look at your branch now.

@vadixidav
Member

One question I have is this: Should LM be able to fail? The reason I ask is that if we provide a model to LM and it cannot improve the model, it should just give back the original model without any change. That is my general reasoning. Why might it fail?

@vadixidav
Copy link
Member

Also, I looked over the code you have, and I do think we need to ensure we can pass the Jacobian in pieces. The residuals we could also pass in pieces if necessary.

@jannschu
Contributor Author

jannschu commented Apr 10, 2020

One question I have is this: Should LM be able to fail? The reason I ask is that if we provide a model to LM and it cannot improve the model, it should just give back the original model without any change. That is my general reasoning. Why might it fail?

Giving the last parameters back is fine, but knowing that the optimization might have been unsuccessful is essential information. You might want to handle this failure, and if ignoring it is fine you still can. I think yesterday I read this article; for this argument, only the first two quotes in there are relevant.

Not being able to improve the parameters is one option, but catastrophic divergence where everything is "inf" or "nan" is also a possible outcome.

Also, I looked over the code you have, and I do think we need to ensure we can pass the Jacobian in pieces. The residuals we could also pass in pieces if necessary.

The new implementation computes the QR decomposition of the full Jacobian J, so we have to build this matrix at some point. I am aware that computing it in chunks is often more natural. So this is a tradeoff between a more natural implementation for the Jacobian vs. allocation and copying.

I would rather suggest providing an example in a prominent position which shows how you can compute your Jacobian and residuals in chunks.
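
Such an example could look roughly like this (a minimal sketch with nalgebra's dynamic types; residual_and_jacobian_for_point is a hypothetical per-point function for a problem with 2-dimensional residuals and 6 parameters):

use nalgebra::{DMatrix, DVector, Matrix2x6, Vector2};

// Hypothetical per-point computation: each data point contributes a
// 2-dimensional residual and a 2x6 Jacobian block.
fn residual_and_jacobian_for_point(point_index: usize, params: &DVector<f64>) -> (Vector2<f64>, Matrix2x6<f64>) {
    // ... problem-specific code ...
    let _ = (point_index, params);
    (Vector2::zeros(), Matrix2x6::zeros())
}

// Assemble the full residual vector and Jacobian from the per-point chunks.
fn build_full_system(num_points: usize, params: &DVector<f64>) -> (DVector<f64>, DMatrix<f64>) {
    let mut residuals = DVector::zeros(2 * num_points);
    let mut jacobian = DMatrix::zeros(2 * num_points, 6);
    for i in 0..num_points {
        let (r_i, j_i) = residual_and_jacobian_for_point(i, params);
        residuals.rows_mut(2 * i, 2).copy_from(&r_i);
        jacobian.rows_mut(2 * i, 2).copy_from(&j_i);
    }
    (residuals, jacobian)
}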

@vadixidav
Member

@jannschu

I would rather suggest providing an example in a prominent position which shows how you can compute your Jacobian and residuals in chunks.

Perhaps the way I initially implemented the matrix multiplication is wrong (almost certainly), but if you take the first row of the transposed Jacobian and dot it with the first column of the regular Jacobian, you get the top-left element of the approximate Hessian. Additionally, if you take the first row of the transposed Jacobian and multiply/sum it against each column of the regular Jacobian, you get a vector representing the first row of the approximate Hessian. This process can be broken down further: it is just the dot product of all pairs of column vectors in the regular Jacobian, forming a symmetric matrix, and the dot product of two such columns can itself be done in pieces. Concretely, you compute a chunk of the overall Jacobian and then do M (number of columns) piecewise multiplications, where each multiplication takes one column of that chunk (0..M) against all columns of the chunk; summing over the rows of the chunk produces each row vector (0..M) of that chunk's contribution to the approximate Hessian. All of these partial approximate Hessians are accumulated to produce the final approximate Hessian. This strategy has the advantage that you don't need to have all rows of the Jacobian in memory, just a few at a time (enough for SIMD optimization).
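
If it helps the discussion, here is a minimal sketch of that accumulation with nalgebra (the chunk layout and the name accumulate_normal_equations are illustrative only):

use nalgebra::{DMatrix, DVector};

// Accumulate the approximate Hessian J^T * J and the gradient J^T * r from
// row chunks of the Jacobian, without ever materializing the full J.
fn accumulate_normal_equations(
    chunks: impl IntoIterator<Item = (DMatrix<f64>, DVector<f64>)>,
    num_params: usize,
) -> (DMatrix<f64>, DVector<f64>) {
    let mut jtj = DMatrix::<f64>::zeros(num_params, num_params);
    let mut jtr = DVector::<f64>::zeros(num_params);
    for (j_chunk, r_chunk) in chunks {
        let jt = j_chunk.transpose();
        // Each chunk of rows contributes j_chunk^T * j_chunk and j_chunk^T * r_chunk.
        jtj += &jt * &j_chunk;
        jtr += &jt * &r_chunk;
    }
    (jtj, jtr)
}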

Giving the last parameters back is fine, but knowing that the optimization might have been unsuccessful is essential information. You might want to handle this failure, and if ignoring it is fine you still can. I think yesterday I read this article; for this argument, only the first two quotes in there are relevant.

As for error handling, the estimator will not produce a model if there is an error in computing the model, so by the time you run Levenberg-Marquardt you already have a complete model. It is possible that the Jacobian could be NaN, but the model itself might still be valid. My assumption is that the user passes a valid model into LM, thus so long as we tell the user why we exited, it is okay to treat that the same as all other termination criteria. Another way to look at it is that if LM fails, the original model is the best solution found thus far.

The new implementation computes the QR decomposition of the full Jacobian J, so we have to build this matrix at some point. I am aware that computing it in chunks is often more natural. So this is a tradeoff between a more natural implementation for the Jacobian vs. allocation and copying.

Why do we need to compute the QR decomp of the full Jacobian? If you look at the formula from Wikipedia here, the Jacobian is first multiplied by its transpose. Of course, the damping is then provided after that point, but you have a much smaller matrix. It is only at this point that we would need to compute any kind of decomposition. It might be that there is some alternative way to solve this problem that I am not aware of, but this is the way I have learned to solve it.

I can fix up the original API to use the correct approximate hessian computation and provide tests to show that doing it in pieces is equivalent to multiplying the whole matrices.

@jannschu
Contributor Author

Why do we need to compute the QR decomp of the full Jacobian?

As I already mentioned in #4 (comment) (point 3) I am working on porting the MINPACK reference implementation to Rust. This is almost done now. The argument for doing it with the QR is also presented in the mentioned book. In short:

In the implementation on master we change lambda, then update J^T * J, and solve the system.
The initial cost for computing J^T * J is O(mn^2), also in the current "block" fashion. Then inverting is O(n^3) every time we change lambda. Doing it with a QR (also initial cost O(mn^2)), we can solve the system for a new lambda in O(n^2).

(Just to stress this: computing the QR of the full Jacobian is as expensive as computing J^T * J.)
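
A compressed version of the argument, following the MINPACK scheme I am porting (D is the scaling matrix): once we have a QR factorization of the Jacobian,

J = QR \quad\Longrightarrow\quad J^\top J + \lambda D^\top D = R^\top R + \lambda D^\top D,

so for a new lambda we only need to eliminate the n extra rows \sqrt{\lambda} D appended below the n-by-n triangular factor R (e.g. with Givens rotations), which costs O(n^2). The O(mn^2) factorization of J itself is done once per Jacobian evaluation, just like J^T * J would be.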

So, options:

  1. Drop the QR approach and accept O(n^3) instead of O(n^2) per new lambda.
  2. Keep the Jacobian chunks, but build the full matrix afterwards.
  3. Implement the QR step in a block fashion as well, to keep the chunked Jacobians.
  4. Drop the Jacobian chunks.

Regarding the SIMD/performance argument. I guess unless your chunks are relatively large you lose performance.

I am not sure if the memory saving is worth the trouble:

  • You get a more complicated API.
  • You are probably slower (chunks would be too small).
  • Your LM (more specifically QR) implementation is more complex.

About how much memory allocation are we talking?

@vadixidav
Member

@jannschu

Ahh, okay, I understand now. I wasn't aware that LM could be computed in this way, so this is really exciting!

So there are a few main ways we might use this:

  • Lots of cameras and lots of points, performing bundle adjust
    • In this case we would ideally like to compute a sparse matrix (MxN) which is very large
    • We also may want to only do this on one closed loop of cameras to minimize the size of this matrix
    • My current implementation is not focused on this approach as I do not have an implementation of sparse matrices on hand
  • One camera is being optimized with a large number of matches (hundreds of thousands at max)
    • In this case we have a long matrix (MxN) where M is very large (number of points) and N is 6 (the components from se(3))
    • My thought is that the matrix might be so large that this would be prohibitive, but looking at the numbers (6 * 100,000 = 600,000), it seems I was wrong about this (it should all fit in a few MB of memory approximately)
    • This is the case I was specifically targeting with the API, although it could be used for bundle adjust as well

Looking back upon this, it seems that there is no issue with running out of memory, except for the bundle adjust case, so your algorithm should work really well for the single camera (or few camera) pose optimization problem. As for the sparse LM problem, we might want to tackle that later as well.

I see where you are going with this now and I am 100% on board 👍. I had some misunderstandings initially. Thank you for your work!

@vadixidav
Member

@jannschu Also, I found where the partial Hessian computation was first recommended: by @mpizenberg in #1. @mpizenberg, you might want to weigh in here, but I think @jannschu has the right approach. We should just implement a separate version if we want to do large sparse matrices, right? I don't think we will run out of memory unless we create an arbitrarily large matrix by trying to bundle adjust a whole reconstruction at once, making a large sparse matrix that can't fit in contiguous memory, so this implementation should just focus on the dense case.

@mpizenberg
Member

If I remember correctly, the reason I was interested in handling residuals individually (or in small groups) was precision when accumulating many small values. This is the strategy used in DSO, for example: https://github.com/JakobEngel/dso/blob/master/src/OptimizationBackend/MatrixAccumulators.h
Otherwise I don't mind handling the Jacobian as one block if it makes other optimizations easier.
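
As an illustration of the precision point (this is not DSO's exact scheme), compensated (Kahan) summation is one standard way to keep the rounding error bounded when accumulating many small terms:

// Kahan (compensated) summation: a running compensation term bounds the
// accumulated rounding error when summing many small values.
fn kahan_sum(values: impl IntoIterator<Item = f64>) -> f64 {
    let mut sum = 0.0;
    let mut compensation = 0.0;
    for value in values {
        let y = value - compensation;
        let t = sum + y;
        compensation = (t - sum) - y;
        sum = t;
    }
    sum
}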

@vadixidav
Member

I think we forgot to close this issue, as the refactor @jannschu did is complete. Closing now.
