-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: BUG: ABNORMAL_TERMINATION test from real data #212
base: master
Are you sure you want to change the base?
Conversation
Edit: Closed by accident. I am having trouble getting the calculus to agree with the code, |
Hi @Eh2406, this is really interesting. I'm not familiar enough with the estimation code to have a good sense of what's going on, though. One thing I'm curious about is how much the non-convergence is affecting the estimation results. When there's a non-converging model and you reduce the sample size until the error disappears, do the coefficients and standard errors change or are they similar? Another thing to try could be to estimate some of the same models using PyLogit, which is a completely different codebase, and compare the results. This is a bit of a pain because the data formats and model expressions are different, but here's a notebook with an example. ("ChoiceModels engine" is essentially https://github.com/UDST/choicemodels/blob/master/notebooks/Destination-choice-models-02.ipynb I'm going to take a look and see if I can get non-convergence with some Bay Area data.. |
I believe the technically correct answer is "
That does seem like a good test. Different code, different bugs. From a brief review of the
Thank you. And thank you for taking this bug report seriously. My attempt to work thru the calculus, will be coming in the next post. |
@Eh2406 I don't test the success flag as I've found it to often be meaningless. I've seen many examples where the gradient is essentially zero (e.g. all terms being on the order of 1e-07) but the success flag says that the optimizer did not converge. I've also seen cases where the optimizer converged, as in the change in the function evaluations was less than a small threshold, yet the gradient shows that the local optimum wasn't yet reached. I've checked my gradients using automatic differentiation (which has higher accuracy than the finite difference approximations used in approx_fprime) and they've all been correct to every reported decimal place. You can also find the analytic derivations for the MNL's gradient and hessian here If you do test against PyLogit, I'd suggest directly examining the reported gradient instead of paying attention to the success flag. The gradient can be viewed by accessing the |
Indeed, I stand corrected. My previous post is overstated. It is more accurate to say that most forms of non-convergence return a reasonable approximation to the correct result. My impression is that
Automatic differentiation would be a much better test of the gradient implementation! It is immensely comforting to know you have tested PyLogit so thoroughly! Do you have advice on how to use AD to test the urbansim implementation?
Thank you! I will go read that now! My aforementioned attempts to work thru the calculus on my own have not been going well. |
Work got bizzy last week and I did not have time two dig in as much as I'd like, but hear is what I have found so far. The example reproduces with only the first column of data. The gradient calculation matches the one from @timothyb0912's paper. So maybe it has to do with the underflow checks, or something else that is not considered by that paper? Specifically |
At yesterday's meeting with @semcogli and @janowicz we discuss whether #211 would have a practical impact on our models. I thought I would run an experiment. On our computer, I updated our copy of UrbanSim to include #211, and refit all the elcm models.
Indeed many of our elcm models are reliably throwing a non-convergence warning 'ABNORMAL_TERMINATION_IN_LNSRCH'. According to the internet, that diagnostic code is usually caused by programer error. Specifically, a bad gradient calculation.
To dig in I tried to make a minimal reproducible example, I took one of our models and reduced the sample size and removed columns until the
ABNORMAL_TERMINATION
stopped. That became the newtest_mnl_estimate_temp
. It needs a better name, or even better to be in the same form as the parametric tests.With a MRE in hand I wanted to look deeper. Is the gradient calculation correct? So I added a wrapper function
mnl_loglik_print
that print the difference between the gradient returned bymnl_loglik
and the gradient as approximated byscipy.optimize.approx_fprime
. Indeed the gradient does not look correct! In fact it doesn't look good on most of the existing test.