[Feature]: Add generalized gamma index likelihood #497
Comments
Good suggestion, Cole. I've long been interested in seeing more use of fat-tailed distributions and introduced the T-distribution for survey logL. It hasn't seen much use, but with your interest in gamma we might be able to spur interest in both. We are planning a new release shortly, and the next would not be until early 2024. Let's move your branch into our org so evaluation will be easier.
I have merged @Cole-Monnahan-NOAA's branch onto a new branch in the stock-synthesis repo.
OK, I'll continue to develop on my fork and do a PR when I think I'm ready for a more thorough review. Any tips on testing locally before doing that? Thanks!
You should be able to use any test model that has an index time series. You can grab one from our test_model repo or use your own. We have a GitHub Action (gha) ready to run all our test models when we do a PR. The PR will not test your feature, but will let us know if you broke something :). What we will want is a small demo that the new feature works and some text for the User Manual; it would be great if you can also show what a useful range of model parameters would be. Rick
@Cole-Monnahan-NOAA, another option for testing "locally", so to speak, is to use GitHub's codespaces, which gives you a Linux machine, and run some of the tests that we use with GitHub Actions (see GitHub Actions workflows here) on that codespace. Aside from the GitHub Actions steps that are pre-built and available (such as actions/checkout@v3, r-lib/actions/setup-r@v2, etc.), the rest should be runnable in the Linux terminal or from R. I have a codespace template that already has R loaded on it here if you would like to copy the devcontainer.json file to your own codespace.
@Rick-Methot-NOAA This reminded me to push my latest changes, which is good. I got it working for biomass and numerical indices on real examples at the AFSC. One tricky bit is that the gengamma seems to not like giant values (it is numerically unstable), so I scaled the expected biomass by 1e-6 (here), and likewise the user has to input the mean in millions of metric tons instead of tons. We will need to think about how to deal with this on the user end. I'm hoping to return to this project in about a month, and once the paper is in internal review I will devote some time to SS3 development.
@Cole-Monnahan-NOAA I have synced the changes that you pushed to your forked repo branch to this branch in the stock-synthesis repo, which will run the build GitHub Action.
Describe the solution you would like.
In many cases I believe it is more statistically justifiable to use a more flexible index likelihood than the lognormal. One appealing alternative is the generalized gamma distribution (GGD). It requires reading in a third parameter ("Q") and has a complicated PDF.
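For readers unfamiliar with the GGD, here is a minimal sketch of its log-density. It assumes the Prentice (1974) parameterization with a log-scale location `mu`, a scale `sigma`, and the shape `Q`, which is the form used by the R package flexsurv; the issue does not say which parameterization the prototype uses, so this is an illustration rather than the branch's actual code. The function name `gengamma_logpdf` is made up for this example.

```python
import math


def gengamma_logpdf(x, mu, sigma, q):
    """Log-density of the generalized gamma (assumed Prentice parameterization).

    x     : positive observation (e.g. an index value)
    mu    : location on the log scale
    sigma : scale on the log scale (> 0)
    q     : shape; q == 0 is treated as the lognormal limit
    """
    if x <= 0.0 or sigma <= 0.0:
        raise ValueError("x and sigma must be positive")
    w = (math.log(x) - mu) / sigma  # standardized log-residual
    if q == 0.0:
        # Limiting case: lognormal log-density
        return -math.log(sigma * x) - 0.5 * math.log(2.0 * math.pi) - 0.5 * w * w
    qi = 1.0 / (q * q)
    qw = q * w
    return (
        -math.log(sigma * x)
        + math.log(abs(q))
        + qi * math.log(qi)
        + qi * (qw - math.exp(qw))
        - math.lgamma(qi)
    )


# Example call with arbitrary illustrative values
print(gengamma_logpdf(1.5, mu=0.0, sigma=0.5, q=0.3))
```

In this parameterization `Q = 0` recovers the lognormal, `Q = 1` a Weibull, and `Q = sigma` a gamma, so the lognormal index likelihood is a special case of the proposed one.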
I forked the repo and have added a prototype on this branch and it appears to work well on the few models I've tested. It also does not appear to break backwards compatibility because the new data are only read in with the new likelihood option.
The point of this issue is to gauge the SS3 team's interest in incorporating this feature into a future release of SS3. If so, how should I proceed with development (git workflow, timing, testing, etc.)?
I think the only other main component to add is a simulator for the bootstrapper, and perhaps some reporting? I'll need help with the latter. I also need to further comment the code and update the documentation.
Describe alternatives you have considered
None so far
Statistical validity, if applicable
In many (most?) cases there is no statistical justification for assuming that an index resulting from a design-based or model-based estimator is lognormal. This is because sums of positive r.v.s are not necessarily lognormal, even if each r.v. is lognormal. Thus a more flexible distribution, one that can more accurately convey the information in the survey biomass indices, is needed.
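The point about sums can be checked with a quick simulation. The sketch below is not from the prototype: it draws sums of independent lognormals and measures the skewness of the log of the sum, which would be near zero if the sum were itself lognormal. The function name, the seed, and the deliberately large `sigma = 5` are assumptions chosen to make the effect easy to see; the departure shrinks as the components become less variable.

```python
import math
import random


def log_sum_skewness(sigma, n_terms=2, n_draws=100_000, seed=0):
    """Sample skewness of log(X_1 + ... + X_n) for iid lognormal X_i.

    Each X_i = exp(N(0, sigma^2)). If the sum were lognormal, its log
    would be normal and the skewness would be close to zero.
    """
    rng = random.Random(seed)
    logs = []
    for _ in range(n_draws):
        total = sum(math.exp(rng.gauss(0.0, sigma)) for _ in range(n_terms))
        logs.append(math.log(total))
    mean = sum(logs) / n_draws
    m2 = sum((v - mean) ** 2 for v in logs) / n_draws  # second central moment
    m3 = sum((v - mean) ** 3 for v in logs) / n_draws  # third central moment
    return m3 / m2 ** 1.5


# A single lognormal (n_terms=1) serves as the baseline; a sum of two does not.
print("one term :", log_sum_skewness(5.0, n_terms=1))
print("two terms:", log_sum_skewness(5.0, n_terms=2))
```

With one term the log is exactly normal, so only sampling noise remains; with two terms the log of the sum is right-skewed, which a lognormal likelihood cannot represent.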
Describe if this is needed for a management application
No response
Additional context
No response