Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add convergence methods for draws objects #278

Open
n-kall opened this issue Mar 24, 2023 · 19 comments
Open

Add convergence methods for draws objects #278

n-kall opened this issue Mar 24, 2023 · 19 comments
Labels
feature New feature or request

Comments

@n-kall
Copy link
Collaborator

n-kall commented Mar 24, 2023

Currently the diagnostic functions rhat, ess_tail and ess_bulk are intended to be used on a single variable matrix or rvar, e.g. as part of summarise_draws. However, they can be applied (incorrectly) to a draws object (all work on draws_array, some work on draws_df too), and then return a single numeric value which is not meaningful.

In contrast rstar takes a multivariate draws object as input and returns an interpretable value. As rhat and rstar are somewhat similar in interpretation, the different expected inputs could be confusing.

Although it is specified in the documentation that these functions expect certain input, maybe it is worthwhile doing some type checking for these functions and returning a warning / error when used incorrectly (especially when used on a draws object) and pointing to summarise_draws.

Example:

ex <- example_draws()
rhat(ex)
# 1.00185 # Not interpretable

rstar(ex)
# 2.066 # Interpretable
@paul-buerkner
Copy link
Collaborator

paul-buerkner commented Mar 25, 2023

Perhaps, we can actually make method for draws objects that automatically compute these arguments for each variable. The default would continue to work as now but would no longer be used for draws objects.

@paul-buerkner paul-buerkner changed the title Suggestion: add a warning for univariate diagnostic functions when used on multivariate draws Add convergence methods for draws objects Mar 25, 2023
@paul-buerkner paul-buerkner added the feature New feature or request label Mar 25, 2023
@n-kall
Copy link
Collaborator Author

n-kall commented Mar 25, 2023

Making a sensible method for draws would indeed be a better option.

@n-kall
Copy link
Collaborator Author

n-kall commented Sep 21, 2023

While working to add these methods, I thought a sensible option would be for new convergence methods such as rhat.draws(x) to simply call summarise_draws(x). I think this would work fine, except that summarise_draws passes draws_array objects to the summary functions, because of the line draws <- drop_dims_or_classes(x[, , v], dims = 3, reset_class = FALSE) in create_summary_list. From my understanding, reset_class = TRUE could be used to fix this (and pass matrix objects to the summary functions), except that variance.draws_array is expecting a draws object (otherwise it returns the variance of each chain separately, I believe).

I'm not sure what the best approach would be to get around this, but it seems that variance is the only summary function for which a draws_array method is defined, and which is then used in summarise_draws, but I'm not sure defining a new variance.matrix method (which already exists in distributional) would be sensible.

@paul-buerkner
Copy link
Collaborator

You could try to use summarize_draws but have to check how much overhead this function creates if your sole purpose is to compute a single diagnostic (say, rhat) per variable.

@n-kall
Copy link
Collaborator Author

n-kall commented Sep 21, 2023

You could try to use summarize_draws but have to check how much overhead this function creates if your sole purpose is to compute a single diagnostic (say, rhat) per variable.

Good point. However, even if new *.draws methods do not use summarise_draws internally, there is still the issue that currently summarise_draws passes single-variable draws_array objects to the generics (and then *.default methods are used except for variance, which has a .draws_array method). Introducing new *.draws methods would interfere with this, as they would then be used by summarise_draws rather than the default methods as currently implemented.

@paul-buerkner
Copy link
Collaborator

I see your point. @mjskay do you have an idea by any chance how to deal with this case?

@mjskay
Copy link
Collaborator

mjskay commented Sep 22, 2023

Hmmm... What if summarise_draws passed a single-variable rvar down to convergence functions instead of a draws_array? draws_of(x, with_chains = TRUE) could be used in convergence functions to basically get the same format as a single variable draws_array (and that function has little overhead). Would that solve the dispatch problems?

@paul-buerkner
Copy link
Collaborator

Perhaps. It would have to be tried out, I think.

@n-kall
Copy link
Collaborator Author

n-kall commented Sep 22, 2023

I'll try it out and make a PR if it seems to work

@paul-buerkner
Copy link
Collaborator

Thank you for trying it out. If it works, before you spend time on a PR, we perhaps need to think about if this has other consequences, we currently do not see.

@n-kall
Copy link
Collaborator Author

n-kall commented Sep 22, 2023

The suggestion of passing rvars to the summary functions seems to work, especially as there are already .rvar methods for the summary functions implemented. However, I'm not sure that this is the best approach from a user perspective, as custom functions can be passed to summarise_draws, and currently they would expect a draws_array, so changing this could potentially break some downstream things / workflows.

@mjskay
Copy link
Collaborator

mjskay commented Sep 22, 2023

Ah yeah. I think this kind of relates to other questions about modifying summarise_draws to support things like weighted draws. See this comment: #273 (comment)

That is, if we're going to change the contract that summarise_draws has with summary functions, we should probably do it in a way that also allows us to solve the issues mentioned in the comment above. Otherwise we'll just have to do it again later.

Alternatively, summarise_draws could be the "simple" (possibly even "legacy") summary function, with few expectations on the implementation of summary functions (basically: accepts an array?) and we could create a new summary function that solves the above issues without breaking existing code.

@mjskay
Copy link
Collaborator

mjskay commented Sep 22, 2023

Another solution might be to check for an rvar implementation on the summary function generic and pass that format if present, and otherwise use an array. That way rvar support is opt-in.

Actually, this approach (essentially the summary function "declaring" its input format via a generic function implementation) could be used in the future to solve the other issues without making a new summarise_draws function. e.g. if rvar gets support for carrying weights around (which it really should), that would give us a way for summary functions to declare that they accept weights.

@n-kall
Copy link
Collaborator Author

n-kall commented Sep 25, 2023

Another solution might be to check for an rvar implementation on the summary function generic and pass that format if present, and otherwise use an array. That way rvar support is opt-in.

I like this suggestion. When I get some more time for this I'll try it out

@mjskay
Copy link
Collaborator

mjskay commented Sep 25, 2023

Thanks! if it helps, we had to do something similar in as_draws, so there's already a utility function for checking for S3 method definitions:

if (!has_s3_method(fun_name, class(x))) {

@n-kall
Copy link
Collaborator Author

n-kall commented Sep 27, 2023

I've played around with detecting and using rvar methods here. It seems to work ok (need to fix the issue of functions that return a named vector like quantile2 losing names). It is limited to detecting methods from functions and strings (not formula syntax) but that's probably ok. For example in summarise_draws(x, mean, "mean", ~mean(.x)) the first two mean functions will use the mean.rvar method and the third will pass an array in place of .x (matching the present behaviour).

@mjskay
Copy link
Collaborator

mjskay commented Sep 27, 2023

Yeah, good point... That could lead to counterintuitive behavior for users (and hard to track down bugs) if they start with a summary function then add arguments to it by changing to a formula or anonymous function. Not sure how I feel about that...

@n-kall
Copy link
Collaborator Author

n-kall commented Sep 28, 2023

That could lead to counterintuitive behavior for users (and hard to track down bugs) if they start with a summary function then add arguments to it by changing to a formula or anonymous function

Yes, this inconsistency could be an issue, e.g. if weights are suddenly not taken into account because of a change in syntax. Perhaps this change would really need to make rvar methods the default (if they exist) and pass array as fallback, maybe with a warning message that weights (or other attributes) are not necessarily considered. I suppose such a change would make sense only after rvars have weighting functionality, to justify a change in defaults for added functionality. But this discussion is now probably more related to issue #184.

I currently don't see an easy fix for adding .draws methods for convergence functions without messing things up, so I'll probably leave it for now and revisit when there are related developments that might make it simpler

@mjskay
Copy link
Collaborator

mjskay commented Sep 29, 2023

Yeah good point. This lends some priority to #184 then. I can take a stab at that when I next have time...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants