Implement Truncated RVs #131
Codecov Report
Base: 96.12% // Head: 96.23%

Coverage diff (main vs. #131):

            main      #131      +/-
Coverage    96.12%    96.23%    +0.11%
Files       12        13        +1
Lines       1987      2100      +113
Branches    241       253       +12
Hits        1910      2021      +111
Misses      39        39
Partials    38        40        +2
Okay, this seems to be working reasonably well. Ready for review.
This looks great; it only needs to be refactored to use RandomStreams and return updates.
Added RandomStream as a separate commit for review.
AFAIR, truncation means that you cannot observe any data, and that you know this before observing. Censoring is when we don't know the value of every data point, only that it is greater (or smaller) than a given value.
import aesara.tensor as at
srng = at.random.RandomStream(0)
x_rv = srng.normal(0, 1)
y = at.clip(x_rv, -1, 1)
Truncated means you cannot observe a value in a certain interval, and you wouldn't know if it occurred. The generative graph involves rejection sampling (if you want to achieve a fixed data size).

Censoring means you cannot observe a value in a certain interval, but you know when that occurs. The generative process involves clipping or rounding (or any other encoding that loses resolution but not the number of datapoints).

There is a good PyMC example that explains the difference: https://www.pymc.io/projects/examples/en/latest/generalized_linear_models/GLM-truncated-censored-regression.html

The Stan manual also covers this nicely.
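To make the two generative processes described above concrete, here is a minimal sketch using only the Python standard library rather than Aesara. The function names `sample_truncated` and `sample_censored` are made up for this illustration, and the base distribution is assumed to be a standard normal.

```python
import random


def sample_truncated(rng, lower, upper, n):
    """Rejection sampling: redraw until n values fall inside [lower, upper]."""
    out = []
    while len(out) < n:
        x = rng.gauss(0.0, 1.0)
        if lower <= x <= upper:
            out.append(x)  # out-of-range draws vanish without a trace
    return out


def sample_censored(rng, lower, upper, n):
    """Clipping: every draw is kept, but out-of-range values are recorded at the bound."""
    return [min(max(rng.gauss(0.0, 1.0), lower), upper) for _ in range(n)]


rng = random.Random(0)
truncated = sample_truncated(rng, -1.0, 1.0, 10_000)
censored = sample_censored(rng, -1.0, 1.0, 10_000)

# Count how many recorded values sit exactly on a bound.
at_bounds_truncated = sum(x in (-1.0, 1.0) for x in truncated)
at_bounds_censored = sum(x in (-1.0, 1.0) for x in censored)
print(at_bounds_truncated, at_bounds_censored)
```

Both samples have the requested size and stay inside the interval, but only the censored one accumulates a visible share of values exactly at the bounds, which is the "you know when that occurs" part of the definition.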
Take prior predictive samples from the code example I shared: where do the values lie, and what does this model generate? Truncation is about the generative model for your data; censoring is about how your data is observed.
Your example code generates censored data.
To generate censored data you generate from the full distribution ( If that's not it, how do you call the process I just described? |
Are you guys essentially talking about how the endpoints of the |
Yes. In this PR are we talking about truncated random variables or random variables with a truncated distribution? There isn't a 1:1 map between these two concepts. We also call truncated random variable a variable Regarding censoring, there's an implicit assumption about the meaning of the threshold values when using |
Yes, there are some fundamental representation issues involved with using existing Aesara |
The logprob is correct (hopefully) for a data generative process that uses

Back to clip: some other packages would use two vectors of data, one with the censored values and another with boolean/categorical values that indicate whether each value was censored or not. You could try to handle this in aeppl as well. The two encodings have slightly different meanings at the edges. In the clip case you assume edge values could only have come from censoring for continuous RVs (ignoring precision issues, it's a pdf vs. a pmf), and from a mixture for discrete RVs (IIRC?). With the double-vector approach you wouldn't have to make that assumption for continuous RVs, or count edge values as a mixture for discrete RVs, because the source of each value is disambiguated by the user. The double vector retains more information about the data generating process.

Anyway, this PR was about truncation, not censoring.
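The difference between the two encodings at the edges can be sketched for a standard normal as follows. This is a plain-Python illustration, not AePPL's implementation; `clip_logp`, `flagged_logp`, and the `status` labels are hypothetical names chosen for this example.

```python
import math


def norm_logpdf(x):
    """Log-density of a standard normal."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(x):
    """CDF of a standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clip_logp(value, lower, upper):
    """Clip encoding: a value equal to a bound is assumed to have been censored."""
    if value == lower:
        return math.log(norm_cdf(lower))        # mass of everything below
    if value == upper:
        return math.log(1.0 - norm_cdf(upper))  # mass of everything above
    if lower < value < upper:
        return norm_logpdf(value)
    return -math.inf  # values outside the bounds cannot be produced by clip


def flagged_logp(value, status):
    """Two-vector encoding: the user states whether each value was censored.

    status is one of "left", "right", or "observed" (labels assumed here).
    """
    if status == "left":
        return math.log(norm_cdf(value))
    if status == "right":
        return math.log(1.0 - norm_cdf(value))
    return norm_logpdf(value)


# A value of exactly 1.0 that was genuinely observed, not censored:
edge_as_clip = clip_logp(1.0, -1.0, 1.0)        # treated as right-censored
edge_as_flagged = flagged_logp(1.0, "observed")  # treated as a density value
print(edge_as_clip, edge_as_flagged)
```

Away from the bounds the two encodings agree; they only diverge for values that land exactly on a bound, where the flag carries information that the clipped value alone cannot.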
You're right, will open a separate discussion for my remaining questions about censoring. This is still relevant and important here:
I think that still refers to censoring in the literature, not truncation. It's just one-sided censoring instead of two-sided censoring. You can obtain those graphs with

In general, if you can lose information about the values of your data but not about the size of your data, you are always talking about censoring.
Does this mean this PR is about RVs with a truncated probability distribution, then?
I guess so. I am not sure what the difference is between RVs and distributions that you were mentioning above. It's definitely about truncation, not censoring.
Ok, that's all I needed to know, and it's fine as long as it is clearly stated. I believe there is a difference between an RV with a truncated distribution (whose sample space is the truncated interval) and a truncated RV, which is the product of another RV (whose sample space can be the real line) with an indicator function. At the very least, if there is an equivalence, it is not obvious. When the terminology in the literature is this confused, here between measure-theory people and people working only with densities, it is important to be very specific about the relation between our representation and the different mathematical representations. This is not about splitting hairs for the sake of it, but about not ending up with the confusion and ambiguities that PPLs are usually riddled with.
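One way to see that the two objects are not obviously the same is to simulate both. The sketch below is only an illustration under stated assumptions: a standard normal base variable, an interval chosen so that it does not contain 0, and plain Python in place of Aesara.

```python
import random

rng = random.Random(1)
lower, upper, n = 0.5, 2.0, 10_000

draws = [rng.gauss(0.0, 1.0) for _ in range(n)]

# (a) Product with an indicator function: X * 1[lower <= X <= upper].
#     Still defined for every draw; out-of-interval draws map to 0.
product = [x if lower <= x <= upper else 0.0 for x in draws]

# (b) Truncated distribution: the law of X conditioned on the interval.
#     Filtering a fixed batch is used here for simplicity, so the sample
#     size shrinks instead of being held fixed by redrawing.
conditioned = [x for x in draws if lower <= x <= upper]

zeros_in_product = sum(x == 0.0 for x in product)
print(len(product), len(conditioned), zeros_in_product)
```

The product keeps all `n` values and carries a point mass at 0, which lies outside the interval; the conditioned sample has no such atom and lives entirely inside the interval.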
As I expressed above, I have strong reservations when it comes to this PR and the meaning of the operator that is proposed. The problem can be seen clearly when comparing:

import aesara.tensor as at
srng = at.random.RandomStream()
x = at.random.uniform(0, 10, name="x", size=100)
xt, _ = truncate(x, lower=5, upper=15, srng=srng)

to:

import aesara.tensor as at
srng = at.random.RandomStream(0)
x = srng.uniform(0, 10)
xt = at.clip(x)

In the latter case, if we assume that

I think that the abstractions that are present today in Aesara do not allow us to work on measure transformations, since measures are not first-class citizens but are present only implicitly. Here is a modest proposal to alleviate these limitations. In this proposal we represent e.g. the measure with a standard normal density wrt the Lebesgue measure as

x_m = normal(0, 1)

the

x_t = truncate(normal(0, 1), lower=0)

to bridge the gap with the existing

x_rv = sample(rng, normal(0, 1))

here we retain the existing ambiguity there is in Aesara between random variables and elements of

srng = at.random.RandomStream(0)
x_rv = srng.sample(normal(0, 1))

So the first code snippet above could be rewritten:

import aesara.tensor as at
import aesara.tensor.random as ar
srng = ar.RandomStream(0)
x_m = truncate(ar.uniform(0, 10), lower=5, upper=15)
xt = srng.sample(x_m)

This interface has a few nice incidental properties:

I can write an extended API proposal if you think this makes sense, and try to work out the potential challenges linked to this representation of probabilistic objects in Aesara and AePPL.
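As a rough illustration of the shape of this proposal, here is a toy sketch in plain Python. None of it is Aesara or AePPL code: `Uniform`, `Truncated`, `Stream`, and this `truncate` are hypothetical stand-ins, and rejection sampling is only one assumed way of drawing from a truncated measure.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Uniform:
    """Hypothetical measure object: describes a distribution, draws nothing by itself."""
    low: float
    high: float

    def draw(self, rng):
        return rng.uniform(self.low, self.high)


@dataclass(frozen=True)
class Truncated:
    """Hypothetical measure transformation: restricts a base measure to [lower, upper]."""
    base: object
    lower: float = float("-inf")
    upper: float = float("inf")

    def draw(self, rng, max_tries=10_000):
        # Rejection sampling against the base measure (an assumption of this sketch).
        for _ in range(max_tries):
            x = self.base.draw(rng)
            if self.lower <= x <= self.upper:
                return x
        raise RuntimeError("no draw fell inside the truncation interval")


def truncate(measure, lower=float("-inf"), upper=float("inf")):
    """Operates on measures and returns a measure, never a sample."""
    return Truncated(measure, lower, upper)


class Stream:
    """Hypothetical stand-in for a random stream with a `sample` method."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def sample(self, measure):
        return measure.draw(self._rng)


srng = Stream(0)
x_m = truncate(Uniform(0, 10), lower=5, upper=15)  # still a measure
xt = srng.sample(x_m)                              # only now is a value drawn
print(x_m, xt)
```

The point of the sketch is the separation of concerns: `truncate` transforms a description of a distribution, and randomness only enters when the stream samples from it.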
Closes #96