Walkthrough entropy #70
Codecov Report
@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
+ Coverage   79.32%   81.13%   +1.81%
==========================================
  Files          21       25       +4
  Lines         624      721      +97
==========================================
+ Hits          495      585      +90
- Misses        129      136       +7
I'll need to review this in detail before we merge, but a detailed review may take a month or two...
No pressure! I'm just tagging you for review here so you're aware of the PR. I will probably be able to figure out the missing normalization step in the meantime, so that the excess entropy estimator is also included by that time.
After the package redesign, it is clear that the walkthrough entropy isn't really an entropy, but it is an information measure of some sort according to our definition, since it is a function of probabilities. It is possible to create an [...] This PR will be modified accordingly.
What is this PR?
This PR implements the walkthrough entropy (Stoop et al., 2021) for a symbol sequence `x`.
Walkthrough entropy is the first step in implementing excess entropy (#69), which is just a normalized and averaged version of walkthrough entropy, but it is a useful method in itself, hence this PR.
Excess entropy will be pursued in another PR. The reason is that I'm having some issues understanding the implementation of the normalization step (commented in the `_walkthrough_entropy` function docstring). I will investigate this further and submit a PR when ready (if you @Datseris or someone else has any input, feel free to comment).
Interface
`walkthrough_entropy(x, n)` computes the walkthrough entropy for `x` at position(s) `n`, where `1 <= n <= length(x)`.
Internal changes
- `EntropyGenerator` struct with a corresponding `entropygenerator(x, method, [, rng])` (like we do in TimeseriesSurrogates). Why? Walkthrough entropy is a function of the position `n`, but without a generator we'd need to repeat the initial calculations (histogram estimation) for every position, so the redundant work grows linearly with `length(x)`.
- [...] `entropygenerator(x, method, [, rng])` too, but I haven't done so yet, before getting some feedback on this approach.
- `vec_countmap(x)`, which returns both the unique elements of `x` and their frequencies. The element type of the frequencies can be customized (defaults to `BigInt`, because that is needed for binomial calculations with large `n`/`N` for the walkthrough entropy to avoid overflow).
(Currently) Unused files
- `walkthrough_prob.jl` is currently unused, but is implemented for completeness for reproducing Stoop et al. (2021). It is conceivable that these methods become useful in some future algorithm. Note: the factorials quickly blow up. Should only be used for experimentation.
Potential future improvements
- `BigInt` calculations. These are quite slow and allocate a lot at the moment.
Testing
The original paper doesn't provide any concrete examples to test on, so tests are generic. However, in the documentation example, I successfully reproduce the basic examples in Figure 1 from Stoop et al. (2021).
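To give reviewers a feel for what a generic check could look like, here is a minimal sketch of the counting step and the overflow issue that motivates the `BigInt` default. `vec_countmap_sketch` is a hypothetical stand-in written only for this comment; it is not the `vec_countmap` implementation in this PR, which may order elements or handle types differently.

```julia
# Hypothetical stand-in for `vec_countmap` (assumption: not the PR's actual code).
# Returns the unique elements of `x` (in order of first appearance) and their
# frequencies, with a customizable integer type for the frequencies.
function vec_countmap_sketch(x::AbstractVector, T::Type{<:Integer} = BigInt)
    counts = Dict{eltype(x), T}()
    elements = eltype(x)[]
    for xi in x
        if !haskey(counts, xi)
            push!(elements, xi)
            counts[xi] = zero(T)
        end
        counts[xi] += one(T)
    end
    return elements, T[counts[e] for e in elements]
end

x = [1, 2, 1, 3, 1]
elements, freqs = vec_countmap_sketch(x)

# Generic properties that hold for any input sequence.
@assert sum(freqs) == length(x)
@assert eltype(freqs) == BigInt
@assert length(elements) == length(unique(x))

# Why BigInt: binomial coefficients overflow Int64 already for moderate arguments.
int64_overflowed = try
    binomial(100, 50)          # Int64 arguments
    false
catch err
    err isa OverflowError
end
@assert int64_overflowed

# With a BigInt first argument the same coefficient is computed exactly.
big_coefficient = binomial(big(100), 50)
@assert big_coefficient > typemax(Int64)
```

The frequency type is passed as a positional argument here purely for brevity; a keyword argument would work just as well.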
References