Walkthrough entropy #70

Open · wants to merge 13 commits into main

Conversation

@kahaaga (Member) commented Aug 25, 2022

What is this PR?

This PR implements the walkthrough entropy (Stoop et al., 2021) for a symbol sequence x.

Walkthrough entropy is the first step towards implementing excess entropy (#69), which is just a normalized and averaged version of the walkthrough entropy. The walkthrough entropy is also a useful method in itself, hence this PR.

Excess entropy will be pursued in a separate PR, because I'm having some trouble understanding the implementation of the normalization step (commented on in the _walkthrough_entropy function docstring). I will investigate this further and submit a PR when ready (if you, @Datseris, or anyone else has any input, feel free to comment).

Interface

  • walkthrough_entropy(x, n) computes the walkthrough entropy for x at position(s) n, where 1 <= n <= length(x).

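A minimal usage sketch of this interface (the binary input is illustrative, mirroring the examples in Stoop et al. (2021); this is not code taken from the PR):

```julia
# Hypothetical usage of walkthrough_entropy as described above.
x = rand(0:1, 100)  # a symbol sequence; binary, as in Stoop et al. (2021)

h  = walkthrough_entropy(x, 50)                        # at a single position n = 50
hs = [walkthrough_entropy(x, n) for n in 1:length(x)]  # at every admissible position
```
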
Internal changes

  • Implemented a generic EntropyGenerator struct with a corresponding entropygenerator(x, method [, rng]), like we do in TimeseriesSurrogates (see the sketch after this list). Why? The walkthrough entropy is a function of the position n, but without a generator we would have to repeat the initial calculations (histogram estimation), whose cost grows linearly with length(x), for every position.
  • For completeness, other methods should implement entropygenerator(x, method [, rng]) too, but I haven't done so yet; I'd like some feedback on this approach first.
  • Added another histogram method, vec_countmap(x), which returns both the unique elements of x and their frequencies. The element type of the frequencies can be customized (it defaults to BigInt, which is needed to avoid overflow in the binomial calculations with large n/N in the walkthrough entropy).

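For concreteness, below is a rough, hypothetical sketch of the generator pattern and the vec_countmap helper described above. The field layout, the WalkthroughEntropy placeholder type, and the exact signatures are my assumptions for illustration; the PR's actual code may differ.

```julia
using Random

# vec_countmap-style histogram: the unique elements of x and their frequencies,
# with a customizable count type (BigInt by default, so that later binomial
# calculations with large n/N do not overflow).
function vec_countmap(x::AbstractVector, ::Type{T} = BigInt) where {T <: Integer}
    freqs = Dict{eltype(x), T}()
    for xi in x
        freqs[xi] = get(freqs, xi, zero(T)) + one(T)
    end
    uniques = collect(keys(freqs))
    return uniques, T[freqs[u] for u in uniques]
end

struct WalkthroughEntropy end  # placeholder method type (assumed name)

struct EntropyGenerator{M, X, I, R}
    method::M  # e.g. WalkthroughEntropy()
    x::X       # the input symbol sequence
    init::I    # precomputed quantities, e.g. the histogram
    rng::R     # optional RNG, mirroring the generator API in TimeseriesSurrogates
end

# Pay the O(length(x)) histogram cost once; reuse it for every position n.
function entropygenerator(x, method, rng = Random.default_rng())
    uniques, counts = vec_countmap(x)
    return EntropyGenerator(method, x, (uniques = uniques, counts = counts), rng)
end

eg = entropygenerator(rand(0:1, 100), WalkthroughEntropy())
# The generator can then be evaluated at many positions n (e.g. via a call like
# eg(n) that dispatches on eg.method) without re-estimating the histogram each time.
```
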
(Currently) Unused files

  • walkthrough_prob.jl is currently unused, but is included for completeness when reproducing Stoop et al. (2021), and it is conceivable that these methods become useful in some future algorithm. Note: the factorials involved quickly blow up (see the example below), so these functions should only be used for experimentation.

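To illustrate the overflow issue (this is plain Julia behaviour, not code from the PR):

```julia
# Fixed-width integers overflow almost immediately for these quantities;
# Julia throws an OverflowError rather than silently wrapping around.
# factorial(21)              # throws OverflowError: 21! > typemax(Int64)
# binomial(100, 50)          # throws OverflowError
factorial(big(21))           # 51090942171709440000
binomial(big(100), big(50))  # 100891344545564193334812497256
```
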
Potential future improvements

  • Use specialized methods for the BigInt calculations, which are quite slow and allocate a lot at the moment.

Testing

The original paper doesn't provide any concrete examples to test against, so the tests are generic. However, in the documentation example, I successfully reproduce the basic examples in Figure 1 from Stoop et al. (2021).

References

  • Stoop, R. L., Stoop, N., Kanders, K., & Stoop, R. (2021). Excess entropies suggest the physiology of neurons to be primed for higher-level computation. Physical Review Letters, 127(14), 148101.

@codecov bot commented Aug 25, 2022

Codecov Report

Merging #70 (dd10654) into main (3c95c69) will increase coverage by 1.81%.
The diff coverage is 91.83%.

@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
+ Coverage   79.32%   81.13%   +1.81%     
==========================================
  Files          21       25       +4     
  Lines         624      721      +97     
==========================================
+ Hits          495      585      +90     
- Misses        129      136       +7     
Impacted Files Coverage Δ
src/walkthrough/walkthrough_entropy.jl 87.75% <87.75%> (ø)
src/histogram_estimation.jl 95.91% <92.00%> (-0.09%) ⬇️
src/api.jl 100.00% <100.00%> (ø)
src/walkthrough/walkthrough.jl 100.00% <100.00%> (ø)
src/walkthrough/walkthrough_prob.jl 100.00% <100.00%> (ø)


kahaaga requested a review from Datseris on August 25, 2022 08:19
@Datseris (Member) commented:

I'll need to review this in detail before we merge, but a detailed review may take a month or two...

@kahaaga (Member, Author) commented Aug 25, 2022

I'll need to review this in detail before we merge, but a detailed review may take a month or two...

No pressure! I'm just tagging you for review here so you're aware of the PR.

I will probably be able to figure out the missing normalization step in the meantime, so that the excess entropy estimator can also be included by then.

@kahaaga (Member, Author) commented Dec 29, 2023

After the package redesign, it is clear that the walkthrough entropy isn't really an entropy, but it is an information measure of some sort according to our definition, since it is a function of probabilities. It is possible to create an OutcomeSpace that follows the reasoning in the paper and to define counts over the outcomes, and such an outcome space is worth having. However, it isn't clear to me how these counts would relate to the probabilities that the paper uses to define its "walkthrough entropy". I have to dig a bit deeper.

This PR will be modified accordingly.
