Continuously-Valued Timeseries #21

Open
dglmoore opened this issue Jul 29, 2016 · 0 comments
This is a discussion post. Please feel free to comment and contribute to the discussion even if you are not directly involved in the development of inform or its wrapper libraries.

The Problem

The various information measures are designed around discrete-valued timeseries data. In practice, most data are continuous in nature, and up to this point our go-to approach has been to bin them.

We've already implemented several binning procedures (see 1355d68). Binning works well for some problems (e.g. when the system has a natural threshold), but when it is applied artificially it can introduce substantial bias. The problem gets worse when you try to compare two different timeseries: should they be binned in the same way, e.g. with uniform bin sizes, a fixed number of bins, etc.?
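To make the bias concrete, here is a minimal sketch (not inform's actual API; all names are illustrative) showing that the same continuous series yields very different entropy estimates depending on how many uniform bins you choose:

```python
# Sketch: the entropy estimate of one continuous series depends
# heavily on the (arbitrary) choice of bin count.
import math
import random

random.seed(0)
series = [random.gauss(0.0, 1.0) for _ in range(1000)]

def bin_series(xs, n_bins):
    """Uniformly bin xs into n_bins integer states."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in xs]

def entropy(states):
    """Shannon entropy (bits) of the empirical distribution of states."""
    counts = {}
    for s in states:
        counts[s] = counts.get(s, 0) + 1
    n = len(states)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

for n_bins in (2, 4, 16):
    print(n_bins, round(entropy(bin_series(series, n_bins)), 3))
```

The estimate grows with the bin count, so two series binned under different schemes are not directly comparable.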

Possible Solutions

All of the information measures are built around probability distributions: the timeseries measures simply construct an empirical probability distribution and call an information measure on it. "All" that must be done to accommodate continuously-valued data is to infer the distribution from the observations.
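The discrete pipeline described above can be sketched in a few lines (the function names here are illustrative, not inform's API): build an empirical distribution from a discrete series, then apply an information measure to the distribution.

```python
# Sketch of the two-step discrete pipeline: series -> empirical
# distribution -> information measure.
from collections import Counter
from math import log2

def empirical_dist(series):
    """Map a discrete series to an empirical probability distribution."""
    counts = Counter(series)
    n = len(series)
    return {state: c / n for state, c in counts.items()}

def shannon_entropy(dist):
    """Entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

series = [0, 0, 1, 1, 1, 0, 1, 0]
dist = empirical_dist(series)  # {0: 0.5, 1: 0.5}
print(shannon_entropy(dist))   # 1.0 for a fair 0/1 split
```

Supporting continuous data means replacing only the first step, the distribution estimate, while the measures themselves stay untouched.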

Machine learning is more or less built around inferring probability distributions and then making some sort of decision from them. Consequently, there are easily dozens of algorithms for inferring distributions from continuously-valued observations. One simple example of such an algorithm, kernel density estimation, has been around since the 1950s.
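For reference, a minimal Gaussian kernel density estimate looks like the following (a hand-rolled sketch, not any library's implementation; the bandwidth is fixed by hand here, whereas practical estimators select it from the data):

```python
# Minimal Gaussian kernel density estimation: the density at x is the
# average of Gaussian bumps centered on each sample.
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )
    return density

samples = [-1.2, -0.8, 0.1, 0.9, 1.1]
f = gaussian_kde(samples, bandwidth=0.5)

# Sanity check: the estimate integrates to ~1 (Riemann sum over [-10, 10]).
xs = [i * 0.01 for i in range(-1000, 1001)]
area = sum(f(x) * 0.01 for x in xs)
print(round(area, 2))  # ≈ 1.0
```

An estimate like this could feed directly into the measures in place of the binned empirical distribution.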

Usefulness

This would likely be useful to @dglmoore and @colemathis, as the systems we deal with are either continuously valued or have so many discrete states that treating them as continuous is more memory efficient. Would this be useful to anyone else? If so, we can prioritize it over some of the other new features we are considering.

Acknowledgments

The JIDT project, written and maintained by the estimable Joe Lizier, implements such an approach. The accompanying paper describes the three inference algorithms they've implemented.

Also, thank you @hbsmith and @colemathis for pointing out JIDT.
