Extending the tech.ml.dataset for time series #40

Open
ezmiller opened this issue Jun 14, 2021 · 0 comments
Labels
question Further information is requested

This issue is going to start out very vague and may eventually give way to some more specific issues. The question, put most broadly, is whether we need to "extend" tech.ml.dataset in some way that is especially suited to time series processing.

The best way to get into this is to consider the R tsibble library, from which we have been taking inspiration. The tsibble library defines a special type of data entity, the "tsibble", which is like a "tibble" but with some extra constraints and features (see here). Namely:

  • you cannot create a tsibble unless the library can identify a time index, or you specify one manually;
  • a tsibble may have "keys": one or more columns that, together with the index, uniquely identify each observation; and,
  • when you print a tsibble you get information about the index, the keys, the time interval between observations, etc.

For tablecloth.time, we think we would like to avoid defining a new "type" of dataset. It's not even clear that this is possible, and it would probably take us well into the complex territory of trying to extend or override tmd's dataset and associated types. Instead, what we have is a dataset that can have an index and that can be operated upon by a number of index-aware functions. These functions try to detect the index, and simply raise an error if they cannot.
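To make the "detect the index or raise" behavior concrete, here is a minimal sketch. It is written in plain Python over a dict of columns purely for illustration; it does not use tech.ml.dataset or tablecloth.time, and the names `detect_index` and `slice_by_index` are hypothetical.

```python
from datetime import date, datetime


def detect_index(dataset):
    """Return the name of the only time-typed column, or raise.

    `dataset` is assumed to be a plain dict mapping column names to lists.
    """
    candidates = [
        name
        for name, values in dataset.items()
        if values and all(isinstance(v, (date, datetime)) for v in values)
    ]
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one time column, found {candidates}"
        )
    return candidates[0]


def slice_by_index(dataset, start, end, index=None):
    """Keep the rows whose index value lies in [start, end].

    The index column is detected unless the caller names it explicitly.
    """
    index = index or detect_index(dataset)
    keep = [i for i, t in enumerate(dataset[index]) if start <= t <= end]
    return {name: [values[i] for i in keep] for name, values in dataset.items()}
```

The dataset itself stays an ordinary value; only the functions know about the index, and the caller can always bypass detection by passing `index` directly.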

To sum up, we do not expect tablecloth.time to define a new type and then apply constraints at the moment that type is constructed. Instead, we think we will let the user keep the same dataset they are used to, and when they use it with the tablecloth.time functions, they may be guided by our docs, the syntax of the arguments, and perhaps also by errors.

That said, there is one clear area where we do want a different kind of interaction from the dataset itself. When the user prints the dataset, we think we may need to give the user some additional feedback about the dataset, comparable to what tsibble provides. What column is operating as the time index? What is the time interval of the time data?

But how do we do this in a library like tablecloth.time, where we also do not want to create a new type of dataset? What does it mean to "extend" tech.ml.dataset into contexts where we want a different type of behavior, around printing for example?
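One possible direction, offered only as a sketch, is a helper that computes the extra header information from an ordinary dataset, so that no new type is needed. The example below is again plain Python over a dict of columns, not tech.ml.dataset; `describe_index` is a hypothetical name, and the sketch ignores keys entirely (it assumes a single series).

```python
from datetime import date


def describe_index(dataset, index):
    """Build a one-line summary of the index column and its interval.

    The interval is reported only when every consecutive pair of sorted
    index values is the same distance apart.
    """
    times = sorted(dataset[index])
    gaps = {later - earlier for earlier, later in zip(times, times[1:])}
    interval = gaps.pop() if len(gaps) == 1 else None
    label = str(interval) if interval is not None else "unknown or irregular"
    return f"index: {index}, interval: {label}"
```

A print function could emit this line above the usual table output, which leaves open the question of where the index name itself should be stored.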

@ezmiller ezmiller added the question Further information is requested label Jun 14, 2021