Extending the tech.ml.dataset for time series #40

ezmiller · 2021-06-14T22:38:24Z

This issue is going to start out very vague and may eventually give way to some more specific issues. The broad question is whether we need to "extend" tech.ml.dataset in some way that is especially suited to time series processing.

The best way to get into this is to consider the R tsibble library, from which we have been taking inspiration. The tsibble library defines a special type of data entity, the "tsibble", which is like a "tibble" but with some extra constraints and features (see here). Namely:

You cannot create a tsibble unless the library can identify a time index, or you specify one manually;
a tsibble may have "keys", i.e. columns that, taken together with the index, identify unique observations; and,
when you print a tsibble you get information about the index, the keys, the time interval between the unique observations, etc.
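To make the second constraint concrete, here is a minimal sketch of the uniqueness check it implies. It is not tsibble's or tablecloth.time's implementation: the function name `duplicate-observations` is hypothetical, and rows are modeled as plain Clojure maps rather than a real dataset.

```clojure
;; Hypothetical sketch: rows are plain maps, not a tech.ml.dataset.
(defn duplicate-observations
  "Returns the [index key ...] tuples that occur more than once in rows."
  [rows index-col key-cols]
  (let [id-fn (apply juxt index-col key-cols)] ; row -> [index key ...]
    (->> rows
         (map id-fn)
         frequencies
         (filter (fn [[_ n]] (> n 1)))
         (mapv first))))

(def rows
  [{:date "2021-06-01" :store "A" :sales 10}
   {:date "2021-06-01" :store "B" :sales 7}
   {:date "2021-06-02" :store "A" :sales 12}
   {:date "2021-06-02" :store "A" :sales 99}])

(duplicate-observations rows :date [:store])
;; => [["2021-06-02" "A"]]
```

An empty result means that the index plus the keys describe unique observations, which is the condition a tsibble enforces at construction time.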

For tablecloth.time, we think we would like to avoid defining a new "type" of dataset. It's not even clear that this is possible. It would probably take us well into the complex territory of trying to extend or override tmd's dataset and associated types. Instead, what we have is a dataset that can have an index, and that can be operated upon by a number of index-aware functions. These functions try to detect the index, but simply raise an error if they cannot.
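As a rough illustration of the "detect the index or raise an error" behavior, here is a sketch under loud assumptions: the dataset is stood in for by a plain map of column name to vector, the `:index` metadata key and the `detect-index` function are hypothetical, and none of this is tablecloth.time's actual code.

```clojure
;; Hypothetical sketch: a plain map of column-name -> values stands in
;; for a dataset, and the :index metadata key is an assumption.
(defn temporal? [x]
  (instance? java.time.temporal.Temporal x))

(defn detect-index
  "Returns the index column name: an explicit :index in the metadata if
  present, otherwise the single column whose values are all temporal.
  Throws if no unique candidate can be found."
  [ds]
  (or (:index (meta ds))
      (let [candidates (for [[col-name values] ds
                             :when (and (seq values)
                                        (every? temporal? values))]
                         col-name)]
        (if (= 1 (count candidates))
          (first candidates)
          (throw (ex-info "Could not detect a unique time index column."
                          {:candidates (vec candidates)}))))))

(def ds
  {:date  [(java.time.LocalDate/parse "2021-06-01")
           (java.time.LocalDate/parse "2021-06-02")]
   :sales [10 12]})

(detect-index ds)                              ;; => :date
(detect-index (with-meta ds {:index :sales}))  ;; => :sales
;; (detect-index {:sales [10 12]}) throws an ExceptionInfo
```

The point of the sketch is that the user keeps an ordinary value; the constraint is checked at the moment a function needs the index rather than when the data is constructed.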

To sum up, in tablecloth.time we do not expect to define a new type and then apply constraints at the moment that this type is constructed. Instead, we think we will let the user have just the same dataset they are used to, and then when they try to use it with the tablecloth.time functions, they may be guided by our docs, the syntax of the arguments, and perhaps also by errors.

That said, there is one clear area where we do want a different kind of interaction from the dataset itself. When the user prints the dataset, we think we may need to give the user some additional feedback about the dataset, comparable to what the tsibble shows. What column is operating as the time index? What is the time interval of the time data?
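For a sense of what that feedback could contain, here is a small sketch that derives an interval from a column of dates and formats a one-line header. The names `day-interval` and `time-header` are hypothetical, the interval is simplified to whole days between `java.time.LocalDate` values, and the sketch deliberately says nothing about how such a header would be hooked into dataset printing.

```clojure
(import '(java.time LocalDate)
        '(java.time.temporal ChronoUnit))

;; Hypothetical helpers; the interval is simplified to whole days.
(defn day-interval
  "Smallest gap in days between consecutive sorted dates,
  or nil when there are fewer than two dates."
  [dates]
  (let [sorted (sort dates)
        gaps   (map (fn [a b] (.between ChronoUnit/DAYS a b))
                    sorted
                    (rest sorted))]
    (when (seq gaps)
      (apply min gaps))))

(defn time-header
  "Formats a one-line summary of the index column and its interval."
  [index-col dates]
  (if-let [d (day-interval dates)]
    (str "index: " index-col ", interval: " d " day(s)")
    (str "index: " index-col ", interval: unknown")))

(time-header :date [(LocalDate/parse "2021-06-01")
                    (LocalDate/parse "2021-06-03")
                    (LocalDate/parse "2021-06-05")])
;; => "index: :date, interval: 2 day(s)"
```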

But how do we do this in a library like tablecloth.time, where we also do not want to create a new type of dataset? What does it mean to "extend" tech.ml.dataset into contexts where we want a different type of behavior, around printing for example?

ezmiller added the "question" label on Jun 14, 2021
