Decide how to handle gaps in java.time support in tech.v3.datatype.datetime #38

Open
ezmiller opened this issue Jun 1, 2021 · 1 comment


ezmiller commented Jun 1, 2021

tech.v3.datatype.datetime supports only a narrow list of java.time classes. My understanding is that the reasons for this are a mixture of principle and practicality. There's a general tendency in tech.datatype's datetime support to avoid awareness of the distinct time classes (unlike, e.g., the tick time stack). There are also java.time classes, such as java.time.Year, that one might arguably better treat as just a number. Finally, adding support for all these different classes in functions like descriptive-statistics and in tech.v3.datatype.functional is a lot of work.

That said, it can feel as though there are gaps in tech.v3.datatype's datetime support. For example, you cannot do this:

(tech.v3.datatype.datetime/plus-temporal-amount #time/year "1970" (range 10) :years)

although it might feel natural to do so. You get an error that reads:

Data datatype (:year) is not a date time datatype.

which could be confusing, since intuitively someone might think it should be, even if they don't know that java.time.Year is a class.
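For comparison, the result one might expect from the call above can be produced with plain java.time interop, without going through tech.v3.datatype.datetime. This is only a sketch of the intended semantics; `plus-years` is a hypothetical helper name, not part of any library.

```clojure
(import '(java.time Year))

;; Hypothetical helper (not a library function): add each offset, in years,
;; to a base java.time.Year and return the resulting Year objects.
(defn plus-years
  [^Year base offsets]
  (mapv #(.plusYears base (long %)) offsets))

(def years (plus-years (Year/of 1970) (range 10)))

;; Year is essentially a wrapper around an int, so it can be read back as a number.
(mapv #(.getValue ^Year %) years)
;; => [1970 1971 1972 1973 1974 1975 1976 1977 1978 1979]
```

The last form also illustrates the "just a number" view of java.time.Year: once the values are plain integers, ordinary arithmetic already covers this case.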

In tablecloth.time, we want to make the experience very smooth for beginners, so this is a problem for us. Right now, a clear solution doesn't present itself, and more experimentation is probably needed. Each of the classes that tech.datatype does not support may also pose its own unique problems.

Some of the unsupported classes are:

  • java.time.Year
  • java.time.YearMonth
  • java.time.LocalTime

There are also extra classes provided by org.threeten.extra, such as YearQuarter.


ezmiller commented Jul 1, 2021

After a chat with @cnuernber, I think we can think of it in this way for now:

The key idea: because we have chosen to follow the philosophy of tech.v3.datatype.datetime, which tries to minimize the use and awareness of distinct types (read: classes) of time, we should try not to use distinct types for these classes.

From there we can think of this in a prioritized way:

  1. In the short term, we can encourage the use of two columns, both numbers, to manage year-months or year-quarters.
  2. Eventually, we may be able to extend tech.ml.dataset to support these types. @cnuernber described an approach worth exploring that would manage year-months in terms of epoch-months:

Perhaps the datetime system in datatype needs to be extensible to new temporal types and then I think working through making year-month or something like that work would be good but you would be in pure datetime land. For year, month you could just use epoch-months and have a single integer that you then built more operations (such as get-year, etc) from. Then you have +,-,<,>, etc basic operations working, serialization to arrow or parquet, etc.

In an ideal system you could get all of that by defining a conversion from year-month to epoch-month and lots of things would 'just work' but that is a serious type engineering of the type you need a better type resolution system than anything I wrote in dtype next.
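To make the epoch-months idea concrete, here is a minimal sketch using only java.time interop. It assumes "epoch-months" means whole months elapsed since 1970-01; the function names are hypothetical and are not part of tech.v3.datatype or tech.ml.dataset.

```clojure
(import '(java.time YearMonth))

;; Assumption: epoch-months = number of whole months since 1970-01.
(defn year-month->epoch-months
  [^YearMonth ym]
  (+ (* 12 (- (.getYear ym) 1970))
     (dec (.getMonthValue ym))))

;; Inverse conversion: a single integer back to a java.time.YearMonth.
(defn epoch-months->year-month
  [n]
  (.plusMonths (YearMonth/of 1970 1) (long n)))

;; Operations such as get-year can then be built on top of the integer.
(defn epoch-months->year
  [n]
  (.getYear ^YearMonth (epoch-months->year-month n)))

(def july-2021 (year-month->epoch-months (YearMonth/of 2021 7)))
;; july-2021 => 618

;; Ordering and arithmetic work on the plain integers.
(< (year-month->epoch-months (YearMonth/of 2020 12))
   (year-month->epoch-months (YearMonth/of 2021 1)))
;; => true

(str (epoch-months->year-month (+ july-2021 6)))
;; => "2022-01"
```

Because the representation is a single integer, +, -, < and > come for free, which is the property the quoted suggestion relies on. Serialization and type resolution are outside the scope of this sketch.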

Regarding item 1 above, the open questions may be:

  • What kind of documentation and methods do we need to make this feel intuitive and easy?
  • Will this work with indexing (i.e. can we slice) when the time index is represented across two columns? This may be a use case for a multi-column index.
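One way to picture the slicing question: with year and month held in two numeric columns, a range slice amounts to a lexicographic comparison on [year month] pairs. The sketch below uses plain Clojure maps as rows rather than a tech.ml.dataset, and `slice-year-months` is a hypothetical helper, so it says nothing about how an actual multi-column index would be implemented.

```clojure
;; Rows with the year-month split across two numeric fields.
(def rows
  [{:year 2020 :month 11 :value 1.0}
   {:year 2020 :month 12 :value 2.0}
   {:year 2021 :month 1  :value 3.0}
   {:year 2021 :month 2  :value 4.0}])

;; Hypothetical helper: keep rows whose [year month] pair lies within
;; [from, to], both ends inclusive. Clojure vectors of equal length compare
;; element by element, which gives the (year, month) ordering needed here.
(defn slice-year-months
  [rows from to]
  (filterv (fn [{:keys [year month]}]
             (let [k [year month]]
               (and (<= (compare from k) 0)
                    (<= (compare k to) 0))))
           rows))

(mapv :value (slice-year-months rows [2020 12] [2021 1]))
;; => [2.0 3.0]
```

This is a linear scan; a real index would presumably keep the pairs sorted so that slices can be found without visiting every row.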
