Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sid/datetime convertor functions #42

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

rsiddharthan
Copy link
Contributor

Goal / Problem

We depend on dtype-next datatypes, including datetime representations for the data in datasets. As the datetime datatype uses integer values for datetimes, we have a gap in modeling higher order granularity in time, such as quarter, month, week, etc.

Proposed Solution

The approach considered here is to treat the higher order time objects as dates, by setting it to the last date of that time period.

So, "2020-Q1" is represented as the local date "2020-03-31", as that is the last-date of the quarter. Last date of the period is just a convention that we are adopting.

Similarly, it is easy to see that "2020-Mar" will be "2020-03-31". We leave it to the user to differentiate quarter vs month aspects of a date "2020-03-21".

The key advantage with this approach is that we can leverage all the features of a datetime datatype, including the ability to automatically index on such columns.

Work remaining

  • We need helper functions to translate easily from higher order time data to local date and vice-versa.
  • We need to extend year-quarter to year-month and year-week (as those use cases show up)

Open Questions

At some point, we need to look for alternatives for the above approach. One such possibility is to deconstruct year-quarter into a tuple of two integers for year and quarter, with each value as a column in the data set. The indexing will need a multiple column index solution.

Copy link
Contributor

@ezmiller ezmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rsiddharthan I made some comments and change requests, based on what we dicussed today.

Comment on lines 185 to 195
(defn ->year
[localdate]
(.getYear ^java.time.LocalDate localdate))

(defn ->year-month
[localdate]
(let [yyyy (.getYear ^java.time.LocalDate localdate)
mmm (-> (.getMonth ^java.time.LocalDate localdate)
(.getDisplayName ^java.time.format.TextStyle TextStyle/SHORT
(^java.util.Locale Locale/getDefault)))]
[yyyy mmm]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rsiddharthan if we were going to support these types, in order to be consistent with the conversion api these conversion pathways would need to go through the convert-to method, which supports universal conversion. i.e. the fn signatured would be (->year any-datetime), and ->year would be a convenience method on top of convert-to.

However, per our discussion today that we will try not supporting the types Year, YearQuarter, and YearMonth these should be removed I think.

@@ -20,6 +22,11 @@
(defn year->local-date [^Year year]
(-> year (.atMonthDay (java.time.MonthDay/parse "--01-01"))))

(defn year-quarter->local-date [year-quarter format-pattern]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rsiddharthan let's make the format-pattern optional, and by default support the string format that org.threeten.extra.YearQuarter/parse accepts.

In terms of the location of this function, I think it can make sense to include it in this namespace, but maybe put it at the end with a comment explaining that it (and presumably its partner year-month->local-date) are special helpers to help us work with these special types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving format-pattern as optional is simple to do, and can default to threeten conventions.

(defn year-quarter->local-date [year-quarter format-pattern]
(-> year-quarter
(YearQuarter/parse (DateTimeFormatter/ofPattern format-pattern))
.atEndOfQuarter))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that our convention should be instead to use the start of the quarter. Could be convinced otherwise, but that make more intuitive sense to me. If I think of the date representation of a quarter and a month, I wouldn't think of the end of the period...

Or do you have a principled reason for choosing the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to support both begin and end, but did not see the begin implementation. This is another reason why we should move to the 2nd option when doable - and we can remove dependencies on the threeten conventions.

I am thinking of refactoring so it fits with the approach you quoted above, but really treat it as a hack to move forward

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to support both begin and end, but did not see the begin implementation.

For these, see here and here

I am thinking of refactoring so it fits with the approach you quoted above, but really treat it as a hack to move forward

To which approach are you referring here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you are referring to the atDay method as that puzzled me and was not sure what that means as the period spans 3 months. I was looking for the equivalent of .atEndOfQuarter for the beginning. Obviously we can shift the previous quarter value by 1 day, but did not have appetite to do that. So, I remained at end of period as a easy convention.

Refactoring from convertors so these "time objects that are higher than date" data types are available in utils. In particular, move the bi-directional conversion methods for year-quarter (both string->local-date, and local-date->string)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you are referring to the atDay method as that puzzled me and was not sure what that means as the period spans 3 months...

(.atDay some-year-quarter 1) just gives you the first day of the each quarter. E.g.

(-> "2021-Q1"
    org.threeten.extra.YearQuarter/parse
    (.atDay 1))
;; => #time/date "2021-01-01"

(-> "2021-Q2"
    org.threeten.extra.YearQuarter/parse
    (.atDay 1))
;; => #time/date "2021-04-01"

;; etc...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants