Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interval adjusment: handle case where first element is missing #22

Open
ezmiller opened this issue Apr 3, 2021 · 1 comment
Open

Interval adjusment: handle case where first element is missing #22

ezmiller opened this issue Apr 3, 2021 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@ezmiller
Copy link
Contributor

ezmiller commented Apr 3, 2021

In our current implementation of adjust-interval (See #14), we use the first item in a column to determine the target-datatype of the time unit to which we are converting. More specifically, adjust-interval takes a ->new-time_converter fn that the user supplies, and calls that function on the first item in the targeted column to get the new unit, and then uses tech.v3.datatype/elemwise-datatype to determine the unit's keyword.

This is all fine, but @cnuernber raised a good point that we overlooked:

There are some auto-detection routines for datatype that rely on converting the first element. All I might add to that is you may want to convert the first non-missing element; what if your first element is a missing/null value?

We should figure out a way to handle this case. It might also pay to generalize the process of determining the time datatype from the row if this is going to be a more common practice.

@ezmiller ezmiller added enhancement New feature or request good first issue Good for newcomers labels Apr 3, 2021
@ezmiller
Copy link
Contributor Author

ezmiller commented Aug 1, 2021

When we added the index structure to tech.ml.datset (see techascent/tech.ml.dataset#214), we prevented a column from returning an index if there are missing values in the column here. So this issue may not be relevant any more. There should always be an item in the first position because the column should not have any missing values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

1 participant