Problems with Pandas time formatting when ARM files do not start at 0000 UTC. #418
pandas and the dateutil parser handle ARM times incorrectly because they do not parse the units string correctly. Xarray uses the pandas Timestamp function, which I think defaults to ISO 8601 standards that ARM does not follow. I should note that this is only an issue when we try to decode the time into an actual Python datetime, which xarray does by default.

The issue boils down to how we indicate the time zone. As an example, in the corkazrcfrgeM1.a1 datastream the units are stored as

    time:units = "seconds since 2019-04-02 22:00:02 0:00" ;

However, when read in using xarray (which uses pandas) or parsed using dateutil, the output is

    Timestamp('2019-04-02 00:00:00')

which puts the time back to 00 UTC. Since most of our files start at 00 UTC, this has not been an issue, but for the wrong reasons: because the result defaults to 00 UTC, we haven't noticed that the reference time is not actually being read as intended. If we were to put a + in front of the time zone offset in the units string, it would work, but that would require updating all of ARM's files.
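To make the units-string problem concrete, here is a minimal sketch using only the standard library. It assumes the units always look like `<unit> since YYYY-MM-DD HH:MM:SS H:MM`, with the sign on the offset possibly missing. The helper names `add_offset_sign` and `parse_arm_units` are hypothetical and are not part of ACT, xarray, or pandas. The sketch only shows the string handling, not how any of those libraries behave.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helpers for illustration; not part of ACT, xarray, or pandas.

# "<unit> since YYYY-MM-DD HH:MM:SS [sign]H:MM", where the sign may be missing.
_UNITS_RE = re.compile(
    r"^(?P<unit>\w+) since (?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})"
    r"(?: (?P<sign>[+-])?(?P<hours>\d{1,2}):(?P<minutes>\d{2}))?$"
)


def add_offset_sign(units):
    """Insert '+' before an unsigned trailing offset such as ' 0:00'."""
    return re.sub(
        r"(\d{2}:\d{2}:\d{2}) (\d{1,2}:\d{2})$", r"\1 +\2", units.strip()
    )


def parse_arm_units(units):
    """Return (unit, reference time in UTC) without relying on dateutil."""
    match = _UNITS_RE.match(units.strip())
    if match is None:
        raise ValueError(f"unrecognized units string: {units!r}")
    ref = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M:%S"
    )
    # An unsigned or absent offset is treated as positive (an assumption).
    offset = timedelta(
        hours=int(match["hours"] or 0), minutes=int(match["minutes"] or 0)
    )
    if match["sign"] == "-":
        offset = -offset
    # Attach the offset, then express the reference time in UTC.
    ref = ref.replace(tzinfo=timezone(offset)).astimezone(timezone.utc)
    return match["unit"], ref


units = "seconds since 2019-04-02 22:00:02 0:00"
unit, ref = parse_arm_units(units)
# Decode a couple of made-up offsets relative to the reference time.
times = [ref + timedelta(seconds=s) for s in (0.0, 2.0)]
print(add_offset_sign(units))
print(unit, ref.isoformat(), times[-1].isoformat())
```

Parsing the reference time explicitly keeps the 22:00:02 start instead of falling back to midnight, and it avoids rewriting the files themselves.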
Replies: 4 comments
@mgrover1 let's talk about this sometime! I'll try to get an example data file to you in the next few days, but I'm not sure this is something we could fix in pandas. We are working around this in ACT, but I think it adds complication to the io/armfiles.py module.
@AdamTheisen do you have a sample file we could use to check this out?
Taking a look at this today!
So it looks like this is an issue that cftime can solve. For example:

    import xarray as xr

    # Decode times with cftime instead of pandas
    ds = xr.open_dataset('houkasacrcfrM1.a1.20210922.150006.nc',
                         use_cftime=True)
    print(ds.time)

    <xarray.DataArray 'time' (time: 64)>
    array([cftime.DatetimeGregorian(2021, 9, 22, 15, 0, 6, 471754, has_year_zero=False),
           cftime.DatetimeGregorian(2021, 9, 22, 15, 0, 8, 445242, has_year_zero=False),
           cftime.DatetimeGregorian(2021, 9, 22, 15, 0, 10, 418669, has_year_zero=False),
           cftime.DatetimeGregorian(2021, 9, 22, 15, 0, 12, 392112, has_year_zero=False),
           ...

We can convert this to np.datetime64 by using:

    # Replace the cftime index with a pandas DatetimeIndex
    ds['time'] = ds.indexes['time'].to_datetimeindex()
    print(ds.time)

    <xarray.DataArray 'time' (time: 64)>
    array(['2021-09-22T15:00:06.471754000', '2021-09-22T15:00:08.445242000',
           '2021-09-22T15:00:10.418669000', '2021-09-22T15:00:12.392112000',
           '2021-09-22T15:00:14.365539000', '2021-09-22T15:00:16.338989000',
           '2021-09-22T15:00:18.312504000', '2021-09-22T15:00:20.285906000',
           ...

This issue is solved within the ...