Cluster CDFs from CSA sometimes have variables incorrectly marked non-record-variant #939

jameswilburlewis · 2024-07-18T02:03:19Z

While troubleshooting a crash loading Cluster data, I discovered a problem with the CDFs themselves: if a CDF contains only a single sample, its array-valued variables can lose their leading [time] dimension and are marked as "non-record-variant".

In this example:

    def test_load_csa_mom_data(self):
        del_data('*')
        mom_data = pyspedas.cluster.load_csa(probes=['C1', 'C2', 'C3', 'C4'],
                                             trange=['2003-08-17/16:40', '2003-08-17/16:45'],
                                             datatypes=['CP_CIS-HIA_ONBOARD_MOMENTS'], time_clip=True)
        self.assertTrue('density__C1_CP_CIS_HIA_ONBOARD_MOMENTS' in mom_data)
        self.assertTrue(data_exists('density__C1_CP_CIS_HIA_ONBOARD_MOMENTS'))

we load moments data for all four probes. C1 has multiple samples falling into the selected time range, but C3 only has a single timestamp. C4 seems to have nothing available. Here are some log messages from loading the C3 data, with multiple errors:

17-Jul-24 18:47:43: Exception of type <class 'TypeError'> raised during store_data call for variable sensitivity__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: len() of unsized object
17-Jul-24 18:47:43: Exception of type <class 'TypeError'> raised during store_data call for variable cis_mode__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: len() of unsized object
17-Jul-24 18:47:43: Exception of type <class 'TypeError'> raised during store_data call for variable density__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: len() of unsized object
17-Jul-24 18:47:43: velocity_isr2__C3_CP_CIS_HIA_ONBOARD_MOMENTS: lengths of x (1) and y (3) do not match! Mislabeled NRV variable?
17-Jul-24 18:47:43: Exception of type <class 'ValueError'> raised during store_data call for variable velocity_isr2__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: conflicting sizes for dimension 'time': length 3 on the data but length 1 on coordinate 'time'
17-Jul-24 18:47:43: velocity_gse__C3_CP_CIS_HIA_ONBOARD_MOMENTS: lengths of x (1) and y (3) do not match! Mislabeled NRV variable?
17-Jul-24 18:47:43: Exception of type <class 'ValueError'> raised during store_data call for variable velocity_gse__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: conflicting sizes for dimension 'time': length 3 on the data but length 1 on coordinate 'time'
17-Jul-24 18:47:43: Exception of type <class 'TypeError'> raised during store_data call for variable temperature__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: len() of unsized object
17-Jul-24 18:47:43: Exception of type <class 'TypeError'> raised during store_data call for variable temp_par__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: len() of unsized object
17-Jul-24 18:47:43: Exception of type <class 'TypeError'> raised during store_data call for variable temp_perp__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: len() of unsized object
17-Jul-24 18:47:43: Exception of type <class 'TypeError'> raised during store_data call for variable pressure__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: len() of unsized object
17-Jul-24 18:47:43: pressure_tensor__C3_CP_CIS_HIA_ONBOARD_MOMENTS: lengths of x (1) and y (3) do not match! Mislabeled NRV variable?
17-Jul-24 18:47:43: Exception of type <class 'ValueError'> raised during store_data call for variable pressure_tensor__C3_CP_CIS_HIA_ONBOARD_MOMENTS
17-Jul-24 18:47:43: Exception message: conflicting sizes for dimension 'time': length 3 on the data but length 1 on coordinate 'time'

The scalar-valued variables seem to have degenerated from the 1-element arrays they should be into bare scalars.
The array-valued variables (velocity and pressure_tensor) have also lost their leading dimension. Using the cdfeditor command-line utility directly on the CDFs returned by the CSA query (so, bypassing any possible issues with PySPEDAS, PyTplot, or cdflib) shows that the C3 variables are marked non-record-variant, which is incorrect.

The same variables for the C1 CDF, with multiple samples, load correctly and are correctly marked as record-variant.

Other than adding some defensive programming to cdf_to_tplot to avoid outright crashes, there's not going to be a clean way to handle this. We can't just add the time dimension back, because that would break applications that rely on correctly marked NRV variables having the expected number of dimensions.
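As a rough illustration of the kind of defensive check meant here (not the actual cdf_to_tplot code), the sketch below classifies a variable before it is stored, so that a 0-d value or a wrong leading axis can be logged and skipped instead of raising. The function name `time_axis_status` and its return strings are hypothetical.

```python
import numpy as np


def time_axis_status(times, values):
    """Classify how `values` lines up with `times`, without raising.

    Hypothetical helper for illustration only.
    """
    t = np.atleast_1d(np.asarray(times))
    v = np.asarray(values)
    if v.ndim == 0:
        # len() of a 0-d array raises TypeError, as in the log above
        return "scalar"
    if v.shape[0] != t.shape[0]:
        # Leading axis is not the time axis, e.g. shape (3,) with 1 timestamp
        return "mismatch"
    return "ok"
```

A caller could store the variable only when the status is "ok" and log a warning otherwise. This detects the problem but does not decide whether the variable is a mislabeled record-variant one or a legitimate NRV one.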


jameswilburlewis · 2024-07-18T16:46:57Z

For the Cluster pressure tensor variables and similar situations, we might be able to get away with adding the extra dimension by checking whether the variable has too many DEPEND_N attributes for its actual number of dimensions.
Normally the pressure tensor would have dimensions of time x 3 x 3, with DEPEND_0, DEPEND_1, and DEPEND_2. If we lose the time dimension, we get 3x3 instead of 1x3x3, so DEPEND_2 is dangling: it refers to an axis that no longer exists. If we see that DEPEND_0 has a single timestamp, and if there's an extra DEPEND_N, it might be safe to add the extra dimension back.
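A minimal sketch of that check, assuming the variable attributes are available as a plain dict and that every axis (including time) has its own DEPEND_N attribute. The function name `missing_time_dim` and the attribute values in the usage lines are hypothetical, and this is not the code that went into cdf_to_tplot.

```python
import re

import numpy as np


def missing_time_dim(values, n_times, var_attrs):
    """Return True if `values` looks like it lost its leading time axis.

    Hypothetical helper: compares the number of DEPEND_N attributes
    with the actual number of array dimensions.
    """
    # Count DEPEND_0, DEPEND_1, DEPEND_2, ... attributes
    n_depend = sum(1 for key in var_attrs if re.fullmatch(r"DEPEND_\d+", key))
    # One more DEPEND_N than dimensions means the last one is dangling
    one_dim_short = np.ndim(values) == n_depend - 1
    return n_times == 1 and "DEPEND_0" in var_attrs and one_dim_short


# Made-up attribute values, for illustration only
tensor_attrs = {"DEPEND_0": "time", "DEPEND_1": "axis_1", "DEPEND_2": "axis_2"}
print(missing_time_dim(np.zeros((3, 3)), 1, tensor_attrs))     # 3x3 with 3 DEPENDs
print(missing_time_dim(np.zeros((1, 3, 3)), 1, tensor_attrs))  # intact 1x3x3
```

Under these assumptions, a 1-D variable whose only DEPEND attribute is DEPEND_0 is left alone, because its dimension count already matches its attribute count.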

Some MMS support variables (e.g. FEEPS energy upper/lower/centroid arrays) have a DEPEND_0 but don't actually have a time dimension. Those FEEPS variables are 1-D, I think, and they are marked non-record-variant despite having a DEPEND_0 attribute. Usually there's more than one timestamp, but it's possible this case might collide with the Cluster case, with opposite actions required for each.

jameswilburlewis · 2024-07-19T02:35:36Z

For the time being, I've added code to cdf_to_tplot to restore the probably-missing time dimension for the case when there is only a single timestamp.
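The general idea can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code that was added to cdf_to_tplot; the function name `restore_time_dim` is hypothetical.

```python
import numpy as np


def restore_time_dim(times, values):
    """Prepend a length-1 time axis when there is exactly one timestamp
    and the data does not already start with an axis of length 1.

    Hypothetical helper for illustration only.
    """
    t = np.atleast_1d(np.asarray(times))
    v = np.asarray(values)
    if t.shape[0] == 1 and (v.ndim == 0 or v.shape[0] != 1):
        # scalar -> (1,), (3,) -> (1, 3), (3, 3) -> (1, 3, 3)
        return v[np.newaxis, ...]
    return v
```

A rule this simple cannot distinguish a mislabeled record-variant variable from a genuine NRV array that happens to sit in a file with a single timestamp, and it leaves an array alone if its first non-time axis has length 1.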

jameswilburlewis added the bug (Something isn't working), pytplot (Issues involving the pytplot package), CDF, and Cluster labels Jul 18, 2024

jameswilburlewis self-assigned this Jul 18, 2024

jameswilburlewis added the Data Servers Issues with remote data servers (SPDF, MMS, MAVEN, JAXA, etc) label Aug 24, 2024
