Consider representing the dataset format in a more human readable format #162

jsallay · 2021-09-22T21:45:54Z

One of the nice things about SigMF is that it stores the signal metadata in a human readable format. I feel like the dataset format is an exception. For someone who is used to the format, I'm sure that it is simple to look at a string like "ru16_be" and parse the meaning. But for a novice user, they really need a decoder ring. In my opinion, this adds unnecessary friction to adoption of the format.

I would propose that the dataset format be expressed in plain English. I don't have strong opinions on what the specifics of it are, but I personally like something like:
'dataset-format' = {
'space': 'complex',
'type': 'signed',
'bit-width': 16
'endian': 'little'
}

This does take up more space, in the file, but I think it would be much clearer for users. Using a format like this will also make it easier in the future if new types are added. For example, if we wanted to add 16-bit floating point support ( a format commonly used in GPU processing), there are multiple standards, IEEE-754 and bfloat16. This would be difficult to specify with a single character (as is done now), but very easy to express in plain English using the "type" field above or something similar.

Teque5 · 2021-09-23T18:46:51Z

This will also enable the float9 format @jacobagilbert really wants.

bhilburn · 2021-09-30T01:38:59Z

I think this is a fair point. This suggestion would also make it possible to quickly check individual aspects of the datatype without parsing the entire string (although I'm not sure if this is actually useful).

I'm on-board with doing this, but am not certain about timing. My gut, here, is that we mark this for post-v1.x. We cannot remove the existing approach without breaking backwards compat, which means this would have to get added in parallel, at which point we've made it possible to create two critical fields that disagree with each other unless we mark the existing method as deprecated, which is a huge shift for existing applications. Short of it is this is too messy, IMO.

I'm going to mark this for v2.x - open to disagreements / opinions otherwise.

jsallay · 2021-09-30T22:14:43Z

The eventual plan would be to deprecate the current way of expressing type, which means that both formats will have to be supported for a time. I don't think there is a need to have this in place for v1.0, but I wouldn't want to wait until 2.0 to use it at all. I would hope that it could be added in sometime in the 1.0 series as an option and then in v2.0 we deprecate the current datatype format.

When it is added, I would requre a metadata file to use either the old way or the new way, but never allow both (which is easy to do in json schema). So there would never be a case of datatype disagreement. (If you meant some else by "two critical fields that disagree with each other" let me know)

bhilburn added this to the Future (v2.x) milestone Sep 30, 2021

bhilburn added the enhancement label Sep 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider representing the dataset format in a more human readable format #162

Consider representing the dataset format in a more human readable format #162

jsallay commented Sep 22, 2021

Teque5 commented Sep 23, 2021

bhilburn commented Sep 30, 2021

jsallay commented Sep 30, 2021

Consider representing the dataset format in a more human readable format #162

Consider representing the dataset format in a more human readable format #162

Comments

jsallay commented Sep 22, 2021

Teque5 commented Sep 23, 2021

bhilburn commented Sep 30, 2021

jsallay commented Sep 30, 2021