Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider representing the dataset format in a more human readable format #162

Open
jsallay opened this issue Sep 22, 2021 · 3 comments
Open

Comments

@jsallay
Copy link

jsallay commented Sep 22, 2021

One of the nice things about SigMF is that it stores the signal metadata in a human readable format. I feel like the dataset format is an exception. For someone who is used to the format, I'm sure that it is simple to look at a string like "ru16_be" and parse the meaning. But for a novice user, they really need a decoder ring. In my opinion, this adds unnecessary friction to adoption of the format.

I would propose that the dataset format be expressed in plain English. I don't have strong opinions on what the specifics of it are, but I personally like something like:
'dataset-format' = {
'space': 'complex',
'type': 'signed',
'bit-width': 16
'endian': 'little'
}

This does take up more space, in the file, but I think it would be much clearer for users. Using a format like this will also make it easier in the future if new types are added. For example, if we wanted to add 16-bit floating point support ( a format commonly used in GPU processing), there are multiple standards, IEEE-754 and bfloat16. This would be difficult to specify with a single character (as is done now), but very easy to express in plain English using the "type" field above or something similar.

@Teque5
Copy link
Collaborator

Teque5 commented Sep 23, 2021

This will also enable the float9 format @jacobagilbert really wants.

@bhilburn
Copy link
Contributor

I think this is a fair point. This suggestion would also make it possible to quickly check individual aspects of the datatype without parsing the entire string (although I'm not sure if this is actually useful).

I'm on-board with doing this, but am not certain about timing. My gut, here, is that we mark this for post-v1.x. We cannot remove the existing approach without breaking backwards compat, which means this would have to get added in parallel, at which point we've made it possible to create two critical fields that disagree with each other unless we mark the existing method as deprecated, which is a huge shift for existing applications. Short of it is this is too messy, IMO.

I'm going to mark this for v2.x - open to disagreements / opinions otherwise.

@bhilburn bhilburn added this to the Future (v2.x) milestone Sep 30, 2021
@jsallay
Copy link
Author

jsallay commented Sep 30, 2021

The eventual plan would be to deprecate the current way of expressing type, which means that both formats will have to be supported for a time. I don't think there is a need to have this in place for v1.0, but I wouldn't want to wait until 2.0 to use it at all. I would hope that it could be added in sometime in the 1.0 series as an option and then in v2.0 we deprecate the current datatype format.

When it is added, I would requre a metadata file to use either the old way or the new way, but never allow both (which is easy to do in json schema). So there would never be a case of datatype disagreement. (If you meant some else by "two critical fields that disagree with each other" let me know)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants