Compression for SigMF Archives #68
Not having a compression format in the spec doesn't stop people from using compression, though? Using tar (which we've used as our archive format) as an example, tar doesn't itself specify any compression, but it's almost always compressed, with the compression format tacked on after .tar, so Could we recommend a similar syntax, where we use Pros:
Cons:
|
This is a side issue to the question of doing it at all but note that extensions like |
I'm actually using the SigMF archive format, and I can tell you that 100% of the time I just change the extension to .tar when I download the file so that the built-in archive utility recognizes it and opens it. That's fine, and I get and agree with the point of using the You can always open the files directly in the sigmf reader that supports compression to avoid it being picked up by a decompression utility first. |
We can also, very easily, add tools for common OSes to identify .sigmf as tar, and .sigmf.gz as tar.gz etc. I like @djanderson's approach -- it's common practice, and solves the problem by making it not-our-problem. The issue that people could start using non-free compressors is a thing, but when does that actually happen these days? WinRAR anyone? I haven't used ARJ since the era of distributing video games on 3.5" floppies 👴 |
@mbr0wn I don't think there is usually the ability to assign a meaning to an extension with another dot in the middle, as |
It doesn't sound like anyone has any arguments against allowing for compression, so great! We'll move ahead with that. Now, to figure out how to spec it, hah =) @kpreid has an interesting point regarding enabling transparent reading (and thus decompression) if we have a unique extension. Using OS built-ins can be really convenient, too, as @djanderson notes. It seems like a key question is: do we expect most people to be interacting with SigMF Archives using archive utilities or SigMF applications? Let's say we have a SigMF reader application that is popular and widespread - how many people will still want to just download the archive, peek into it, and open the metadata in To be honest, I don't think I have a strong opinion one way or the other right now. I'll spin on it a bit, and am really interested to hear everyone else's opinion, if you feel strongly about it. |
Just want to confirm that some of my real-world raw I-Q data files do compress very well (50% or more). These are int16 interleaved; compression could be more effective if the data is in IEEE 754 format. This feature will help a lot with transferring data files over the Internet. One alternative to compressing the whole archive file could be compressing only the raw I-Q data file and offering a SigMF label for this (compression_type: raw, gz, etc.). |
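For anyone who wants to check how compressible their own recordings are, here is a minimal sketch of measuring a gzip ratio on interleaved int16 I/Q. The helper names (`compression_ratio`, `synth_iq_int16`) and the synthetic "short burst followed by zeros" signal are assumptions made up for illustration; they are not taken from any data discussed in this thread.

```python
import gzip
import random
import struct


def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Return compressed size / original size using gzip at the given level."""
    if not raw:
        raise ValueError("empty input")
    return len(gzip.compress(raw, compresslevel=level)) / len(raw)


def synth_iq_int16(n_samples: int, active_fraction: float, seed: int = 0) -> bytes:
    """Hypothetical test signal: a noisy burst, then zeros, as interleaved int16 I/Q."""
    rng = random.Random(seed)
    n_active = int(n_samples * active_fraction)
    values = []
    for i in range(n_samples):
        if i < n_active:
            # One complex sample = one I value followed by one Q value.
            values += [rng.randint(-2000, 2000), rng.randint(-2000, 2000)]
        else:
            values += [0, 0]
    # "<h" = little-endian signed 16-bit integers.
    return struct.pack(f"<{len(values)}h", *values)


if __name__ == "__main__":
    for fraction in (0.1, 1.0):
        raw = synth_iq_int16(10_000, fraction)
        print(f"active={fraction:.0%} ratio={compression_ratio(raw):.2f}")
```

To try it on a real recording, read the `.sigmf-data` file as bytes and pass it to `compression_ratio` instead of the synthetic signal.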
@cityscapesc - Thanks for the input and data point! You raise an interesting idea about just compressing the datafile. The primary issue I can think of, there, is that then we are almost forcing Reader / Writer applications to have de-compression libs built into them. If we do it at the archive level, it could be a more natural part of the workflow to do it prior to application loading. Either route carries some risks, though, in terms of application complexity. I think @cityscapesc's data point is a good one, though, and backs up the feedback we got at GRCon: compression would be useful. So, going back to the question of extension: I just pulled up the documentation on MIME Extensions, and as it turns out, the specification is that the longest pattern has the heaviest weight. From https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-latest.html#idm140625828677088 :
My read of this is that we could safely do |
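To illustrate the "longest pattern wins" idea, here is a rough sketch of how a tool might resolve overlapping globs. It is a simplification and not the actual shared-mime-info algorithm (which also considers explicit weights); the glob table and the `classify` helper are hypothetical, and only `.sigmf` and `.sigmf.gz` are names that have come up in this discussion.

```python
from fnmatch import fnmatchcase

# Hypothetical glob table; the descriptions are placeholders, not registered MIME types.
GLOBS = {
    "*.gz": "generic gzip file",
    "*.sigmf": "SigMF archive (tar)",
    "*.sigmf.gz": "SigMF archive (tar + gzip)",
}


def classify(filename: str):
    """Return the description for the longest matching glob, or None if nothing matches."""
    name = filename.lower()
    matches = [pattern for pattern in GLOBS if fnmatchcase(name, pattern)]
    if not matches:
        return None
    # Both "*.gz" and "*.sigmf.gz" match "rec.sigmf.gz"; the longer pattern is preferred.
    return GLOBS[max(matches, key=len)]
```

With this rule, a plain `.gz` file still resolves to the generic entry, while a name ending in `.sigmf.gz` resolves to the more specific one.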
We've seen in the past couple of years that most people are compressing SigMF recordings - even when the gains from compressing the raw samples are low (as expected), in recordings with significant annotations, they can be meaningful. In #99, I sort of sloppily added support for gzip. Realistically, supporting gzip and bzip2 is probably safe given their ubiquity. It's entirely possible people will want to use other forms of compression, but making those canonical would be problematic on some systems (thinking of 7zip, for example). My inclination is to allow compression of archives using gzip and bzip2. I'm leaving this issue open for disagreeing opinions for a bit longer, in case anyone wants to comment. |
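As a rough sketch of what gzip or bzip2 compression at the archive level could look like, Python's standard `tarfile` module can write either and auto-detect both on read. This is not the implementation from #99; the function names and the archive layout (`name/name.sigmf-meta` plus `name/name.sigmf-data`) are assumptions for illustration.

```python
import io
import json
import tarfile


def write_archive(path, name: str, meta: dict, data: bytes, compression: str = "gz"):
    """Write metadata and samples into a tar archive; compression is "", "gz" or "bz2"."""
    if compression not in ("", "gz", "bz2"):
        raise ValueError("unsupported compression: " + compression)
    mode = "w:" + compression if compression else "w"
    members = {
        f"{name}/{name}.sigmf-meta": json.dumps(meta).encode("utf-8"),
        f"{name}/{name}.sigmf-data": data,
    }
    with tarfile.open(path, mode) as tar:
        for member_name, payload in members.items():
            info = tarfile.TarInfo(member_name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def read_metadata(path) -> dict:
    """Return the first .sigmf-meta member as a dict; "r:*" detects the compression."""
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            if member.name.endswith(".sigmf-meta"):
                return json.load(tar.extractfile(member))
    raise FileNotFoundError("no .sigmf-meta member in archive")
```

Note that reading the metadata this way still streams through the compressed archive until the metadata member is found, so writing the metadata first keeps that cheap.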
So I have an alternate proposal. From the conversation I've picked up on the following desires:
I think that the experience of using a library to work with a compressed tar would be fairly painful because it requires you to decompress the whole archive in order to read the metadata and provides no random access to the data (so you have to decompress and untar the whole thing). This was also pointed out by @cityscapesc. I propose
This would allow a lightweight reading of the metadata, then randomly seeking to samples of interest without decompressing the entire archive. This means it's now possible to deal with very large recordings in a compressed manner without a very large memory requirement and no need to make a duplicate decompressed copy on disk. The expense is that of CPU power to decompress on the fly. Lots of attention and care to selecting the right compression format is required, but I'm curious if this sounds like a reasonable approach to the folks with the use case for it. |
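One hypothetical way to get this kind of seekable compression is to compress fixed-size blocks independently and keep an index of where each compressed block starts. The sketch below uses `zlib` purely as a stand-in; the block size, the index layout, and the function names are assumptions, not the format proposed above.

```python
import zlib

BLOCK_SIZE = 4096  # uncompressed bytes per block; an arbitrary choice for this sketch


def compress_blocks(raw: bytes, block_size: int = BLOCK_SIZE):
    """Compress each block independently.

    Returns (blob, index) where index[i] = (offset, length) of compressed block i in blob.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(raw), block_size):
        compressed = zlib.compress(raw[start:start + block_size])
        index.append((len(blob), len(compressed)))
        blob += compressed
    return bytes(blob), index


def read_range(blob: bytes, index, start: int, length: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Return raw[start:start + length], decompressing only the blocks that overlap it."""
    if length <= 0:
        return b""
    first = start // block_size
    last = min((start + length - 1) // block_size, len(index) - 1)
    out = bytearray()
    for i in range(first, last + 1):
        offset, size = index[i]
        out += zlib.decompress(blob[offset:offset + size])
    skip = start - first * block_size  # bytes to drop from the first decompressed block
    return bytes(out[skip:skip + length])
```

The trade-off is the one described above: smaller blocks make seeks cheaper but usually compress worse, and every read costs CPU time for decompression.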
@n-west - I like your suggestion a lot. Reading that thread, it actually sounds like both gzip and bzip2 are also capable of PR-access of the compressed blocks, but it's not clear to me if those are natively available in the primary packages that are distributed. I would like to have the feature described by Nathan 👆👆 in v1.0.0. Marking as such |
I don't see this being ready for v1.0... moving to 2.x (can pull in to 1.1 or something sooner if we want to, this is a new feature and should not break anything) |
As a possibly related bit of info, there's IQZIP by the Libre Space Foundation which implements a CCSDS lossless compression standard. They did some interoperability work with SigMF and it looks like that was removed. |
Item from GRCon discussions:
So, we had previously decided in the discussion about archive formats #15 to disallow compression, with the reasoning that compressing IQ recordings rarely gives you anything and complicates the reader / writer applications. But, a couple of folks from the National Labs pointed out that you sometimes have recordings of mostly zero values in some systems. They are currently using HDF5 and considering moving over to SigMF in a number of their programs, but would like to see a compression capability, which HDF5 allows for.
I think this argument is reasonable, and it does make sense. There will definitely be recordings where the Vpp is not changing so much as to make compression useless.
I'd like to solicit thoughts and opinions on both this topic, generally, as well as what forms of compression might make the most sense for SigMF archives.