Metadata for many fragments of a recording #271
Comments
Hi @PeterisP - I think I understand generally what you are looking to see here, but I could use a little more info to best answer this. My initial thought is to steer you toward the Collection idea, but can you explain a little bit more about the need for a "Collection of Collections"? After chopping up multiple files and extracting the bursts, I would consider the initial concept of the base (pre-chopped) files somewhat irrelevant at that point, and you could flatten those into just one Collection; maybe you can describe why that would be problematic, if that is the case?

Right now (as you identified) a SigMF Recording is the core component of SigMF data and is composed of exactly one data file and one metadata file, which was a very intentional decision. Changing that would represent a fairly large change in how SigMF is defined, so there would need to be a compelling reason, which I think is still lacking for #245.
The initial concept of the base (pre-chopped) files is relevant to me because the expected analysis of the dataset requires treating the packets as parts of a single conversation: linking a request fragment with the response to it, tracking e.g. the time offsets between packets, tracking the channel-hopping pattern, and decoding payloads from many packets into a continuous stream. So there is a need for some metadata structure that links those segments together.

The other issue is the practicality of file sizes. The use case I have in mind is a dataset of many radio recordings of the same actions performed with many different devices, as data for device behavior analysis and fingerprinting. Keeping every packet as a separate file is not optimal: every recording may have 10,000-100,000 fragments (packets), and some filesystems have serious performance issues with datasets consisting of millions of separate files, so it seems desirable to store each recording as a SigMF Collection in a SigMF Archive. On the other hand, flattening all recordings into a single Collection would mean a single Archive, which is impractical: the whole package can be very large (e.g. many terabytes), which is awkward to distribute and may have performance issues when seeking specific packet files within the .tar. And keeping each recording in its own Collection/Archive implies a need for some index to summarize them (a file listing every recording, but not every single data packet in that recording), i.e. a "collection of collections".
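To make the request concrete, here is a purely hypothetical sketch of such a top-level index: one file that lists each per-recording SigMF Archive without enumerating the thousands of packet files inside. None of these field names (or the file names) are defined by the SigMF specification; they only illustrate the structure being asked for.

```python
import json

# Hypothetical "collection of collections" index. Each entry points at one
# per-recording SigMF Archive; the packets inside each archive are NOT
# listed here, keeping this index small even for huge datasets.
# All field names below are invented for illustration only.
index = {
    "recordings": [
        {"device": "device_A", "archive": "device_A_run1.sigmf", "num_packets": 48211},
        {"device": "device_B", "archive": "device_B_run1.sigmf", "num_packets": 51007},
    ]
}

print(json.dumps(index, indent=2))
```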
Ok, thanks for expanding. Having tens of thousands of files open (and dealing with the associated thrashing) is certainly something to avoid, so that makes a lot of sense. One other thing that comes to mind is to reduce each raw data file to just the segmented parts of interest, concatenated together, and describe the individual bursts with multiple capture segments. This is an approach used by a lot of people, and it also happens to work nicely for decimated and dehopped FHSS data because each segment can have its own frequency; really the main limitation is that all segments must be at the same sample rate (a very strong requirement for any single SigMF Recording).
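A minimal sketch of the metadata for that concatenated-bursts approach might look like the following. The `core:datatype`, `core:sample_rate`, `core:sample_start`, and `core:frequency` fields are from the SigMF Core specification; the actual sample offsets and frequencies are invented for illustration.

```python
import json

# Metadata for a single SigMF Recording whose data file contains only the
# extracted bursts, concatenated back-to-back. Each capture segment marks
# where one burst begins in the sample stream and carries its own center
# frequency (useful for dehopped FHSS); the sample rate is global, so all
# segments must share it.
meta = {
    "global": {
        "core:datatype": "cf32_le",
        "core:sample_rate": 4000000,  # shared by every capture segment
        "core:version": "1.0.0",
    },
    "captures": [
        {"core:sample_start": 0,     "core:frequency": 2402000000},
        {"core:sample_start": 8192,  "core:frequency": 2426000000},
        {"core:sample_start": 20480, "core:frequency": 2480000000},
    ],
    "annotations": [],
}

print(json.dumps(meta, indent=2))
```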
The Core spec supports a scenario where there are many capture segments in a recording data file and the full data file is available.
We would like to apply SigMF to a scenario where the dataset is provided as separate IQ files for each of the capture segments containing extracted transmission packets (separated both in time and in frequency, i.e. channels), and the full original recording is not provided: the non-packet time and frequency ranges are discarded to keep the dataset size manageable.
We would like to request some way to alter the capture segment object definition to include a reference to a specific dataset file instead of the index into the global sample stream. Perhaps this overlaps with #245. Alternatively, this might be solvable by treating this as a SigMF Collection (i.e. when extracting packets, convert a single SigMF Recording to many SigMF Recordings, one for each packet); however, in this case we would need a way to represent a Collection of Collections, as there would be many linked recordings, each with many extracted packets.
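For illustration, the requested alteration might look roughly like the following. Note that `core:dataset` is a real SigMF field, but only at the global level; placing it inside a capture segment, as sketched here, is not valid SigMF today and only visualizes the proposal (the file names are invented).

```python
# Hypothetical shape of the requested change: each capture segment points
# to its own extracted IQ file rather than using core:sample_start as an
# index into one global sample stream. This per-capture use of
# "core:dataset" is NOT part of the SigMF spec; it only illustrates the
# proposal in this issue.
proposed_captures = [
    {"core:dataset": "packet_000001.cf32", "core:frequency": 2402000000},
    {"core:dataset": "packet_000002.cf32", "core:frequency": 2426000000},
]

for cap in proposed_captures:
    print(cap["core:dataset"], cap["core:frequency"])
```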