Schema vs Model Distinction #70
I claim that, as a general principle of software engineering, one should not call an application noncompliant just because:
Rather, compliance of an application should be judged by conditions such as:
@bhilburn, did you get any more insight into what specifically makes it difficult with databases? I think the fact that we split metadata from data, break data into capture segments, and provide unique keys in the form of … I'm honestly not sure what we could do to make SigMF easier to drop into a database, and as @kpreid said, there's nothing about the spec that stops or even discourages them from creating an application that does so.
The biggest proponent for this, actually, was @namccart. He was explaining that one of the reasons he really likes VITA49 for this particular application is that it provides pre-/easily 'chunkable' data. So, based on my understanding from @namccart, for example, if you load a SigMF recording into a database and search over … Nick, can you comment?
I'm also not convinced this is really a SigMF problem. I see how it makes writing SQL <-> SigMF converters a bit more complicated, but they also solve really, really different problems.
I think Ben captured my issue pretty well. Given what the DARPA folks want to do with SigMF, I think you really want to consider how SigMF plays nicely with the overall problem of data retrieval from big RF data archives. Hearing Tom talk about his gnuradio-SQL idea (which I also want, and think is inevitable), everything starts with being able to retrieve arbitrary I/Q based on reasonable query strategies. I'm sure there are better solutions to this problem than I can imagine. I think HDF5 has an entirely different solution to searching the archive than trying to chunk the data into a database... what I don't know is whether HDF5 plays nicely with HDFS or other distributed setups. I know very little about HDF5 except that it's intriguing.
In any event, if you accept that this is in fact a problem SigMF should address, I think it makes sense to move in a direction that supports one or more existing ways of archiving and searching lots of data (time-series data or otherwise). For me, those candidates are databases (CouchDB, MongoDB, Postgres), database-like things (Elasticsearch), and HDF5...
Cheers,
Nick M.
Okay, so, SigMF already provides a solution to this, but we should discuss whether there are changes that would improve it. What @namccart cares about, per my comment above, is the ability to load smaller "chunks" of data than the entire dataset, which makes it much easier to work with databases. SigMF allows for this using the … So, the question here, then, is: "What, if anything, could we do to make this better?" Is there some change we should recommend? If we just provide a tool that cleanly splits your dataset into multiple files of a parameterizable size, does that solve the issue?
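A minimal sketch of the splitting tool floated above, assuming the chunk boundaries come from each capture segment's `core:sample_start` field (a real SigMF core field; the function name and data here are hypothetical, and no such tool is defined by the spec):

```python
# Illustrative sketch: split a sample stream at capture-segment boundaries.
# Each capture dict carries core:sample_start, as in SigMF metadata.
def chunk_by_capture(captures, samples):
    """Yield one (capture_segment, sample_chunk) pair per capture."""
    starts = [c["core:sample_start"] for c in captures]
    ends = starts[1:] + [len(samples)]  # each segment runs to the next start
    for cap, start, end in zip(captures, starts, ends):
        yield cap, samples[start:end]

# Hypothetical recording with two capture segments.
captures = [
    {"core:sample_start": 0, "core:frequency": 915e6},
    {"core:sample_start": 3, "core:frequency": 2.45e9},
]
samples = [10, 11, 12, 20, 21]
chunks = list(chunk_by_capture(captures, samples))
print([c for _, c in chunks])  # [[10, 11, 12], [20, 21]]
```

Each chunk could then be written out as its own smaller file, or stored in a database keyed by the segment, which is the workflow described in this thread.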
Spun on this a bit. @kpreid had a really good point, early on, that we shouldn't call something … Per my previous comment, what @namccart wants to do is already pretty doable with SigMF. We could make it easier by providing a tool, for example, that showed you how to chunk the data based on metadata segment, but there really isn't anything difficult here, in my opinion. So, I think the final question that should be debated is whether or not this is a format that we want to be able to distribute … So, before we close this issue out with either a …
I'm new here, so if these comments are missing the point, I apologize. One feeling I had as I read the spec (as an experienced spec reader and writer) is that the current draft spec conflates the semantic content of the metadata with the transfer encoding/format of the data. In plain English: it seems to me the definition of "what are the allowed tags and values in SigMF metadata" can (and should) be separate from HOW the tag/value pairs are encoded.
I'm all for SigMF metadata including "datatype", "sample_rate", "version", and so forth; consider this the "schema" of SigMF metadata. But I think the spec would be strengthened by separating out the fact that "it must be a JSON file". I feel the SigMF spec SHOULD say: when SigMF metadata is written to a file, it must then be a JSON-structured UTF-8 file, with a single object per file, using the following extension.
If ALSO a standard way of "writing a SigMF object to a SQL database" is needed, then that should be specified as an alternate way to store SigMF metadata (and maybe the dataset, too). Should one write the JSON version of the metadata as a text blob to a single VARCHAR field? Or should each field of the metadata get its own SQL field? Personally, I don't care; I find both of these reasonable in certain cases. Should the SigMF spec weigh in on the "correct"/standard way to do this? Only if the community thinks it is helpful. And then what if I want to store SigMF data, both metadata and the dataset, in a document database such as MongoDB? Do we need to define a "standard" (that is, "compliant") way to do that?
MY MAIN POINT is that because the verbiage of the spec conflates the schema with "SigMF metadata is a JSON object with this format", I think it leads to the ambiguity that is being discussed in this thread.
My advice: separate sections for the semantic part of "what is SigMF metadata", then requirements for how it should be serialized into a file (JSON), and, if desired by the community, recommendations for "best practices" when stored in relational records, in a document database, or, as needed, in other portable files/containers/transfer mechanisms.
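The schema-vs-encoding distinction argued for above can be illustrated with a small sketch: the same logical SigMF metadata object serialized two ways. The field names are from the SigMF core namespace; the relational mapping (table and column names) is hypothetical, since the spec defines no such mapping:

```python
# Sketch: one logical metadata object, two encodings.
import json
import sqlite3

# The logical model: SigMF global metadata as key/value pairs (the "schema").
meta = {
    "core:datatype": "cf32_le",
    "core:sample_rate": 1_000_000.0,
    "core:version": "1.0.0",
}

# Encoding 1: the spec's primary use case, a JSON UTF-8 object.
json_blob = json.dumps({"global": meta}, indent=2)

# Encoding 2: one possible relational mapping, one column per field.
# (Table and column names are illustrative only.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sigmf_global (datatype TEXT, sample_rate REAL, version TEXT)"
)
conn.execute(
    "INSERT INTO sigmf_global VALUES (?, ?, ?)",
    (meta["core:datatype"], meta["core:sample_rate"], meta["core:version"]),
)

# Both encodings recover the same logical content.
from_json = json.loads(json_blob)["global"]
from_sql = conn.execute("SELECT * FROM sigmf_global").fetchone()
assert from_json == meta
assert from_sql == ("cf32_le", 1_000_000.0, "1.0.0")
```

The point of the sketch is editorial, not technical: if the spec defines `meta` (the schema) in one place, the JSON-file rules and any future database recommendations become parallel encoding sections rather than being woven through the field definitions.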
I agree that distinguishing consistently between schema/model and encoding would be useful, but I think that "separate sections" is a bad idea unless those sections are interleaved: the value of making the distinction clear is less than the value of making it obvious how to implement SigMF's intended primary use — an interchange file format. |
@kpreid: It is a pretty common technique in standards documents to separate the schema from encodings. In fact, in many standards documents, ALL the encodings show up as examples/supplementary information in appendices. For SigMF to really catch on, I think it needs to address its motivating use case of FILE interchange, but it ought to give SOME consideration to logical next steps, such as storing both the dataset and metadata in either relational or document databases. (After all, a filesystem and a tarfile are simply ONE instance of a "document database" or "document datastore".) Actual file storage might be many users' primary use case... but for me, it probably won't be. Minor adjustments to the contents and the format of the spec might ensure my use case is well covered, too. This will be a boon to the spec if we can achieve it without impeding the file use case... and I feel we can. All that said, I have no trouble if we inline/interleave JSON-file examples in the text, provided 1) there is a clear editorial distinction between "schema requirements" and "JSON-file encoding requirements", and 2) there is some other place in the document that addresses the needs of other encodings (possibly appendices).
So, it's taken me far too long to address this. @dharasty, I think you make really excellent points, and I appreciate you providing your insight here. I would like to make the change you suggest (i.e., distinguishing between schema and model) as part of the … I'm interested to know your thoughts on the best way to go about doing this. Is there any chance you would be up for putting together a PR that demonstrates an approach you think works well?
Some minor changes that clearly distinguish between the schema and file encoding will be made in the v0.0.2 release per the discussion above. |
I feel like this is an important conversation, but it should probably be pushed to v1.1+ so as not to delay the timely release of v1.0.0. |
@bhilburn do you agree with @jacobagilbert 's comment? I do. |
@bhilburn ping |
@bhilburn It is pretty exciting to see that this is the only issue still languishing in the "Not Started" bucket for the 1.0 release ... I wait with bated breath for progress :-D
One of the comments we got at GRCon about SigMF is that it seemed to make working with datasets difficult.
Specifically, what this person wanted to be able to do was `SELECT` something in a database, parametrically, based on the metadata, and then have it return a chunk of samples. The obvious solution is chunking the SigMF data file by capture segment and then storing those chunks with the segments as keys, but this no longer represents a compliant recording per the standard. Possible? Yes. But not standard.
Is this something we should address? I agree that it is a useful structure and I think a lot of users will want to use something like it. Even if we don't want to make this a compliance requirement, are there things we can do in the standard to make it easier to accomplish?
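The database layout described above can be sketched as follows. This is purely illustrative, assuming one row per capture segment keyed by its sample-start index; the table, column names, and data are all hypothetical, and storing samples this way is exactly the non-compliant arrangement the issue is about:

```python
# Sketch: capture segments as database rows, samples stored as blobs,
# retrieved by a parametric query on metadata.
import sqlite3
import struct

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE captures (
           sample_start INTEGER PRIMARY KEY,  -- segment's starting sample index
           frequency    REAL,                 -- segment's tuned frequency (Hz)
           samples      BLOB                  -- raw sample bytes for this chunk
       )"""
)

# Two hypothetical capture segments, each with four float32 samples.
chunk_a = struct.pack("<4f", 0.0, 0.1, 0.2, 0.3)
chunk_b = struct.pack("<4f", 1.0, 1.1, 1.2, 1.3)
conn.execute("INSERT INTO captures VALUES (?, ?, ?)", (0, 915e6, chunk_a))
conn.execute("INSERT INTO captures VALUES (?, ?, ?)", (4, 2.45e9, chunk_b))

# The use case from the thread: query on metadata, get back a sample chunk.
start, blob = conn.execute(
    "SELECT sample_start, samples FROM captures WHERE frequency > ?", (1e9,)
).fetchone()
samples = struct.unpack("<4f", blob)
print(start)  # 4: the segment starting at sample index 4 matched
```

Reassembling a compliant SigMF recording from such rows is straightforward (concatenate blobs in `sample_start` order), which is part of why the thread leans toward treating this as an application concern rather than a spec change.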