Flexible metadata in .eln files #58
Comments
In eLab, what you suggest would probably work well with the "Extra fields" feature: https://doc.elabftw.net/metadata.html#example-with-number-and-units. And
Flatten/normalize it.
No, we should put as much as we can in the metadata.yml file, exactly as you're suggesting. |
Unfortunately, I could not attend the latest video call, so I may not understand you correctly. But isn’t the actual problem for the receiving ELN to understand the semantics of those fields? |
Prior to the meeting it had been suggested that we should pick an experiment that can be represented by an .eln file and imported/exported without significant loss of information. It should be similar when exported by all ELNs, which also has advantages such as making a comparison of the generated .eln files easier. We did not discuss much more than clarifying that this is a future goal. Storing flexible metadata about an experiment is a step in that direction.
This may be splitting hairs, but I do not think the receiving ELN has to understand the semantics, rather the users of the receiving ELN have to understand them. This can be achieved by utilizing the propertyID property, which can make use of whatever ontology exists in the specific field and whatever names or labels are associated with a specific value. In addition, the measurementTechnique and measurementMethod properties might help give some context, where applicable. |
If you don’t care about semantics, I don’t see the problem in just displaying
on the receiving side. |
I think the user experience is improved quite a bit if the ELN can display properties with identifiers the users can understand and values of a type the ELN knows how to display, rather than requiring that users read through various files in the hope of finding what they're looking for in a syntax and structure they know how to understand. |
Ah, now I see the problem … thanks for the explanation! |
(Note: I tried to post this earlier, but the comment seems to have gotten lost. Apologies if it ends up as a duplicate.)
Here is the current ELN example with added flexible metadata: sampledb_export_with_flexible_metadata.zip (renamed to zip so I can upload it in a GitHub comment)
I've implemented the export fairly directly; here are a few examples for data types exported as
Text
{
"value": "OMBE-1",
"propertyID": "name",
"name": "Sample Name"
}
Text is directly supported for value, so this is straightforward and can also serve as the fallback datatype as long as the data has a text representation.
Boolean
{
"value": false,
"propertyID": "checkbox",
"name": "Checkbox"
}
This is also directly supported and fairly simple to implement.
Quantity
{
"value": 5.0,
"unitText": "\u00c5",
"propertyID": "multilayer.0.films.0.thickness",
"name": "Multilayers \u2192 0 \u2192 Films \u2192 0 \u2192 Film Thickness",
"unitCode": "A11"
}
While this is directly supported, I found UN/CEFACT codes a bit annoying to work with, but where a unit code does exist, it seems like it could genuinely avoid confusion between unit notations. This example also shows a fairly deeply nested property.
Datetime
{
"value": "2017-02-24 11:56:00",
"propertyID": "created",
"name": "Creation Datetime"
}
This is just the date and time in UTC as it is used in SampleDB, though it might make sense to use ISO 8601 notation with a time zone offset and possibly microsecond precision, just in case either of those is needed. In that case, this would be:
{
"value": "2017-02-24T11:56:00.000000+00:00",
"propertyID": "created",
"name": "Creation Datetime"
}
There is no clear indication that this is a datetime rather than text beyond the format used, so a suggestion for how to clearly denote a datetime would be welcome. Then again, text is quite unlikely to match this format by accident, so a regular expression (or just an attempt to parse it as a date) should suffice.
Object reference
{
"value": "./objects/1",
"propertyID": "sample",
"name": "Sample"
}
This uses the .eln internal
{
"value": "http://localhost:5000/objects/1",
"propertyID": "sample",
"name": "Sample"
} |
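A rough sketch of the "regular expression plus an attempt to parse" idea for recognizing datetime values, written in Python. The function name `parse_datetime_value` and the exact pattern are assumptions made for illustration; nothing here is part of the .eln specification.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed pattern: date, "T", time, optional microseconds, mandatory UTC offset.
ISO_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{6})?[+-]\d{2}:\d{2}$"
)


def parse_datetime_value(value: object) -> Optional[datetime]:
    """Return a datetime if value looks like an ISO 8601 datetime, else None."""
    if not isinstance(value, str) or not ISO_DATETIME.match(value):
        return None
    try:
        # The regex only checks the shape; fromisoformat rejects impossible dates.
        return datetime.fromisoformat(value)
    except ValueError:
        return None
```

With this pattern, the SampleDB-style value "2017-02-24 11:56:00" is treated as plain text because it has no "T" separator and no offset, so an importer would have to decide separately whether to accept that form.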
I think those examples above all lack |
How do we want to flatten and afterwards json? Do we want to use the right arrow that SampleDB is using (see the example below), or the '/', which I find more common? I'm pasting the SampleDB example here so you don't have to search for it. |
For that, we also have to differentiate between |
Generally, I don't like double properties: storing propertyID and name, which are almost identical (except for the capitalization and the separation symbol). Inconsistencies between both entries might then lead to strange behavior for the user. Uniqueness is important to me. |
I think that is how it is intended, |
I started implementing metadata in the ro-crate file. I added an
EDIT: I'll probably remove that, and instead add
This introduces the concept of namespaced custom properties. Other ELNs must ignore them, and they should be avoided as much as possible.
EDIT 2: ok so instead I'll use
Currently (WIP) this is what it looks like for this input form:
"variableMeasured": [
{
"propertyID": "elabftw_metadata",
"description": "eLabFTW metadata JSON as string",
"value": "{\"extra_fields\": {\"multi select\": {\"type\": \"select\", \"value\": \"Paris\", \"options\": [\"Paris\", \"Londres\", \"Tokyo\", \"Madrid\"], \"position\": 1, \"allow_multi_values\": true}, \"with comment\": {\"type\": \"text\", \"value\": \"yep\", \"position\": 0, \"readonly\": true, \"required\": true, \"description\": \"this is the description\", \"blank_value_on_duplicate\": true}, \"num with unit\": {\"type\": \"number\", \"unit\": \"unit 2\", \"units\": [\"unit 1\", \"unit 2\", \"unit 3\"], \"value\": \"23\", \"position\": 2, \"description\": \"yep\"}, \"a dropdown menu\": {\"type\": \"select\", \"value\": \"choice 2\", \"options\": [\"choice 1\", \"choice 2\", \"choice 3\"], \"position\": 3, \"required\": true, \"description\": \"this one does not allow multiple selection\", \"blank_value_on_duplicate\": true}, \"a straightforward text input\": {\"type\": \"text\", \"value\": \"it contains a text value\", \"position\": 4, \"description\": \"this is the default input\"}}}"
},
{
"propertyID": "multi select",
"value": "Paris",
"description": null,
"unitText": null,
"valueReference": "select"
},
{
"propertyID": "with comment",
"value": "yep",
"description": "this is the description",
"unitText": null,
"valueReference": "text"
},
{
"propertyID": "num with unit",
"value": "23",
"description": "yep",
"unitText": "unit 2",
"valueReference": "number"
},
{
"propertyID": "a dropdown menu",
"value": "choice 2",
"description": "this one does not allow multiple selection",
"unitText": null,
"valueReference": "select"
},
{
"propertyID": "a straightforward text input",
"value": "it contains a text value",
"description": "this is the default input",
"unitText": null,
"valueReference": "text"
}
]
}, |
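For readers who want to reproduce the mapping shown above, here is a minimal Python sketch that turns the "extra_fields" object into flat entries. It is inferred only from the input and output in this comment and is not eLabFTW's actual implementation; the function name `extra_fields_to_property_values` is hypothetical.

```python
from typing import Any, Dict, List


def extra_fields_to_property_values(metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn each entry of "extra_fields" into one flat PropertyValue-like dict."""
    property_values = []
    for field_name, field in metadata.get("extra_fields", {}).items():
        property_values.append({
            "propertyID": field_name,
            "value": field.get("value"),
            "description": field.get("description"),  # None if absent
            "unitText": field.get("unit"),            # None if the field has no unit
            "valueReference": field.get("type"),      # e.g. "text", "number", "select"
        })
    return property_values
```

Note that this keeps only the selected unit and value; the alternative "units" and "options" lists survive only in the serialized metadata string.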
Here is what it looks like currently: "variableMeasured": [
{
"propertyID": "elabftw_metadata",
"description": "eLabFTW metadata JSON as string",
"value": "{\"elabftw\": {\"display_m...[skipped for brevity]..."
},
{
"propertyID": "Number",
"valueReference": "number",
"value": "",
"description": "no units"
},
{
"propertyID": "Type URL",
"valueReference": "url",
"value": "https://www.elabftw.net",
"description": "a link (readonly)"
},
{
"propertyID": "Just time",
"valueReference": "time",
"value": "17:00",
"description": "tea time"
},
{
"propertyID": "Some date",
"valueReference": "date",
"value": "2024-07-14",
"description": "is a date"
},
{
"propertyID": "Type user",
"valueReference": "users",
"value": 1,
"description": "this is a link to a user"
},
{
"propertyID": "A checkbox",
"valueReference": "checkbox",
"value": "on",
"description": "is checked"
},
{
"propertyID": "Email input",
"valueReference": "email",
"value": "[email protected]",
"description": "type email"
},
{
"propertyID": "Date and time",
"valueReference": "datetime-local",
"value": "2024-07-14T13:37",
"description": "datetime description"
},
{
"propertyID": "Radio buttons",
"valueReference": "radio",
"value": "Oui",
"description": "radio description"
},
{
"propertyID": "Type resource",
"valueReference": "items",
"value": 208,
"description": "This is a link to a resource"
},
{
"propertyID": "A dropdown menu",
"valueReference": "select",
"value": "Choice 1",
"description": "Single select"
},
{
"propertyID": "Text input name",
"valueReference": "text",
"value": "some text",
"description": "type text + all attributes"
},
{
"propertyID": "Type experiment",
"valueReference": "experiments",
"value": 373,
"description": "This is a link to an experiment"
},
{
"propertyID": "Number with units",
"valueReference": "number",
"value": "",
"description": "this one has units",
"unitText": "mM"
},
{
"propertyID": "Unchecked checkbox",
"valueReference": "checkbox",
"value": "",
"description": "this one is not checked"
},
{
"propertyID": "Multi dropdown menu",
"valueReference": "select",
"value": "Option 1",
"description": "Allows multiple selection"
}
]
}
]
}
edit: realizing now that dropdown menus lose their other options... |
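One possible way for an importer to get the other options back is to read them from the serialized "elabftw_metadata" entry. This is a Python sketch under the assumption that the string still contains an "extra_fields" object shaped like the one in the earlier example; the function name `find_options` is hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def find_options(variable_measured: List[Dict[str, Any]], field_name: str) -> Optional[List[str]]:
    """Look up the full option list of a select field in the embedded metadata string."""
    for property_value in variable_measured:
        if property_value.get("propertyID") == "elabftw_metadata":
            # The value is JSON serialized as a string, so it has to be decoded first.
            metadata = json.loads(property_value["value"])
            field = metadata.get("extra_fields", {}).get(field_name, {})
            return field.get("options")
    return None
```

This only helps ELNs that know the eLabFTW-specific structure; other importers would still see just the selected value.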
Here is an example of deeply nested metadata from Kadi4Mat. In Kadi4Mat, the metadata can be organized using nested types (along with primitive data types). The following nested value types are available:
Dictionary: A nested value that combines multiple metadata entries under a single key. In the example below,
List: A nested value similar to dictionaries, but without keys for the values. In the example below,
[
{
"@type": "PropertyValue",
"additionalType": "str",
"description": "Name of the instrument",
"identifier": "https://schema.org/name",
"propertyID": "Instrument.name",
"value": "SEM"
},
{
"@type": "PropertyValue",
"additionalType": "str",
"propertyID": "Instrument.manufacturer.manufacturerName",
"value": null
},
{
"@type": "PropertyValue",
"additionalType": "float",
"propertyID": "Instrument.Settings.beam spot size",
"value": 1.2,
"unitText": "mm"
},
{
"@type": "PropertyValue",
"additionalType": "str",
"propertyID": "Instrument.Detector.0",
"value": "EDT"
},
{
"@type": "PropertyValue",
"additionalType": "str",
"propertyID": "Instrument.Detector.1",
"value": "CDEM"
}
]
|
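To illustrate how nested dictionaries and lists can end up as dotted propertyID values like the ones above, here is a minimal Python sketch. It is not Kadi4Mat's exporter; the function name `flatten` and the default "." separator are assumptions, and keys that themselves contain the separator are not handled.

```python
from typing import Any, Dict, List


def flatten(data: Any, prefix: str = "", sep: str = ".") -> List[Dict[str, Any]]:
    """Flatten nested dicts/lists into PropertyValue-like entries with dotted IDs."""
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        items = enumerate(data)  # list positions become path segments: 0, 1, ...
    else:
        return [{"@type": "PropertyValue", "propertyID": prefix, "value": data}]
    result: List[Dict[str, Any]] = []
    for key, child in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        result.extend(flatten(child, path, sep))
    return result
```

For example, `flatten({"Instrument": {"Detector": ["EDT", "CDEM"]}})` yields entries with the identifiers "Instrument.Detector.0" and "Instrument.Detector.1".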
Will the receiving ELN display “Instrument.Settings.beam spot size: 1.2mm” to the user? |
If the receiving ELN doesn't support nested entries, the propertyID or name in the property values should be split at the preferred separator, for example like in the SampleDB metadata. |
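For the opposite direction, a receiving ELN that does support nesting could split each propertyID at the separator and rebuild a tree. This is a Python sketch with a hypothetical function name `nest`; it assumes "." as the separator, keeps list positions as string keys, and assumes that no identifier is both a leaf and a prefix of another.

```python
from typing import Any, Dict, List


def nest(property_values: List[Dict[str, Any]], sep: str = ".") -> Dict[str, Any]:
    """Rebuild a nested dict from flat entries by splitting propertyID at sep."""
    tree: Dict[str, Any] = {}
    for property_value in property_values:
        parts = property_value["propertyID"].split(sep)
        node = tree
        for part in parts[:-1]:
            # Create intermediate levels on demand.
            node = node.setdefault(part, {})
        node[parts[-1]] = property_value.get("value")
    return tree
```

An ELN without nesting support could instead use the same split only for display, for instance joining the parts with " → " as SampleDB does in its name field.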
If an ELN finds the graph triple
it can also display that nicely to users. I see the need for PropertyValue if you are forced to use only schema.org, but in RO-Crates, arbitrary vocabularies are allowed next to schema.org. |
@jmanideep can you provide a .eln with such metadata so we can test that easily? |
Here is the ELN file instrument-used-in-experiment.zip from Kadi4Mat. |
@jmanideep don't you have a sha256 sum for attached files? Also, there is no Author node, is this expected? |
I accidentally filtered out the author node during export, but in general, it will be there. Here is the updated file. Regarding the sha256 checksum, we don't include it currently, as is also the case in our regular example. |
For future reference: During yesterday's meeting, we've agreed on using |
@FlorianRhiem do you wish to take a stab at adding a section in the SPECIFICATION about how we handle arbitrary metadata in a .eln? Mainly the point about the |
Currently, the .eln format does not have a unified way of exporting flexible metadata. Instead, most ELNs export a data structure specific to that ELN in JSON format which contains various information about a dataset, including some flexible metadata. As they are (mostly) representations of internal models, they vary quite a bit.
Motivation
While it is already useful to be able to reference samples, measurements and other objects from other ELNs with some generic metadata such as the creation and modification times and the author, it would be even better if we could exchange flexible metadata about these. In the last meeting, we briefly discussed the goal of a "gold standard" experiment that can be represented as an .eln file, imported and exported by the various ELNs. For this, we should be able to exchange information such as instrument or process parameters. We cannot expect to strictly define these; instead, they should map a (textual) identifier to data of some type.
Ideas / Suggestions
For mapping identifiers to values, the PropertyValue should be useful, as it can map its propertyID to its value, which can be a boolean, text, a number or the generic StructuredValue type, and also supports units and a human-readable version as a fallback. So, if we had to store a temperature with the identifier target_temperature, it could be represented as:
while a boolean instrument setting could be represented like this:
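A minimal sketch of two such objects, built in Python and printed as JSON. The numeric value, the name labels and the identifier `shutter_open` are made up for illustration; only `target_temperature` and the PropertyValue properties come from the text above.

```python
import json

# Hypothetical temperature setting: numeric value with a unit.
target_temperature = {
    "@type": "PropertyValue",
    "propertyID": "target_temperature",
    "name": "Target Temperature",  # human-readable label (assumed)
    "value": 25.0,                 # assumed value
    "unitText": "°C",
    "unitCode": "CEL",             # UN/CEFACT common code for degree Celsius
}

# Hypothetical boolean instrument setting: no unit needed.
shutter_open = {
    "@type": "PropertyValue",
    "propertyID": "shutter_open",
    "name": "Shutter Open",
    "value": True,
}

property_values = [target_temperature, shutter_open]
print(json.dumps(property_values, indent=2, ensure_ascii=False))
```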
If we provided an array of such property values, we could support a flat mapping of identifiers to values. Such an array is part of the Dataset type we use for datasets in the ro-crate-metadata.json in the property variableMeasured; however, this use case goes beyond the "variables that are measured in some dataset". So we could either use that property and "stretch" its definition by a fair bit, use another existing property, or branch off there and define a custom property.
As PropertyValue objects can contain a value of StructuredValue type, of which PropertyValue is a sub-type, it might also be possible to implement nested data structures like this. Alternatively, the structure could be represented in the identifier.
What are your thoughts on storing flexible metadata in PropertyValue objects? Which solution for attaching these to the datasets do you prefer? How should we deal with nested metadata? Would you prefer to store flexible metadata outside the ro-crate-metadata.json entirely or in a custom format instead?
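To make the first option (reusing variableMeasured on the Dataset) more concrete, here is a Python sketch of how property values might be attached to a dataset node. The function name `attach_property_values` and the `#property-value-N` identifier scheme are assumptions for illustration, and the entities are linked by `@id` on the assumption that the graph in ro-crate-metadata.json is kept flat.

```python
from typing import Any, Dict, List


def attach_property_values(
    dataset: Dict[str, Any], property_values: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return graph entries: the dataset referencing each value by @id, then the values."""
    entities: List[Dict[str, Any]] = []
    references: List[Dict[str, str]] = []
    for index, property_value in enumerate(property_values):
        node_id = f"#property-value-{index}"  # hypothetical @id scheme
        entities.append({**property_value, "@id": node_id, "@type": "PropertyValue"})
        references.append({"@id": node_id})
    # Copy the dataset so the caller's dict is left unchanged.
    linked_dataset = {**dataset, "variableMeasured": references}
    return [linked_dataset] + entities
```

A custom property would work the same way, with only the key name on the dataset changing.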