Implicit metadata creation on JSON leaves #778

Open
thjaeckle opened this issue Aug 27, 2020 · 11 comments

@thjaeckle
Member

thjaeckle commented Aug 27, 2020

Now that Ditto can manage arbitrary metadata (see #680), it could be useful in some use cases to automatically create/inject certain metadata on each modification of a digital twin.

Three obvious implicit metadata fields come to mind:

  • _revision: the thing _revision in which the JSON leaf (e.g. a feature property or an attribute) was last modified
  • _modified: the timestamp at which the JSON leaf (e.g. a feature property or an attribute) was last modified
  • _created: the timestamp at which the JSON leaf (e.g. a feature property or an attribute) was created

I suggest keeping the underscore prefix _ in order to "mark" that those are automatic / Ditto managed metadata fields.

An example thing JSON document could look like this:

{
  "thingId": "org.eclipse.ditto:thing-1",
  "policyId": "...",
  "features": {
    "lamp": {
      "properties": {
        "on": true,
        "color": {
          "r": 0,
          "g": 255,
          "b": 255
        }
      }
    }
  },
  "_created": "2020-06-01T10:00:00Z",
  "_modified": "2020-06-09T14:30:00Z",
  "_revision": 42,
  "_metadata": {
    "settings": {
      "auto-metadata": [
        "_revision",
        "_modified"
      ]
    },
    "policyId": {
      "_modified": "2020-06-09T14:00:00Z",
      "_revision": 1
    },
    "features": {
      "lamp": {
        "properties": {
          "on": {
            "_modified": "2020-06-09T14:30:00Z",
            "_revision": 42
          },
          "color": {
            "r": {
              "_modified": "2020-06-09T14:15:00Z",
              "_revision": 23
            },
            "g": {
              "_modified": "2020-06-09T14:15:00Z",
              "_revision": 23
            },
            "b": {
              "_modified": "2020-06-09T14:15:00Z",
              "_revision": 23
            }
          }
        }
      }
    }
  }
}
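
A minimal sketch of how the per-leaf entries in the example above could be derived, assuming each modification arrives as a path plus a new value. The names `stamp_leaves` and `apply_change` are hypothetical and not part of Ditto's API, and the sketch treats arrays as leaves.

```python
from __future__ import annotations

from typing import Any


def stamp_leaves(value: Any, revision: int, modified: str) -> Any:
    """Build a metadata tree mirroring `value`, with one entry per JSON leaf."""
    if isinstance(value, dict):
        # Objects are not leaves: recurse and keep the same nesting.
        return {key: stamp_leaves(child, revision, modified) for key, child in value.items()}
    # Scalars (and, in this sketch, arrays) are treated as leaves.
    return {"_modified": modified, "_revision": revision}


def apply_change(metadata: dict, path: list[str], new_value: Any,
                 revision: int, modified: str) -> dict:
    """Return a copy of `metadata` in which the subtree at `path` is re-stamped."""
    if not path:
        return stamp_leaves(new_value, revision, modified)
    updated = dict(metadata)  # siblings of the changed path are preserved
    head, rest = path[0], path[1:]
    child = metadata.get(head, {})
    if not isinstance(child, dict):
        child = {}
    updated[head] = apply_change(child, rest, new_value, revision, modified)
    return updated


# Replaying the two modifications implied by the example document:
metadata = apply_change({}, ["features", "lamp", "properties", "color"],
                        {"r": 0, "g": 255, "b": 255}, 23, "2020-06-09T14:15:00Z")
metadata = apply_change(metadata, ["features", "lamp", "properties", "on"],
                        True, 42, "2020-06-09T14:30:00Z")
```

Only the leaves under the modified path receive the new revision and timestamp; untouched siblings keep their earlier values.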
@jufickel-b

How does Ditto know which fields to set implicitly? I assume this should not be hard-coded but should at least be a global configuration. Or is this somehow configurable by the user?

@thjaeckle
Member Author

I also thought about this and see 2 options:

  • always set the built-in metadata
  • make it configurable on thing level, maybe as part of the "_metadata" object, e.g.:
"_metadata": {
  "settings": {
    "auto-metadata": [
      "_revision",
      "_modified"
    ]
  }
}
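
A small sketch of how the two options could combine, assuming a missing setting means "always set the built-in metadata" and a present one restricts it. The function names and the choice to ignore unknown field names are assumptions for illustration, not Ditto behavior.

```python
BUILT_IN_FIELDS = ("_revision", "_modified", "_created")


def enabled_auto_fields(thing: dict) -> set:
    """Return the auto-metadata fields that are enabled for this thing."""
    settings = thing.get("_metadata", {}).get("settings", {})
    configured = settings.get("auto-metadata")
    if configured is None:
        # Option 1: nothing configured, so every built-in field is set.
        return set(BUILT_IN_FIELDS)
    # Option 2: thing-level configuration; unknown names are ignored here.
    return set(configured) & set(BUILT_IN_FIELDS)


def build_stamp(thing: dict, revision: int, modified: str) -> dict:
    """Build the per-leaf entry for a modification, honoring the settings."""
    enabled = enabled_auto_fields(thing)
    candidates = {"_revision": revision, "_modified": modified}
    return {name: value for name, value in candidates.items() if name in enabled}
```

`_created` is left out of `build_stamp` because it would only be written when a leaf first appears, not on every modification.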

@w4tsn
Contributor

w4tsn commented Sep 14, 2020

Does a metadata field that I provide take precedence over the same field auto-injected by Ditto (configurable or not)? For example, can I have Ditto set the modified field automatically only if I don't provide one myself?

@thjaeckle
Member Author

@w4tsn that's up for discussion. It could be very useful, e.g. when you want to provide a custom modified timestamp, that your own value takes priority over the "fallback" issued by Ditto.
For the revision, however, that would not make sense from my point of view: this is "internal only" metadata.
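
The precedence rule floated here could look like the following sketch. It assumes a caller-supplied `_modified` wins while `_revision` is always overwritten; `resolve_leaf_metadata` is a hypothetical helper, and the thread has not settled on this behavior.

```python
def resolve_leaf_metadata(provided: dict, revision: int, now: str) -> dict:
    """Merge caller-provided metadata with the values Ditto would inject."""
    resolved = dict(provided)
    # A caller-supplied timestamp takes priority; Ditto's value is only the fallback.
    resolved.setdefault("_modified", now)
    # The revision is internal-only, so any caller-supplied value is overwritten.
    resolved["_revision"] = revision
    return resolved
```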

@w4tsn
Contributor

w4tsn commented Sep 14, 2020

Then, following up on other discussions on this topic, it might be better to have the internal _modified anyway and an additional issuedAt, which is either set automatically by Ditto as a fallback or via a header, which then takes precedence. In the default case this might turn out to be redundant, but in the end you might want to know both when a value was set in Ditto and when it was actually issued by the device.

Ok, but then again you could also implement it in the client: if no issuedAt is provided, just fall back to _modified.

@thjaeckle
Member Author

> Ok, but then again you could also implement it in the client that if there is no issuedAt provided just fall back to _modified

Exactly, I would avoid duplicating metadata :)
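
The client-side fallback agreed on here is small enough to sketch directly. `effective_issued_at` is a hypothetical helper name; `issuedAt` and `_modified` are the field names used in this thread.

```python
from typing import Optional


def effective_issued_at(leaf_metadata: dict) -> Optional[str]:
    """Pick the timestamp a client should treat as the issue time of a value."""
    # Prefer the device-provided timestamp when it exists.
    if "issuedAt" in leaf_metadata:
        return leaf_metadata["issuedAt"]
    # Otherwise fall back to the modification time maintained by Ditto.
    return leaf_metadata.get("_modified")
```

This keeps a single timestamp stored in the common case and leaves the interpretation to the consumer.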

@jufickel-b

jufickel-b commented Sep 15, 2020

> "_metadata": {
>   "settings": {
>     "auto-metadata": [
>       "_revision",
>       "_modified"
>     ]
>   }
> }

This could indeed be a way to go. Probably we also need the possibility to define default values for auto-metadata fields.

@thjaeckle
Member Author

> "_metadata": {
>   "settings": {
>     "auto-metadata": [
>       "_revision",
>       "_modified"
>     ]
>   }
> }
>
> This could indeed be a way to go. Probably we also need the possibility to define default values for auto-metadata fields.

Why would we? Defaults can only be static, can't they?
What would be the benefit of having implicit metadata with the same static default value for each JSON leaf?

@w4tsn
Contributor

w4tsn commented Sep 16, 2020

I'm not sure if there is a use case that justifies deactivating built-in metadata. Maybe that's an unnecessary complexity right now.

I think static metadata defaults would only make sense if one could also specify the paths to which those defaults apply. This would make it possible to load a Vorto model as a kind of prototype into the metadata, which would then automatically apply metadata to incoming data of this model / the provided auto-metadata definition. However, for the most part this could also be written initially on thing creation. Anyway, this is how I would imagine something like this:

  "_metadata": {
    "prototype": {
      "features": {
        "temperature": {
          "properties": {
            "value": {
              "measurementUnit": "°C"
            }
          }
        }
      }
    }
  }

Which produces this thing:

{
  "thingId": "org.eclipse.ditto:thing-1",
  "policyId": "...",
  "features": {
    "temperature": {
      "properties": {
        "value": true
      }
    }
  },
  "_created": "2020-06-01T10:00:00Z",
  "_modified": "2020-06-09T14:30:00Z",
  "_revision": 42,
  "_metadata": {
    "prototype": {
      "features": {
        "temperature": {
          "properties": {
            "value": {
              "measurementUnit": "°C"
            }
          }
        }
      }
    },
    "policyId": {
      "_modified": "2020-06-09T14:00:00Z",
      "_revision": 1
    },
    "features": {
      "temperature": {
        "properties": {
          "value": {
            "_modified": "2020-06-09T14:30:00Z",
            "_revision": 42,
            "measurementUnit": "°C"
          }
        }
      }
    }
  }
}

If one changed the model / auto-metadata of this kind, new incoming data would automatically cause a change of the attached metadata.
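
One way to picture the prototype idea is a merge that copies prototype entries into the metadata wherever the same path exists in the thing's data. This is only a sketch under that assumption: `apply_prototype` is a hypothetical name, and the sketch only attaches prototype fields to leaves, not to objects.

```python
def apply_prototype(prototype: dict, data, metadata: dict) -> dict:
    """Copy prototype entries into `metadata` wherever the path exists in `data`."""
    result = dict(metadata)
    for key, proto_child in prototype.items():
        if isinstance(proto_child, dict) and isinstance(data, dict) and key in data:
            # The path continues in the data: descend on both sides.
            existing = metadata.get(key, {})
            if not isinstance(existing, dict):
                existing = {}
            result[key] = apply_prototype(proto_child, data[key], existing)
        elif not isinstance(data, dict):
            # `data` is a leaf, so the remaining prototype keys are metadata
            # fields for it. Values already present are not overwritten.
            result.setdefault(key, proto_child)
    return result
```

Applied to the example above, the `measurementUnit` entry lands next to `_modified` and `_revision` on the `value` leaf, and prototype paths that have no counterpart in the data are skipped.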

A use case for this could be a status property that is a dynamically filled map of wireless devices in range. I know which metadata the objects of this map would have, but they are not present on thing creation.

One problem I stumbled upon with the current metadata in general: in the above example the color object is not a leaf and hence has no metadata assigned. In terms of _modified and _revision I can infer the data by going through each leaf and seeing which one is the "oldest". If we think about other metadata, e.g. from Vorto, like the description, measurementUnit or editable flag, those can also be set for the color object, hence the requirement for metadata not only on leaves.
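
Inferring an object's values from its leaves could be sketched as a walk over the metadata tree. The sketch assumes that the object's effective values are those of the leaf with the highest `_revision`, and that any node containing `_revision` is a leaf entry; `latest_change` is a hypothetical helper.

```python
def latest_change(node: dict):
    """Return (revision, modified) of the most recently changed leaf under `node`.

    Returns None when no leaf entry is found beneath `node`.
    """
    if "_revision" in node:
        # Assumption: a node carrying `_revision` is a leaf entry.
        return node["_revision"], node.get("_modified")
    candidates = [latest_change(child) for child in node.values() if isinstance(child, dict)]
    candidates = [candidate for candidate in candidates if candidate is not None]
    if not candidates:
        return None
    # The highest revision identifies the last modification of the object.
    return max(candidates, key=lambda candidate: candidate[0])
```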

Other interesting built-in:

  • _created - we have a status property containing a map of several wireless nodes which is dynamically filled with wireless nodes in range. It would be useful to see when an entry to this map was first created. In this case: how does metadata cope with arrays?

@thjaeckle
Member Author

> I'm not sure if there is a use-case that justifies deactivating built-in metadata. Maybe that's an unnecessary complexity right now.

Sure - simply in order to save on the additional data volume when someone does not need it at all.
Also, Ditto-managed things have a maximum size in bytes (100kB by default). The implicit metadata could probably halve the maximum reachable size.
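
To get a feeling for that overhead, the share of a thing's serialized size taken up by `_metadata` can be measured. This is a rough sketch: it assumes compact UTF-8 JSON as the measure, which may differ from how Ditto actually accounts for size, and both function names are hypothetical.

```python
import json


def encoded_size(document) -> int:
    """Size in bytes of the compact UTF-8 JSON encoding of `document`."""
    text = json.dumps(document, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def metadata_overhead(thing: dict) -> float:
    """Fraction of the thing's encoded size that is due to `_metadata`."""
    total = encoded_size(thing)
    without = encoded_size({key: value for key, value in thing.items() if key != "_metadata"})
    return (total - without) / total
```

A result close to 0.5 would correspond to the "halved" capacity mentioned above.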

> I think static metadata defaults would only make sense if one can also specify for which paths those defaults apply. [...]

I could not follow your thoughts here and don't see what value this adds compared to setting the metadata when e.g. creating the thing JSON based on a Vorto model.

> [...] those can also be set for the color object hence the requirement for metadata not only for leaves.

Yes, that's why the current _metadata implementation also allows setting metadata on objects.
I do not yet see that as a requirement for implicitly created metadata (like the mentioned _modified and _revision, or even _created), as those can be determined from their "children", as you mentioned.

> If we think about other metadata e.g. from vorto like the description, measurementUnit or editable flag, those can also be set for the color object hence the requirement for metadata not only for leaves.

Yes, that's possible right now - should have nothing to do with implicitly added metadata.

> _created [...]

Agreed, that could be useful as well. We also added _created on thing level in Ditto 1.2.0

> [...] how does metadata cope with arrays?

What do you mean by that?
A metadata's value should be able to be an array.
Setting the metadata on an array value should be possible as well.
Setting the metadata to a JsonObject inside an array is most probably not possible as Ditto (ditto-json and the JsonPointer implementation we have in place) does not handle Json arrays very well.

@w4tsn
Contributor

w4tsn commented Sep 16, 2020

> > I'm not sure if there is a use-case that justifies deactivating built-in metadata. Maybe that's an unnecessary complexity right now.
>
> Sure - simply in order to save on the additional data volume when someone does not need it at all.
> Also, Ditto managed things do have a max. size in bytes (by default 100kB). The implicit metadata could probably halve this max. reachable size.

Fair enough! In that case: should built-in metadata be enabled or disabled by default? Is auto-metadata opt-in or opt-out? This behavior could be controlled on a global level or even per namespace.

Furthermore an alternative approach could be to set it this way:

"_metadata": {
  "settings": {
    "auto-metadata": {
      "_revision": true,
      "_modified": true
    }
  }
}
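
Both proposed shapes of the setting could be accepted by normalizing them to one set of enabled field names. This is a sketch with a hypothetical helper name; it treats anything other than a literal `true` as disabled.

```python
def normalize_auto_metadata(setting) -> set:
    """Turn either shape of the `auto-metadata` setting into a set of enabled fields."""
    if isinstance(setting, dict):
        # Object form: {"_revision": true, "_modified": false}
        return {name for name, enabled in setting.items() if enabled is True}
    if isinstance(setting, list):
        # List form: ["_revision", "_modified"]
        return set(setting)
    # Missing or malformed setting: nothing is explicitly enabled.
    return set()
```

The object form can state `false` explicitly, which matters if auto-metadata ends up being opt-out rather than opt-in.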

> > I think static metadata defaults would only make sense if one can also specify for which paths those defaults apply. [...]
>
> I could not follow your thoughts here and don't see why this is an added value compared to setting the metadata when e.g. creating the thing JSON based on a Vorto model.

Imagine a feature with a property tracking wireless devices in range (e.g. LoRa Gateways or Devices):

"LoRaGateways": {
  "properties": {
    "status": {
      "gatewaysInRange": {
        "some-id-ab79dc8d8f0f": {
          "signalStrength": -7,
          "noise": 100
        }
      }
    }
  }
}

and metadata:

"_metadata": {
  "features": {
    "LoRaGateways": {
      "properties": {
        "status": {
          "gatewaysInRange": {
            "some-id-ab79dc8d8f0f": {
              "noise": {
                "measurementUnit": "dB"
              }
            }
          }
        }
      }
    }
  }
}

In this case I know the metadata from the model in advance, but I don't know / have the actual objects in the gatewaysInRange map. I'd have to get the Vorto model to be able to write the metadata for each new object in the map. I'd rather set this once and have Ditto fill it in automatically.

To be able to do this, I'd need the ability to set this metadata with the correct path from the Vorto model at thing creation (or update it later, of course), and Ditto would need to apply it every time I put a new object into such a map. This is a problem of complex data types, and it is debatable whether it's Ditto's responsibility to manage this.
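
A sketch of what "set once, filled in automatically" could mean for dynamic map keys, assuming a template in which a wildcard key stands for any entry of a map. The `*` wildcard and the function name are assumptions made for illustration; nothing like this exists in Ditto according to this thread.

```python
WILDCARD = "*"  # hypothetical placeholder for "any key at this level"


def metadata_from_template(template: dict, data) -> dict:
    """Derive metadata for `data` by matching its paths against `template`."""
    if not isinstance(data, dict):
        # Reached a leaf in the data: the template here holds its metadata fields.
        return dict(template)
    result = {}
    for key, value in data.items():
        # An exact key match wins; otherwise fall back to the wildcard entry.
        sub_template = template.get(key, template.get(WILDCARD))
        if isinstance(sub_template, dict):
            child = metadata_from_template(sub_template, value)
            if child:
                result[key] = child
    return result


template = {"status": {"gatewaysInRange": {WILDCARD: {"noise": {"measurementUnit": "dB"}}}}}
properties = {"status": {"gatewaysInRange": {
    "some-id-ab79dc8d8f0f": {"signalStrength": -7, "noise": 100}}}}
generated = metadata_from_template(template, properties)
```

For the LoRaGateways properties above, `generated` contains the `measurementUnit` entry under the concrete gateway id, and any gateway added later would match the same wildcard.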

> > [...] how does metadata cope with arrays?
>
> What do you mean by that?

I was wondering if I could apply metadata to the contents of an array and your last response answered this. So a work-around or solution is to use a map / object instead of an array in those cases.
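
The workaround can be sketched as a conversion from an array of entries to an object keyed by an identifier, so that each entry gets its own addressable path. `index_by` and the `id` field name are assumptions for illustration.

```python
def index_by(items: list, key: str) -> dict:
    """Convert a list of objects into a map keyed by the value of `key`."""
    return {
        # The identifier becomes the map key and is dropped from the entry itself.
        str(item[key]): {name: value for name, value in item.items() if name != key}
        for item in items
    }


gateways = [{"id": "some-id-ab79dc8d8f0f", "signalStrength": -7, "noise": 100}]
gateways_in_range = index_by(gateways, "id")
```

Metadata can then be attached per entry, e.g. at `gatewaysInRange/some-id-ab79dc8d8f0f/noise`.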

@thjaeckle changed the title from "Implicit metadata creation on JSON leafs" to "Implicit metadata creation on JSON leaves" on Sep 21, 2020