Self describing objects in the data model layer #312
Comments
This is probably going to get me in trouble, but CIDs with an identity multihash are a bit of an escape hatch that could be treated something like a tag. In addition, we recently added a "reserved range" (mainly for experimentation purposes) in the multicodec table that could be combined with the identity multihash to do ... creative things.

Pointing to a data model form of a schema with a CID that's embedded in the block itself is something we've discussed a fair bit, but we've never got to the point of pulling the trigger on that. I think that's partly because we just haven't had enough practical use of schemas yet to test the viability (and sensibility) of this. Perhaps there's a way to combine ideas here: identity multihash, and using CIDs as pointers to other things within (or outside) a doc. It's certainly something you'd want to do a lot more experimenting with before we baked anything official into IPLD. You're dabbling in mad science here, after all!

@vmx is also heads-down in WASM land, chasing a vision that @mikeal has been primarily pushing: being able to embed WASM code into IPLD blocks. The first use case is to get codecs into WASM so you don't need native codecs to interpret data. But beyond that there's a lot of scope for being able to do things like traverse complex data structures using an algorithm that itself is in IPLD and can be fetched by a CID. Perhaps this might be an interesting area for you to explore too?
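For anyone following along, here is a rough sketch of what "a CID with an identity multihash" looks like at the byte level. It assumes the usual CIDv1 binary layout (version, codec, multihash, each prefixed as an unsigned varint) and uses the `raw` codec (`0x55`) purely as a placeholder. The function names are made up for illustration and are not part of any IPLD library.

```python
from typing import Tuple

RAW_CODEC = 0x55      # "raw" in the multicodec table (placeholder choice)
IDENTITY_HASH = 0x00  # "identity" multihash: the digest is the data itself


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def identity_cid(payload: bytes, codec: int = RAW_CODEC) -> bytes:
    """Hypothetical helper: inline payload in a binary CIDv1."""
    return (
        encode_varint(1)                # CID version
        + encode_varint(codec)          # content codec
        + encode_varint(IDENTITY_HASH)  # multihash function code
        + encode_varint(len(payload))   # digest length
        + payload                       # "digest" is the payload itself
    )


def read_identity_cid(cid: bytes) -> Tuple[int, bytes]:
    """Hypothetical helper: recover (codec, payload) from such a CID."""
    version, pos = decode_varint(cid)
    codec, pos = decode_varint(cid, pos)
    hash_code, pos = decode_varint(cid, pos)
    length, pos = decode_varint(cid, pos)
    if version != 1 or hash_code != IDENTITY_HASH:
        raise ValueError("not a CIDv1 with an identity multihash")
    if pos + length != len(cid):
        raise ValueError("digest length does not match the remaining bytes")
    return codec, cid[pos:]
```

Passing a different `codec` value is where a code from the reserved range could be swapped in, so the codec number acts as the "tag" and the inlined bytes as its argument. Whether that is a sensible use is exactly the open question here.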
Thanks for the info! In practice you can probably get away with just using I will keep playing around with the IPLD programming language stuff and see what seems to come up in practice.
Personally I'm on the fence about such a thing, as it'd require hardcoding the choice of schema language, which means it'd necessarily be too coarse for certain data types. If the tag is just an arbitrary IPLD structure then you avoid that problem, allowing people to opt in to a schema language of their choice. They could of course still use the schema language provided in this repo.

You could get some of the benefits of a standardized schema language whilst avoiding the above problem by making sure the schema language is extensible. For sufficiently complex data types an overly coarse IPLD schema can be given, and a custom precise schema can be added within it. This still ends with the IPLD data model spec being quite a bit more complicated, as you effectively have to embed the entire IPLD schema spec within it, rather than having it just lie on top as an independent spec.
I have been playing around with building a programming language on top of the IPLD data model.
It seems ideal to allow lists, integers and floats in this language to map directly to the equivalent data model kinds.
However I also need to encode things like lambdas and case statements and so on, and these objects can more or less appear anywhere within other objects like lists.
Currently this means I need to choose one data model kind to not use directly, and instead treat as a tag/wrapper. For example I could store all lists in my language as `["list", [1, 2, 3]]`, which frees up `["lambda", ...]` and similar.

This is by no means a deal breaker, and I can provide things like a `literal` function that converts external IPLD data model objects into a literal in the language, but it seems worthy of discussion.

The way `cbor` itself addresses things like this is via tags (in which IPLD has registered `42` for cids), so it seems like a similar feature may be appropriate for the data model.

It seems as though we can take advantage of IPLD/IPFS here and use something like a `cid` for the tag, as the extreme compactness of `multicodec` seems overkill. This would avoid having to centrally ration them out or worry about false collisions.

This may also be relevant to discussions about mime types, as I personally agree with the stance that multicodec is an inappropriate place to put a mime type, particularly since it seems like you would need something like both `dag-pb-jpg` and `raw-jpg` to differentiate the full file node from the broken up chunks. However you could definitely store information like that in such a tag.

This also seems like it could be relevant to schemas, as objects could be directly tagged with the schema they are supposed to follow, by storing the schema as an IPLD object and pointing to that cid.
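To make the wrapper workaround concrete, here is a minimal sketch of the list-as-tag encoding, with Python lists and dicts standing in for data model lists and maps. The name `literal` comes from the description above. Its behaviour here, the inverse `to_data_model`, and the exact two-element shape of a tagged node are assumptions made for illustration only.

```python
from typing import Any

LIST_TAG = "list"  # tag used for real lists; "lambda" etc. would be siblings


def literal(value: Any) -> Any:
    """Wrap a plain data model value so it is a literal in the language.

    Lists become ["list", [...]]; maps are walked; scalars pass through.
    """
    if isinstance(value, list):
        return [LIST_TAG, [literal(item) for item in value]]
    if isinstance(value, dict):
        return {key: literal(item) for key, item in value.items()}
    return value


def to_data_model(node: Any) -> Any:
    """Inverse of literal(); rejects nodes that are not plain literals."""
    if isinstance(node, list):
        is_list_literal = (
            len(node) == 2 and node[0] == LIST_TAG and isinstance(node[1], list)
        )
        if not is_list_literal:
            # e.g. ["lambda", ...]: a language construct, not plain data
            raise ValueError("node is not a plain literal")
        return [to_data_model(item) for item in node[1]]
    if isinstance(node, dict):
        return {key: to_data_model(item) for key, item in node.items()}
    return node
```

The cost shows up right away: every list pays for an extra wrapper, and any external data has to go through `literal` before the language can treat it as a value. That overhead is what a tag at the data model level would remove.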