-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recursive structures of primitive codecs #174
Comments
What you're describing here is a general-purpose type language. I'd love that doesn't really obviate the need for the codec table in general. In addition to the structural information, the codecs give semantic information. For example, the "sha256" codec doesn't tell you what the structure of the data is, it tells you that the data is a sha256 digest. |
Semantic information can be done via an equivalent of C's typedef. I'd be kind of surprised if no one's made such a language, surely it has plenty of utility? I'm finding it hard to google for |
That's basically what this table is. It's the type declarations without the type definitions. The codes are effectively "short names" for the type names. You're right, there are already languages like this. For semantic information, there's an entire field of research (search for semantic web). For types, you'd need a language to describe a mapping between arbitrary serialized objects to structured data. Google points to https://en.wikipedia.org/wiki/Data_Format_Description_Language. |
+1 to Stebalien's comments. My favorite tincture when I get runaway thoughts in this area is to read about Gödel's Constructable Universes -- specifically the fact that they're not unique -- and remind myself that Gödel numberings are therefore unavoidable. If one decomposes "primitives" far enough, one always reaches a depth where further increases in generality mean proportional decreases in meaning itself. One reaches a Gödel numbering. Multicodecs are just jumping to the point and making the numbering at a level that's practical and useful to us in the level of the world where we're worried about distinguishing, well, codecs. The more we decompose it, the longer of a Church encoding we get; and while that's a fascinating exercise, it doesn't help do much... ...until the point one turns around and uses that to build an interpreter. But then, ah, there we have it: we've reached the point where it's effectively inventing a new language. (And signing on to all the joys of designing, building, maintaining, and growing an ecosystem around that language/interpreter!) And if we do that... At some point, someone might come along and observe that there's nothing particularly special about this encoding we've made versus some other Church encoding someone else has made. Because both encodings are effectively Gödelian Constructable Universes, that observation would be correct! Neither is clearly "purer" than the other. At the same time, if we each did the work particularly effectively, the Shannon entropy of the encoding should be maximum, so each encoding is indistinguishable from any pattern matching on its contents! And so... Then to distinguish them, you'd need... A magic number. Prepended somewhere. ... you see how it recurses :) At some point, one simply has to choose a place to stand one's ground. |
Instead of simply enumerating every possible codec, why don't we define an exhaustive set of primitives and composing operations which can be assigned an id for reduced size? At present they're hardly self-describing, an implementation has to already know how to decode a codec.
In essence, the current multicodec table would be given an additional column representing decomposition into more primitive types, and codes assigned to more primitive types and compositional operations. This way, an implementation that doesn't know a higher-order codec can fetch an up-to-date table and know how to decompose it. A "primitive" is defined as a row that lacks a decomposition column. Initially, all codecs are primitives, but as decomposition encoding improves, they can be redefined as higher-order without affecting older implementations. Older implementations already have logic defined for decomposing former primitives, while newer implementations either define similar special decoders or defer to the decomposition.
For some vague examples of what this might look like:
The most primitive type would be a bit, and compositions would emerge from operations which consume those bits. Since the multicodec table is already filled with some highly compositional types under this scheme, maybe it'd be better to create a new project?
The text was updated successfully, but these errors were encountered: