Document.annotations.in(<schema>) #183

colin-alexa · 2019-08-27T19:11:58Z

Right now the Document is the primary typed entity in atjson, which works fairly well in practice but is conceptually a little confusing. When we convert documents between different sources what we're really converting is the collection of annotations associated with that document. I think it would be conceptually simpler if the notion of a type or annotation schema existed primarily at the level of the annotation.

I propose adding a few types to the library:
AnnotationCollection<T extends AnnotationSchema> with public members AnnotationCollection<T>.convertTo(schema: ) : AnnotationCollection and AnnotationCollection<T>.in(schema: ) : AnnotationCollection

convertTo(schema: : AnnotationCollection is Document.convertTo, just scoped to work on a list of annotations instead
in(schema: ) : AnnotationCollection returns a new AnnotationCollection where any unknown annotations in schema from the original collection are made known, and all other annotations are made unknown.

AnnotationSchema is the parent type of schema definitions. This would take over much of the current role occupied by Document and its subtypes. Converters would be defined between subtypes of AnnotationSchema rather than subtypes of Document. They would also obviously 'own' the definition of their annotations, and would have a content type string for marking their annotations during de/serialization.

This proposal doesn't give the library any additional power (since a document is just a piece of un-annotated media and an AnnotationCollection, and the media portion doesn't at all complicate the associated schema) but I think it would be helpful for explaining the library and would help generally separate independent concerns within the system

The text was updated successfully, but these errors were encountered:

tim-evans · 2019-08-27T20:12:25Z

This is wonderful! I'd be interested in what the shape of a schema looks like. Is something like the following reasonable?

interface Schema {
  type: string;
  version: string;
  annotations: {
    [key: string]: typeof Annotation
  ];
}

Also, extending from this is a series of questions around code organization and how to define converters / fromRaw methods, etc.

tim-evans · 2019-08-27T20:15:01Z

Some examples of how this would be used in the wild would be lovely too! Is this the correct invocation for something?

let doc = new Document({
  content: 'Hello, world',
  annotations: [{
    type: '-commonmark-p',
    start: 0,
    end: 12
  }]
}).in(CommonmarkSchema);

colin-alexa · 2019-08-27T21:21:26Z

@tim-evans that schema interface looks chill to me.

regarding fromRaw I think it makes sense to keep the notion of a source but have it be solely responsible for consuming raw input and making an atjson document out of it (Honestly in my ideal world fromRaw would be an overloaded static method on the Document that like picks the correct implementation based on a user-specified desired return type)

I'm interested if people think there are any operations that currently live on the document that would make more sense moved to the annotations? .in seems like it would make sense in both places although I think it's most useful on the document. I think .where (and possibly .update?) make more sense on the annotations

tim-evans · 2019-08-29T15:21:13Z

If you feel comfortable, could you define a general interface for Document and Collection so there's more of a bird's eye view of what would be affected?

(This is an example of what I would find most helpful:)

interface Schema {
}

export class Document<S extends Schema> {
  annotations: Collection<S>;
  new(): Document<S>;
  in<T extends Schema>(schema: T): Document<T>;
  convertTo<T extends Schema>(schema: T): Document<T>;
}

export class Collection<S extends Schema> {
  add();
  remove();
  replace();
}

// Equivalent old / new APIs
// -or-
// What old APIs would de-sugar to in terms of new APIs

tim-evans · 2019-09-17T15:24:36Z

I've created a working example of this in the TypeScript Playground

tim-evans · 2019-10-07T15:17:55Z

Ok, so I've worked through this a bit and this requires a lot of moving parts, but there's a few things that we can do in preparation to start this refactor to minimize large changes everywhere:

Use named exports for annotations
Move all utility functions out from the annotations directory to the top level.

Unfortunately, the rest of the changes will require some level of coordination or API changes that require different API signatures.

tim-evans · 2019-10-07T15:24:45Z

Also, here's a list of breaking changes that will likely be occurring in this proposal:

schema is an additional property for the constructor to a Document.
fromRaw on sources will no longer be used— schemas will be separated from the parsing. The suggestion here is to make the named export fromRaw be a function that returns an instantiated Document from the source. This maps fairly well to existing code.
@atjson/offset-annotations will be deprecated in favor of @atjson/schema-offset.
All sources should have their default export be the schema for that source.

tim-evans · 2019-11-02T21:58:23Z

This feature is a bit... all encompassing. It is a major improvement to the core api, but definitely touches a whole load of stuff. To make these changes reasonably reviewable, we'll need to break it up into several steps. I think it makes the most sense to merge all of these changes into a release/1.0.0 branch, since this feels like one of the last breaking changes we would make with the core api.

There's a lot of ways to approach this, but I'm going to give some suggestions on what may be the most logical approach:

Change the `schema`

Make the schema on a source match our new schema interface. This will be one of the bigger changes, and is designed to be a bit less aggressive than breaking everything.

class Document {
  static schema: Array<AnnotationConstructor<any, any>>;
}

will be modified to be:

class Document {
  static schema: Schema;
  schema: Schema;
}

In addition, the schema will be added to the Document instance, which will allow improvements to type inference to occur after this change.

Improve type inferences:

This is a bit involved, but it involves making our querying and joins reflect the types passed in more accurately. We'll add two more overloads to the Document#where api that allows types and strings. The string key will be the name of the annotation in the schema, and the type will be the type of annotation:

doc.where(doc.schema.Bold) // Returns Collection<Bold>
doc.where("Bold") // Returns Collection<Bold>

tim-evans assigned blaine Aug 27, 2019

tim-evans added the 🎉 Feature A new feature or request label Oct 8, 2019

tim-evans mentioned this issue Nov 13, 2019

Named exports #289

Merged

tim-evans mentioned this issue Nov 25, 2019

Define Schema interface #311

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document.annotations.in(<schema>) #183

Document.annotations.in(<schema>) #183

colin-alexa commented Aug 27, 2019

tim-evans commented Aug 27, 2019

tim-evans commented Aug 27, 2019 •

edited

Loading

colin-alexa commented Aug 27, 2019

tim-evans commented Aug 29, 2019 •

edited

Loading

tim-evans commented Sep 17, 2019 •

edited

Loading

tim-evans commented Oct 7, 2019 •

edited

Loading

tim-evans commented Oct 7, 2019 •

edited

Loading

tim-evans commented Nov 2, 2019 •

edited

Loading

Document.annotations.in(<schema>) #183

Document.annotations.in(<schema>) #183

Comments

colin-alexa commented Aug 27, 2019

tim-evans commented Aug 27, 2019

tim-evans commented Aug 27, 2019 • edited Loading

colin-alexa commented Aug 27, 2019

tim-evans commented Aug 29, 2019 • edited Loading

tim-evans commented Sep 17, 2019 • edited Loading

tim-evans commented Oct 7, 2019 • edited Loading

tim-evans commented Oct 7, 2019 • edited Loading

tim-evans commented Nov 2, 2019 • edited Loading

Change the schema

Improve type inferences:

tim-evans commented Aug 27, 2019 •

edited

Loading

tim-evans commented Aug 29, 2019 •

edited

Loading

tim-evans commented Sep 17, 2019 •

edited

Loading

tim-evans commented Oct 7, 2019 •

edited

Loading

tim-evans commented Oct 7, 2019 •

edited

Loading

tim-evans commented Nov 2, 2019 •

edited

Loading

Change the `schema`