Design and Implement a Translation System #20

Open
adamnovak opened this issue May 20, 2019 · 2 comments


@adamnovak
Member

@glennhickey needs a way to edit a MutableHandleGraph, and then translate read alignments from the old graph to the new graph (and back again?). This is necessary for variant calling in vg.

We also need to be able to serialize translation information, so we can create an updated graph and then move it, along with the translation from the old graph, to another machine.

To support this use case, we need an API for translating between graphs. Since we want to use it to translate between old and new versions of the same MutableHandleGraph, it can't (at least on the input side) be handle-based, because handles to the old graph are invalidated on mutation.

We need to work out what this API should look like. Part of that is working out what the types to be translated should be; we don't want to build in a dependency on vg's Protobuf here.

Here's a first draft from me:

  • Have a type using region_t = vector<tuple<nid_t, bool, size_t, size_t>> to track a series of (node ID, orientation, start offset, end offset) intervals. The intervals do not necessarily abut in the graph, but any that abut on the same node must be coalesced. This type serves as both the input and the output of translation.

  • Define some translation semantics. When something being translated lies on a node that has no counterpart in the destination graph, it translates to an interval on node 0, forward, from 0 to n, where n is its length. If part of an input region is on node 0, it stays on node 0 in the translation output. So translating back and forth between graphs that don't fully contain each other is necessarily lossy.

  • Have an interface GraphToGraphTranslation:

    • region_t translate_forward(const region_t& from) const
      • And BidirectionalGraphToGraphTranslation, which adds:
      • region_t translate_backward(const region_t& to) const
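The draft types and interfaces above can be sketched in C++17 roughly as follows. This assumes nid_t is a 64-bit signed integer as in libhandlegraph; coalesce and IdRenumberingTranslation are hypothetical helpers, not existing vg or libhandlegraph code. The implementation covers only the simplest possible edit, a pure renumbering of node IDs, just to show how the node-0 rule and coalescing interact.

```cpp
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumption: nid_t is a 64-bit signed node ID, as in libhandlegraph.
using nid_t = std::int64_t;

// (node ID, is_reverse, start offset, end offset) intervals, as in the draft.
using region_t = std::vector<std::tuple<nid_t, bool, std::size_t, std::size_t>>;

// Hypothetical helper: merge consecutive intervals that abut on the same node and strand.
region_t coalesce(const region_t& in) {
    region_t out;
    for (const auto& interval : in) {
        if (!out.empty()) {
            auto& last = out.back();
            bool same_node = std::get<0>(last) == std::get<0>(interval) &&
                             std::get<1>(last) == std::get<1>(interval);
            if (same_node && std::get<3>(last) == std::get<2>(interval)) {
                std::get<3>(last) = std::get<3>(interval);  // Extend the previous interval.
                continue;
            }
        }
        out.push_back(interval);
    }
    return out;
}

class GraphToGraphTranslation {
public:
    virtual ~GraphToGraphTranslation() = default;
    virtual region_t translate_forward(const region_t& from) const = 0;
};

class BidirectionalGraphToGraphTranslation : public GraphToGraphTranslation {
public:
    virtual region_t translate_backward(const region_t& to) const = 0;
};

// Hypothetical implementation: nodes are renumbered but their sequences are
// unchanged, so orientation and offsets carry over as-is.
// Assumes the ID mapping is one-to-one and never maps anything to node 0.
class IdRenumberingTranslation : public BidirectionalGraphToGraphTranslation {
public:
    explicit IdRenumberingTranslation(std::unordered_map<nid_t, nid_t> forward)
        : forward_(std::move(forward)) {
        for (const auto& [old_id, new_id] : forward_) backward_[new_id] = old_id;
    }
    region_t translate_forward(const region_t& from) const override {
        return apply(from, forward_);
    }
    region_t translate_backward(const region_t& to) const override {
        return apply(to, backward_);
    }

private:
    static region_t apply(const region_t& in, const std::unordered_map<nid_t, nid_t>& ids) {
        region_t out;
        for (const auto& [node, is_reverse, start, end] : in) {
            auto found = (node == 0) ? ids.end() : ids.find(node);
            if (found == ids.end()) {
                // No counterpart, or already on node 0: node 0, forward, 0 to length.
                out.emplace_back(0, false, 0, end - start);
            } else {
                out.emplace_back(found->second, is_reverse, start, end);
            }
        }
        return coalesce(out);
    }

    std::unordered_map<nid_t, nid_t> forward_;
    std::unordered_map<nid_t, nid_t> backward_;
};
```

Translating a region that touches a node missing from the mapping and then translating it back returns that piece on node 0, which is exactly the lossiness described above.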

I don't think we want an actual implementation of a translation in this repo. It should probably go in vg or in sglib. For that matter, if handle_t doesn't appear in the API anywhere, I'm not sure it makes sense for the interface to live in libhandlegraph at all.

An easy way to construct a translation might be a MutableHandleGraph overlay that logs mutation operations and constructs the translation they define. But to serialize translations, we'd need a way to reload them later, and it would be more convenient if that didn't require having the graph available at load time.
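A rough sketch of the kind of table such a logging overlay could build, under heavy simplifying assumptions: the only logged edit is splitting a node into pieces, each node is split at most once (chained splits would need to be composed), untouched nodes keep their IDs, and only forward-strand positions are translated. SplitTranslationTable, record_split, translate, serialize, and load are all hypothetical names. The point it illustrates is that loading needs only the serialized stream, not either graph.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using nid_t = std::int64_t;  // Assumption: matches libhandlegraph's node ID type.

// Hypothetical table built from a log of node splits. Each old node maps to its
// pieces in the new graph as (new node ID, length), in forward-strand order.
class SplitTranslationTable {
public:
    // What a (hypothetical) logging overlay would call when it divides a node.
    void record_split(nid_t old_node, std::vector<std::pair<nid_t, std::size_t>> pieces) {
        table_[old_node] = std::move(pieces);
    }

    // Translate a forward-strand position on an old node to the new graph.
    std::pair<nid_t, std::size_t> translate(nid_t old_node, std::size_t offset) const {
        auto it = table_.find(old_node);
        if (it == table_.end()) return {old_node, offset};  // Node was never split.
        for (const auto& [new_node, length] : it->second) {
            if (offset < length) return {new_node, offset};
            offset -= length;
        }
        return {0, 0};  // Offset past the end of the old node: no counterpart.
    }

    // One line per split node: "old_node new_id:length new_id:length ..."
    void serialize(std::ostream& out) const {
        for (const auto& [old_node, pieces] : table_) {
            out << old_node;
            for (const auto& [new_node, length] : pieces) out << ' ' << new_node << ':' << length;
            out << '\n';
        }
    }

    // Reload from the stream alone; neither graph is needed.
    static SplitTranslationTable load(std::istream& in) {
        SplitTranslationTable table;
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            nid_t old_node;
            if (!(fields >> old_node)) continue;  // Skip blank or malformed lines.
            std::vector<std::pair<nid_t, std::size_t>> pieces;
            nid_t new_node;
            char colon;
            std::size_t length;
            while (fields >> new_node >> colon >> length) pieces.emplace_back(new_node, length);
            table.record_split(old_node, std::move(pieces));
        }
        return table;
    }

private:
    std::unordered_map<nid_t, std::vector<std::pair<nid_t, std::size_t>>> table_;
};
```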

Possible improvements:

  • Define region_t as a template, and have region_t<handle_t> and region_t<pair<nid_t, bool>>. Then we can have translations between real handlegraphs (useful for some of vg's overlays, where the backing graph sticks around unmodified), as well as between ID-defined graph spaces, by implementing GraphToGraphTranslation<handle_t, handle_t> or GraphToGraphTranslation<pair<nid_t, bool>, pair<nid_t, bool>>. You could even have translations that take handles on one end and ID, bool pairs on the other (which might be the right interface for translating to/from an old version of an augmented graph). This would help justify having this as part of libhandlegraph and not vg.

  • Define translation serialization/loading as part of the interface. Design it so that a serializable graph can sensibly be paired with a serializable translation.
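Both improvements together might look something like this sketch. Because the step type carries the orientation, the interval tuple shrinks to (step, start, end). handle_t here is a stand-in struct, not libhandlegraph's definition, and the serialize/deserialize method names and IdShiftTranslation (every node ID shifted by a constant) are hypothetical, chosen because that translation's whole state fits in one number. Node 0 stays on node 0, per the draft semantics.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream is a convenient target for round-tripping.
#include <tuple>
#include <utility>
#include <vector>

using nid_t = std::int64_t;  // Assumption: matches libhandlegraph's node ID type.

// Stand-in for libhandlegraph's opaque handle_t (not its real definition).
struct handle_t { char data[sizeof(nid_t)]; };

// A region is a series of (step, start, end) intervals; the step identifies an
// oriented node, either as a handle or as an (ID, is_reverse) pair.
template <typename Step>
using region_t = std::vector<std::tuple<Step, std::size_t, std::size_t>>;

using id_step_t = std::pair<nid_t, bool>;
using handle_region_t = region_t<handle_t>;  // For translations between live graphs.

template <typename From, typename To>
class GraphToGraphTranslation {
public:
    virtual ~GraphToGraphTranslation() = default;
    virtual region_t<To> translate_forward(const region_t<From>& from) const = 0;
    // Serialization as part of the interface (hypothetical names).
    virtual void serialize(std::ostream& out) const = 0;
    virtual void deserialize(std::istream& in) = 0;
};

// Hypothetical example: the new graph is the old one with every node ID
// shifted by a constant, e.g. after moving graphs into a shared ID space.
class IdShiftTranslation : public GraphToGraphTranslation<id_step_t, id_step_t> {
public:
    explicit IdShiftTranslation(nid_t shift = 0) : shift_(shift) {}

    region_t<id_step_t> translate_forward(const region_t<id_step_t>& from) const override {
        region_t<id_step_t> out;
        for (const auto& [step, start, end] : from) {
            nid_t id = (step.first == 0) ? 0 : step.first + shift_;  // Node 0 stays put.
            out.emplace_back(id_step_t{id, step.second}, start, end);
        }
        return out;
    }
    void serialize(std::ostream& out) const override { out << shift_ << '\n'; }
    void deserialize(std::istream& in) override { in >> shift_; }

private:
    nid_t shift_;
};
```

A translation from handles on one side to ID pairs on the other would then just be a GraphToGraphTranslation<handle_t, id_step_t>.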

@jeizenga, @glennhickey, @ekg, and I all need to discuss this.

@jeizenga
Contributor

Overall I think this looks good. I'll also give a +1 to doing this development in vg rather than libhandlegraph. This doesn't feel very handle-y to me.

@glennhickey
Contributor

This also sounds fairly reasonable to me. I will take a shot at wedging this over the current translation system in vg sometime soon.
