Overall I think this looks good. I'll also give a +1 to doing this development in VG rather than libhandlegraph. This doesn't feel very handle-y to me.
@glennhickey needs a way to edit a MutableHandleGraph, and then translate read alignments from the old graph to the new graph (and back again?). This is necessary for variant calling in vg.
We also need to be able to serialize translation information, so we can create an updated graph, and then move it and the translation from the old graph to another machine.
To support this use case, we need an API for translating between graphs. Since we want to use it to translate between old and new versions of the same MutableHandleGraph, it can't be handle-based (at least on the input side), because handles to the old graph are invalidated on mutation.
We need to work out what this API should look like. Part of that is working out what the types to be translated should be; we don't want to build in a dependency on vg's Protobuf here.
Here's a first draft from me:
Have a type `using region_t = vector<tuple<nid_t, bool, size_t, size_t>>` to track a series of node, orientation, start, end intervals. They do not necessarily abut in the graph, but if they abut on the same node they must be coalesced. This serves as both the input and output type for translation.
Define some translation semantics. When something being translated lies on a node that has no counterpart in the destination graph, it translates to an interval on node 0 forward from 0 to n, depending on its length. If part of an input region is on node 0, it stays on node 0 in the translation output. So translating back and forth between graphs which don't fully contain each other is necessarily lossy.
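A minimal sketch of the region type and the coalescing rule above. The `nid_t` alias is a stand-in so the snippet compiles without libhandlegraph, `coalesce` is a hypothetical helper name, and treating each interval as half-open `[start, end)` is an assumption that the proposal does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Stand-in for libhandlegraph's nid_t so the sketch is self-contained.
using nid_t = std::int64_t;

// (node ID, is_reverse, start, end). Assumed half-open: [start, end).
using interval_t = std::tuple<nid_t, bool, std::size_t, std::size_t>;
using region_t = std::vector<interval_t>;

// Hypothetical helper: merge consecutive intervals that abut on the same
// node and orientation, as the proposal requires of a valid region_t.
region_t coalesce(const region_t& in) {
    region_t out;
    for (const auto& next : in) {
        assert(std::get<2>(next) <= std::get<3>(next));
        if (!out.empty()) {
            auto& prev = out.back();
            bool same_strand = std::get<0>(prev) == std::get<0>(next) &&
                               std::get<1>(prev) == std::get<1>(next);
            if (same_strand && std::get<3>(prev) == std::get<2>(next)) {
                // Extend the previous interval instead of appending a new one.
                std::get<3>(prev) = std::get<3>(next);
                continue;
            }
        }
        out.push_back(next);
    }
    return out;
}
```

Only adjacent entries are merged, so intervals on the same node that are separated by a visit to another node stay separate, which matches "they do not necessarily abut in the graph".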
Have an interface:
`GraphToGraphTranslation`: `region_t translate_forward(const region_t& from) const`
`BidirectionalGraphToGraphTranslation`: `region_t translate_backward(const region_t& to) const`
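One way the two interfaces could look as abstract classes, with a toy implementation that follows the node 0 rule described earlier. Having the bidirectional interface inherit from the one-way interface is an assumption, and `NodeRenameTranslation` is a hypothetical example (whole nodes renamed one-to-one, offsets unchanged), not something that exists in vg or libhandlegraph.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

using nid_t = std::int64_t;  // stand-in for libhandlegraph's nid_t
using interval_t = std::tuple<nid_t, bool, std::size_t, std::size_t>;
using region_t = std::vector<interval_t>;

class GraphToGraphTranslation {
public:
    virtual ~GraphToGraphTranslation() = default;
    virtual region_t translate_forward(const region_t& from) const = 0;
};

// Assumption: the bidirectional interface extends the one-way interface.
class BidirectionalGraphToGraphTranslation : public GraphToGraphTranslation {
public:
    virtual region_t translate_backward(const region_t& to) const = 0;
};

// Hypothetical toy translation: nodes are renamed one-to-one.
class NodeRenameTranslation : public BidirectionalGraphToGraphTranslation {
public:
    explicit NodeRenameTranslation(const std::map<nid_t, nid_t>& old_to_new)
        : forward(old_to_new) {
        for (const auto& kv : old_to_new) backward[kv.second] = kv.first;
    }
    region_t translate_forward(const region_t& from) const override {
        return apply(forward, from);
    }
    region_t translate_backward(const region_t& to) const override {
        return apply(backward, to);
    }

private:
    static region_t apply(const std::map<nid_t, nid_t>& table, const region_t& in) {
        region_t out;
        for (const auto& iv : in) {
            nid_t id; bool rev; std::size_t start, end;
            std::tie(id, rev, start, end) = iv;
            assert(start <= end);
            auto found = table.find(id);
            if (id == 0 || found == table.end()) {
                // No counterpart (or already on node 0): node 0, forward, 0..length.
                out.emplace_back(nid_t(0), false, std::size_t(0), end - start);
            } else {
                out.emplace_back(found->second, rev, start, end);
            }
        }
        return out;
    }
    std::map<nid_t, nid_t> forward;
    std::map<nid_t, nid_t> backward;
};
```

Translating forward and then backward returns the original interval for mapped nodes, while anything that landed on node 0 stays there, which is the lossy round trip the semantics describe.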
I don't think we want an actual implementation of a translation in this repo. It should probably go in vg or in sglib. For that matter, if `handle_t` doesn't appear in the API anywhere, I'm not sure it makes sense for the interface to live in `libhandlegraph` at all.
An easy way to construct a translation might be a MutableHandleGraph overlay that logs mutation operations and constructs the translation that they define. But for serializing and loading translations, we'd need a way to re-load the translation later, and it might be more convenient if that didn't require having the graph available at load time.
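A rough sketch of what such a log could record so that it can be saved and re-loaded without the graph. Everything here is hypothetical: `DivisionLog`, `Piece`, `record_divide`, and the whitespace-separated text format are illustration only, only node division is covered, and the sketch assumes that nodes which were never divided keep their IDs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <tuple>
#include <utility>
#include <vector>

using nid_t = std::int64_t;  // stand-in for libhandlegraph's nid_t
using interval_t = std::tuple<nid_t, bool, std::size_t, std::size_t>;
using region_t = std::vector<interval_t>;

// One piece of a divided node: the new node's ID and its length in bases.
struct Piece { nid_t new_id; std::size_t length; };

class DivisionLog {
public:
    // A hypothetical overlay would call this when old_id is divided into
    // pieces, listed in forward-strand order.
    void record_divide(nid_t old_id, std::vector<Piece> pieces) {
        divided[old_id] = std::move(pieces);
    }

    region_t translate_forward(const region_t& from) const {
        region_t out;
        for (const auto& iv : from) {
            nid_t id; bool rev; std::size_t start, end;
            std::tie(id, rev, start, end) = iv;
            assert(start <= end);
            auto found = divided.find(id);
            if (found == divided.end()) {
                out.push_back(iv);  // assumption: untouched nodes keep their IDs
                continue;
            }
            std::vector<Piece> pieces = found->second;
            // On the reverse strand the pieces are visited in reverse order.
            if (rev) std::reverse(pieces.begin(), pieces.end());
            std::size_t offset = 0;
            for (const Piece& p : pieces) {
                std::size_t lo = std::max(start, offset);
                std::size_t hi = std::min(end, offset + p.length);
                if (lo < hi) out.emplace_back(p.new_id, rev, lo - offset, hi - offset);
                offset += p.length;
            }
        }
        return out;
    }

    // Piece lengths are stored in the log, so neither call needs a graph.
    void save(std::ostream& os) const {
        for (const auto& kv : divided) {
            os << kv.first << ' ' << kv.second.size();
            for (const Piece& p : kv.second) os << ' ' << p.new_id << ' ' << p.length;
            os << '\n';
        }
    }
    void load(std::istream& is) {
        divided.clear();
        nid_t old_id; std::size_t count;
        while (is >> old_id >> count) {
            std::vector<Piece> pieces(count);
            for (Piece& p : pieces) is >> p.new_id >> p.length;
            divided[old_id] = std::move(pieces);
        }
    }

private:
    std::map<nid_t, std::vector<Piece>> divided;
};

// Copy a log by writing it out and reading it back in.
DivisionLog round_trip(const DivisionLog& log) {
    std::stringstream buffer;
    log.save(buffer);
    DivisionLog copy;
    copy.load(buffer);
    return copy;
}
```

Storing the piece lengths alongside the new IDs is the design choice that makes loading graph-free; a log that stored only IDs would have to ask the graph for node lengths at translation time.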
Possible improvements:
Define `region_t` as a template, and have `region_t<handle_t>` and `region_t<pair<nid_t, bool>>`. Then we can have translations between real handlegraphs (useful for some of vg's overlays, where the backing graph sticks around unmodified), as well as between ID-defined graph spaces, by implementing `GraphToGraphTranslation<handle_t, handle_t>` or `GraphToGraphTranslation<pair<nid_t, bool>, pair<nid_t, bool>>`. You could even have translations that take handles on one end and ID, bool pairs on the other (which might be the right interface for translating to/from an old version of an augmented graph). This would help justify having this as part of libhandlegraph and not vg.
Define translation serialization/load as part of the interface. Do it in a way such that it makes sense to have a serializable graph with a serializable translation.
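The templated variant might look roughly like this. The `handle_t` struct is a stand-in for libhandlegraph's opaque handle, and the "ID shifted left by one, orientation in the low bit" packing used by the hypothetical `HandleToIdTranslation` is purely an assumption of this sketch; a real implementation would ask the graph to unpack its handles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

using nid_t = std::int64_t;                 // stand-in for libhandlegraph's nid_t
struct handle_t { std::uint64_t packed; };  // stand-in opaque handle
using oriented_id_t = std::pair<nid_t, bool>;

// A region is a series of (step, start, end) intervals over either step type.
template <typename Step>
using region_t = std::vector<std::tuple<Step, std::size_t, std::size_t>>;

template <typename From, typename To>
class GraphToGraphTranslation {
public:
    virtual ~GraphToGraphTranslation() = default;
    virtual region_t<To> translate_forward(const region_t<From>& from) const = 0;
};

template <typename From, typename To>
class BidirectionalGraphToGraphTranslation : public GraphToGraphTranslation<From, To> {
public:
    virtual region_t<From> translate_backward(const region_t<To>& to) const = 0;
};

// Hypothetical mixed-ended translation: handles on one end, (ID, bool) pairs
// on the other. Assumes handles pack as (id << 1) | is_reverse.
class HandleToIdTranslation
    : public BidirectionalGraphToGraphTranslation<handle_t, oriented_id_t> {
public:
    region_t<oriented_id_t> translate_forward(const region_t<handle_t>& from) const override {
        region_t<oriented_id_t> out;
        for (const auto& iv : from) {
            assert(std::get<1>(iv) <= std::get<2>(iv));
            std::uint64_t packed = std::get<0>(iv).packed;
            oriented_id_t step(static_cast<nid_t>(packed >> 1), (packed & 1) != 0);
            out.emplace_back(step, std::get<1>(iv), std::get<2>(iv));
        }
        return out;
    }
    region_t<handle_t> translate_backward(const region_t<oriented_id_t>& to) const override {
        region_t<handle_t> out;
        for (const auto& iv : to) {
            const oriented_id_t& step = std::get<0>(iv);
            handle_t h{(static_cast<std::uint64_t>(step.first) << 1) | (step.second ? 1u : 0u)};
            out.emplace_back(h, std::get<1>(iv), std::get<2>(iv));
        }
        return out;
    }
};
```

With the step type as a template parameter, the handle-to-handle and ID-to-ID cases are further instantiations of the same two class templates.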
@jeizenga, @glennhickey, @ekg, and I all need to discuss this.