Pass variant names via the GBWT #741
So you want to store the alt paths for large SVs as threads in the
GBWT, and then call against them?
We'd want to keep a separate GBWT from the one we use for storing the
haplotypes, I think. And the GBWT format isn't designed for efficient
offset determination, and I'm not sure how fast it is at tracing out a
whole thread, so we'd be kind of misusing it. But these SV threads
would be short and few enough that we probably could extract them and
index them for offset queries in memory when needed.
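A minimal sketch of what that in-memory offset index could look like, assuming each SV thread has already been extracted as a list of node IDs plus node lengths. The class name `PathOffsetIndex` and its methods are hypothetical, not part of vg or the GBWT library; it just stores cumulative node start offsets and binary-searches them.

```python
from bisect import bisect_right
from itertools import accumulate


class PathOffsetIndex:
    """Hypothetical in-memory index mapping base offsets on one short path to nodes."""

    def __init__(self, node_ids, node_lengths):
        if len(node_ids) != len(node_lengths):
            raise ValueError("node_ids and node_lengths must have the same length")
        if any(length <= 0 for length in node_lengths):
            raise ValueError("node lengths must be positive")
        self.node_ids = list(node_ids)
        # ends[i] is the path offset just past the last base of node i.
        ends = list(accumulate(node_lengths))
        # starts[i] is the path offset of the first base of node i.
        self.starts = [0] + ends[:-1]
        self.length = ends[-1] if ends else 0

    def node_at(self, offset):
        """Return (node_id, offset_within_node) for a 0-based path offset."""
        if not 0 <= offset < self.length:
            raise IndexError("offset is outside the path")
        # Last node whose start is <= offset.
        i = bisect_right(self.starts, offset) - 1
        return self.node_ids[i], offset - self.starts[i]


# Usage: a thread visiting nodes 11, 12, 15 with lengths 4, 2, 5.
index = PathOffsetIndex([11, 12, 15], [4, 2, 5])
node_id, node_offset = index.node_at(5)
```

Building the index is linear in the number of nodes on the thread and each query is logarithmic, which seems fine as long as the SV threads really are short and few.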
Shouldn't we consider an alternative, snarl-based variant call
representation for this instead, though? Or is this a prerequisite for
applying some of the VCF extensions proposed at the hackathon?
On 3/27/19, Glenn Hickey wrote:
There are two ways to look at variation inside SVs:
* Run the calling in something like `--recall` mode, but turn on graph
augmentation. Then compare the insertion sequences that are called back to
the input. This is what we did at the hackathon.
* Preserve the paths of SVs from the input VCF to construction. Then run vg
as a SNP caller on each path. I've got a run going on chr1 for this, by way
of hacking `vg construct -a` to write a normal (instead of alt) path for
input variants.
That second option simply does not scale to whole genome, because xg cannot
index all the paths from the input VCF. So it's best just to use the GBWT.
To that end, we'd need to:
* have an option to write human-readable alt sequence names (fairly trivial),
provided the input VCF contains unique IDs (often the case)
* input the GBWT to toil-vg call with an option to call on GBWT threads in
addition to or instead of normal paths
* input the GBWT to vg call (if possible) so that the path name can get
passed into the VCF INFO field (if not calling on the path directly).
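For the first item, a rough sketch of deriving human-readable alt path names from VCF IDs. Everything here is an assumption for illustration: the function name `alt_path_names` and the `<ID>_alt<k>` naming scheme are made up and are not what `vg construct -a` currently writes. It reads plain tab-separated VCF lines and refuses records whose ID is missing (`.`) or repeated, since the names would no longer be unique.

```python
def alt_path_names(vcf_lines):
    """Map hypothetical alt path names to (chrom, pos, ref, alt) tuples.

    Assumes uncompressed, tab-separated VCF text lines.
    """
    names = {}
    seen_ids = set()
    for line in vcf_lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # The first five VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, var_id, ref, alts = fields[:5]
        if var_id == ".":
            raise ValueError(f"variant at {chrom}:{pos} has no ID")
        if var_id in seen_ids:
            raise ValueError(f"duplicate variant ID: {var_id}")
        seen_ids.add(var_id)
        # One name per ALT allele, numbered from 1 as in VCF genotypes.
        for k, alt in enumerate(alts.split(","), start=1):
            names[f"{var_id}_alt{k}"] = (chrom, int(pos), ref, alt)
    return names


# Usage with two made-up records.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\tsv_del_1\tACGT\tA\t.\t.\t.",
    "chr1\t250\tsv_ins_2\tG\tGTTT,GTTTT\t.\t.\t.",
]
paths = alt_path_names(example)
```

A name built this way could then be carried through to the VCF INFO field of the resulting calls, which is what the third item is after.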
Calling on the SV paths directly allows using different parameters for SNPs
than SVs (probably necessary). But doing SVs and nested variation at the
same time will have a chance at resolving breakpoints.
I hadn't thought it through too clearly before typing that, but I agree that
the GBWT as it stands probably isn't what I'm looking for. I'm open to other
alternatives -- perhaps we can chat soon.
I'll look at the results of just doing it through the xg first to see if
it's worth pursuing. I think a general-purpose annotation index for short
paths and/or subgraphs would be useful for this and other stuff too.
On Thu, Mar 28, 2019 at 10:40 AM Adam Novak wrote:
> So you want to store the alt paths for large SVs as threads in the
> GBWT, and then call against them?
> We'd want to keep a separate GBWT from the one we use for storing the
> haplotypes, I think. And the GBWT format isn't designed for efficient
> offset determination, and I'm not sure how fast it is at tracing out a
> whole thread, so we'd be kind of misusing it. But these SV threads
> would be short and few enough that we probably could extract them and
> index them for offset queries in memory when needed.
> Shouldn't we consider an alternative, snarl-based variant call
> representation for this instead, though? Or is this a prerequisite for
> applying some of the VCF extensions proposed at the hackathon?