tnnandi changed the title from "Request for pointing to the VCF files for comparison" to "Request for pointing to the correct VCF files for comparison" on Jan 7, 2023.
Hi,
I'm running a validation study that compares Parliament2 SV calls against the GIAB v0.6 truth set using 60X HG002.
I was looking at the VCF files at https://github.com/slzarate/parliament2/tree/master/benchmarking_data/hg002_benchmarks, specifically HG002-NA24385-50x.70_percent.markdup.realigned.combined.genotyped.formatted.vcf. Were all the DEL calls in that file used to create Fig. 1 of Zarate et al. (2020), or were they further filtered (keeping only calls in the Tier 1 high-confidence regions of the GIAB v0.6 truth set) before the plots were made?
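For reference, restricting calls to the Tier 1 regions can be sketched as keeping only calls whose span lies entirely inside a single Tier 1 interval. The Python below is a minimal sketch under stated assumptions: the BED file is merged (non-overlapping intervals), full containment is the inclusion rule, and `load_bed`, `contained`, and `filter_to_regions` are hypothetical helper names. The benchmarking behind the paper may well have used a different overlap rule.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines (0-based, half-open) into per-chromosome sorted intervals."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    for chrom in regions:
        regions[chrom].sort()
    return regions


def contained(regions, chrom, start, end):
    """True if [start, end) lies entirely inside one interval.

    Assumes the intervals are merged (non-overlapping), so the only candidate
    is the rightmost interval whose start is <= the call's start.
    """
    intervals = regions.get(chrom, [])
    i = bisect.bisect_right(intervals, (start, float("inf"))) - 1
    return i >= 0 and intervals[i][1] >= end


def filter_to_regions(calls, regions):
    """Keep (chrom, start, end, ...) call tuples fully contained in a region."""
    return [c for c in calls if contained(regions, c[0], c[1], c[2])]
```

Containment is stricter than "any overlap": a deletion that straddles a Tier 1 boundary is dropped, which can noticeably change counts for larger events.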
Also, could you please confirm whether the VCF and BED files at https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NIST_SV_v0.6/ were used as the truth set and to define the Tier 1 high-confidence regions, respectively?
Using those two GIAB VCF and BED files, I find that Parliament2 predicts a much larger number of deletions in the 400-1000 bp range than the truth set contains, yet Fig. 1(b) of Zarate et al. (2020) appears to show reasonably high precision and recall in that range. Specifically, I find around 21,000 calls in the Tier 1 high-confidence regions, of which around 6,000 are DELs in the 400-1000 bp range, whereas the GIAB truth set has approximately 1,300 DELs in that range.
I'd appreciate it if you could confirm whether I am looking at the correct files for the comparison and interpreting them correctly.
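A size-binned count like the one above can be sketched as follows; running it on both the Tier 1-filtered Parliament2 VCF and the truth VCF gives the two numbers being compared. This is a minimal Python sketch under assumptions: deletions are identified by `SVTYPE=DEL` or a symbolic `<DEL>` ALT, length comes from `SVLEN` (first value), else `END - POS`, else the REF/ALT length difference, and "400-1000 bp" is taken as the half-open range [400, 1000). `parse_info`, `deletion_lengths`, and `count_in_range` are hypothetical names.

```python
def parse_info(field):
    """Turn a VCF INFO string like 'SVTYPE=DEL;SVLEN=-500;IMPRECISE' into a dict."""
    info = {}
    for item in field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value if value else True  # flags without '=' become True
    return info


def deletion_lengths(vcf_lines):
    """Yield the absolute length of each deletion record in a VCF."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        info = parse_info(fields[7])
        if info.get("SVTYPE") != "DEL" and alt != "<DEL>":
            continue
        if "SVLEN" in info:
            yield abs(int(info["SVLEN"].split(",")[0]))
        elif "END" in info:
            yield int(info["END"]) - pos
        else:
            # Sequence-resolved deletion: REF holds the deleted bases plus a padding base.
            yield len(ref) - len(alt)


def count_in_range(lengths, lo=400, hi=1000):
    """Count lengths in the half-open range [lo, hi)."""
    return sum(1 for n in lengths if lo <= n < hi)
```

Note that raw counts only show how many calls fall in a size bin; precision and recall in Fig. 1(b) depend on how calls are matched to the truth set (size similarity, breakpoint distance, genotype), so a large call count does not by itself imply low precision.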
Thank you very much.