Matching cluster label to origin sample id #228

Open
elemesemele opened this issue Apr 22, 2024 · 3 comments

Comments

@elemesemele

elemesemele commented Apr 22, 2024

Dear @wheaton5, thank you for developing souporcell!
I want to match the cluster labels in the Souporcell output (clusters.tsv) to the original sample IDs.

I ran the pipeline using --known_genotypes and --known_genotypes_sample_names (with --skip_remap True).

The --known_genotypes VCF file has three sample columns (SampleA, SampleB, SampleC), and I passed --known_genotypes_sample_names SampleA SampleB SampleC in the command.

The single-cell GEM run was in fact multiplexed from these three samples (SampleA, SampleB, SampleC).

Is the order of the cluster labels in 'clusters.tsv' the same as the order of the sample IDs I entered (0=SampleA, 1=SampleB, 2=SampleC)?

thank you!!

@wheaton5
Owner

Yes that should be correct
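
A minimal sketch of applying that mapping in Python. It assumes clusters.tsv is tab-separated with `barcode`, `status` and `assignment` columns, and that cluster N corresponds to the N-th name passed to --known_genotypes_sample_names. The helper name `label_clusters` and the example barcodes are made up for illustration.

```python
import csv
import io

# Assumption: same order as given to --known_genotypes_sample_names.
SAMPLE_NAMES = ["SampleA", "SampleB", "SampleC"]


def label_clusters(tsv_file, sample_names):
    """Return {barcode: sample name} from a clusters.tsv-style file object."""
    labels = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if row["status"] == "singlet":
            # Cluster index N is taken to mean the N-th sample name.
            labels[row["barcode"]] = sample_names[int(row["assignment"])]
        else:
            # Doublets and unassigned barcodes have no single donor,
            # so keep the status string instead of a sample name.
            labels[row["barcode"]] = row["status"]
    return labels


# Tiny made-up table standing in for a real clusters.tsv.
example = io.StringIO(
    "barcode\tstatus\tassignment\n"
    "AAAC-1\tsinglet\t0\n"
    "AAAG-1\tsinglet\t2\n"
    "AAAT-1\tdoublet\t0/1\n"
)
labels = label_clusters(example, SAMPLE_NAMES)
print(labels)
```

For a real run, replace the `io.StringIO` object with `open("clusters.tsv", newline="")`.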

@elemesemele
Author

elemesemele commented Apr 24, 2024

> Yes that should be correct

Thank you for your kind response.
It's really nice that the order of the clusters is the same as the order of the original samples entered as input.

But is there any possibility that this matching is wrong?

For testing purposes, I reran the same command without skipping the remapping step (--known_genotypes and --known_genotypes_sample_names, with --skip_remap False).

As a result, the assigned cluster numbers changed.
Why is this?

These are my results.
I think the order of cluster0 and cluster1 has been swapped.

--skip_remap True
image

--skip_remap False
image

Would you recommend using --skip_remap True so that the cluster order matches the order of the original samples?

Sincerely.

@plijnzaad

If all is well, the results should be nearly identical. The --skip_remap option just avoids the needless and time-consuming remapping (realigning, really) of all the reads.
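
One way to check whether two runs agree apart from a label swap is to cross-tabulate the singlet assignments by barcode. This is only a sketch: it assumes both clusters.tsv files are tab-separated with `barcode`, `status` and `assignment` columns, and the function names and example barcodes are hypothetical.

```python
import csv
import io
from collections import Counter


def read_singlets(tsv_file):
    """Return {barcode: cluster label} for singlet rows only."""
    return {
        row["barcode"]: row["assignment"]
        for row in csv.DictReader(tsv_file, delimiter="\t")
        if row["status"] == "singlet"
    }


def cross_tabulate(run_a, run_b):
    """Count shared barcodes per (cluster in run A, cluster in run B) pair."""
    shared = run_a.keys() & run_b.keys()
    return Counter((run_a[bc], run_b[bc]) for bc in shared)


def best_match(table):
    """For each run-A cluster, pick the run-B cluster sharing the most barcodes."""
    best = {}
    for (a, b), n in table.items():
        if a not in best or n > table[(a, best[a])]:
            best[a] = b
    return best


# Made-up data: clusters 0 and 1 are swapped between the two runs.
header = "barcode\tstatus\tassignment\n"
run_a = read_singlets(io.StringIO(
    header + "bc1\tsinglet\t0\nbc2\tsinglet\t0\nbc3\tsinglet\t1\nbc4\tsinglet\t2\n"
))
run_b = read_singlets(io.StringIO(
    header + "bc1\tsinglet\t1\nbc2\tsinglet\t1\nbc3\tsinglet\t0\nbc4\tsinglet\t2\n"
))
table = cross_tabulate(run_a, run_b)
mapping = best_match(table)
print(mapping)
```

If the two runs use the same labels, every cluster maps to itself. A pure swap shows up as 0 mapping to 1 and 1 mapping to 0, with almost all barcodes concentrated in those pairs.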
