Multiple .gv files per group? #99

Open
MeggyC opened this issue Nov 9, 2020 · 5 comments

Comments

@MeggyC

MeggyC commented Nov 9, 2020

Hi there,

This is not so much an issue as it is a question. When I run Crass I end up with multiple .gv files for one group/array (if I have read the documentation correctly) - am I correct in thinking that Spacers_6 is all the spacers in array group 6, with the nucleotide sequence in the title being the DR sequence?

This is what the file list looks like (for Spacers_6):

Spacers_6_CGGTTCATCCCCACGCCTGTGGGGAACAC_spacers.gv
Spacers_6_CGGTTCATCCCTGCAGGCGCAGGGAACAC_spacers.gv
Spacers_6_CGGTTTATCCCCACACCTGTGGGGAACAC_spacers.gv
Spacers_6_GTTGTGAATTCCTTACAATTTTTTATATTTGCGCGTGAATCACAAC_spacers.gv

Does this mean that an array may have the spacers shown in any of these groups interspaced by a combination of different spacers?

Thanks!

@ctSkennerton
Owner

You are correct that Spacers_6 should be the spacers from group 6 and the sequence should be the DR sequence. It's been a long time since I've developed this code so I'm not sure if the multiple sequences should be expected - my gut feeling is that the DR sequence should be unique to each group. I'll have to read through the code again to check on this. You ran crass once and got all of these files? Just want to make sure it's not something obvious like you've been running it a bunch of times with different parameters or files with the same output directory.
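
One quick way to see how many DR sequences each group ended up with is to split the .gv file names on underscores. This is only a sketch: it assumes the Spacers_<group>_<DR>_spacers.gv naming described above and uses the four file names from this thread as sample input. The variable names are made up for illustration.

```shell
# File names copied from the listing earlier in this thread.
files="Spacers_6_CGGTTCATCCCCACGCCTGTGGGGAACAC_spacers.gv
Spacers_6_CGGTTCATCCCTGCAGGCGCAGGGAACAC_spacers.gv
Spacers_6_CGGTTTATCCCCACACCTGTGGGGAACAC_spacers.gv
Spacers_6_GTTGTGAATTCCTTACAATTTTTTATATTTGCGCGTGAATCACAAC_spacers.gv"

# Split each name on "_": field 2 is the group number, field 3 the DR sequence.
# Deduplicate the (group, DR) pairs, then count distinct DRs per group.
summary=$(printf '%s\n' "$files" \
  | awk -F_ '{ print $2, $3 }' \
  | sort -u \
  | awk '{ n[$1]++ } END { for (g in n) print "group " g ": " n[g] " DR sequence(s)" }')

echo "$summary"   # prints: group 6: 4 DR sequence(s)
```

On a real output directory the same pipeline could be fed from `ls Spacers_*_spacers.gv` instead of the hard-coded list.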

@MeggyC
Author

MeggyC commented Nov 10, 2020

Hey there,

I've only run Crass once but it was done in parallel on a group of reads, i.e.

ls $READDIR/* | parallel -j8 crass {}
There is just one crass.crispr output file and 15 .log files. There were 22 read files passed to parallel.

I don't know if that's helpful at all.

@ctSkennerton
Owner

I think you've actually run crass multiple times, once for each of the input files in $READDIR. If you intend to run crass once on all of the files in the directory at the same time, you want to do something like: crass $READDIR/*
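
To make the difference concrete, here is a dry-run sketch that only prints the commands rather than running crass. The temporary directory and the two read file names are hypothetical placeholders, not anything from the original run.

```shell
# Dry run: echo the crass commands instead of executing them.
# The directory and file names are placeholders for illustration.
READDIR=$(mktemp -d)
touch "$READDIR/sampleA_R1.fq" "$READDIR/sampleA_R2.fq"

# One invocation per file. This mirrors what
# `ls $READDIR/* | parallel -j8 crass {}` does: {} is replaced by one input line at a time.
per_file=$(for f in "$READDIR"/*; do echo "crass $f"; done)

# One invocation for all files: the shell expands the glob into a single argument list.
combined=$(echo crass "$READDIR"/*)

n_per_file=$(printf '%s\n' "$per_file" | wc -l | tr -d ' ')
n_combined=$(printf '%s\n' "$combined" | wc -l | tr -d ' ')
echo "separate crass runs: $n_per_file"
echo "combined crass runs: $n_combined"
```

With 22 read files the first form would start 22 independent crass runs, all writing into the same output directory unless told otherwise.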

@MeggyC
Author

MeggyC commented Nov 23, 2020

Thanks Connor, that's very helpful. Just one other question: do you think it's preferable to co-assemble the samples using the command above (crass $READDIR/*), or would you run Crass once on each read file?

@ctSkennerton
Owner

I think if the files are all from one sample, like R1 and R2 files from Illumina generated data, then they should be run together. If they are from different samples, then it depends on your biological question. If you're interested in the differences between samples then it might make more sense to run them separately and compare.
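
For the run-separately case, a per-sample loop could look roughly like the dry-run sketch below. It rests on two assumptions: that reads are named <sample>_R1.fastq and <sample>_R2.fastq (a hypothetical layout), and that crass takes -o to set the output directory (worth confirming with `crass --help`). The commands are only printed, not executed.

```shell
# Dry run: build one crass command per sample, each with its own output directory.
# Assumed layout (hypothetical): <sample>_R1.fastq and <sample>_R2.fastq in READDIR.
# Assumed flag: -o for the output directory; check `crass --help` before relying on it.
READDIR=$(mktemp -d)
touch "$READDIR/s1_R1.fastq" "$READDIR/s1_R2.fastq" \
      "$READDIR/s2_R1.fastq" "$READDIR/s2_R2.fastq"

cmds=""
for r1 in "$READDIR"/*_R1.fastq; do
    sample=$(basename "$r1" _R1.fastq)      # e.g. s1
    r2="$READDIR/${sample}_R2.fastq"        # matching mate file
    cmds="${cmds}crass -o crass_out/${sample} ${r1} ${r2}
"
done

printf '%s' "$cmds"
```

Keeping each sample in its own output directory avoids mixing the .gv and .log files from different runs, which is what made the listing at the top of this thread hard to interpret.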
