RNA methylation calling #67

Open
SilvanHaelg opened this issue Feb 19, 2021 · 8 comments

Comments

@SilvanHaelg

Dear Peng,

I wonder if it is possible to use Deepsignal to detect methylation in RNA from samples sequenced with the direct RNA sequencing kit from ONT.
I am thinking about training a model using fully methylated RNA and unmethylated cDNA as training data. Do you think the differences in signal between RNA and cDNA would interfere with the methylation signal and therefore lead to an inaccurate model?

Thanks for your time
Silvan

@PengNi
Collaborator

PengNi commented Feb 19, 2021

Hi @SilvanHaelg, thanks for your interest in deepsignal. We are also working on RNA modification detection with deepsignal, but we haven't trained a satisfying model yet. What you describe may work; however, I cannot guarantee the performance for now.

For RNA modification detection, you can also check other existing tools, such as nanom6A, EpiNano, or nanoDoc.

Best,
Peng

@SilvanHaelg
Author

Dear Peng
Thank you for the quick reply.
I will try to train a model with my approach and test the tools you suggested as well. How long do you think it will take until you have implemented RNA modification detection?
Best,
Silvan

@PengNi
Collaborator

PengNi commented Feb 19, 2021

We have no concrete timeline yet. It may take months.

Best,
Peng

@pterzian

Hi @PengNi! I am trying a similar approach to the OP's, but more oriented toward model training, so I only need to extract features. However, I could not get the resquiggle command to run on our RNA dataset.

I get this message:

[14:53:41] Loading minimap2 reference.
[14:53:41] Getting file list.
******************** ERROR ********************
	Reads do not to contain basecalls. Check --basecall-group option if basecalls are stored in non-standard location or use `tombo annotate_raw_with_fastqs` to add basecalls from FASTQ files to raw FAST5 files.

So I looked into the reads, and I can see the basecalls in the dedicated Basecall_1D_000 group. But when I try tombo preprocess annotate_raw_with_fastqs, it just reports that it added sequences to 0 reads.
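A quick way to check what tombo will see is to look for the `Fastq` dataset inside a FAST5 file with h5py. The sketch below is hypothetical (the helper names are made up for illustration) and assumes tombo's default basecall group `Basecall_1D_000`, with basecalls stored at `Analyses/Basecall_1D_000/BaseCalled_template/Fastq`; in a multi-read file the same layout is assumed to be nested under one top-level `read_<id>` group per read.

```python
from typing import Iterable, List

# Assumed defaults: tombo's default basecall group and the subgroup that
# template basecalls are written into.
BASECALL_GROUP = "Basecall_1D_000"
BASECALL_SUBGROUP = "BaseCalled_template"


def expected_fastq_paths(top_level_keys: Iterable[str],
                         group: str = BASECALL_GROUP,
                         subgroup: str = BASECALL_SUBGROUP) -> List[str]:
    """Return the HDF5 paths where a Fastq dataset is expected.

    Multi-read FAST5 files have one top-level 'read_<id>' group per read;
    single-read files keep 'Analyses' at the top level.
    """
    suffix = f"Analyses/{group}/{subgroup}/Fastq"
    read_groups = sorted(k for k in top_level_keys if k.startswith("read_"))
    if read_groups:
        return [f"{rg}/{suffix}" for rg in read_groups]
    return [suffix]


def report_basecalls(fast5_path: str) -> None:
    """Print which expected Fastq paths exist in one FAST5 file (needs h5py)."""
    import h5py  # imported lazily so the pure helper above works without h5py

    with h5py.File(fast5_path, "r") as f:
        paths = expected_fastq_paths(f.keys())
        layout = "multi-read" if paths[0].startswith("read_") else "single-read"
        print(f"{fast5_path}: {layout} layout")
        for p in paths:
            # h5py groups accept slash-separated paths in membership tests
            print(f"  {p}: {'present' if p in f else 'MISSING'}")
```

For example, `report_basecalls("some_read.fast5")` prints the detected layout and whether each expected `Fastq` path is present.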

I thought you might have an idea where this issue could come from. Unfortunately, tombo's GitHub repository looks quite inactive at the moment.

Best,

Paul

@PengNi
Collaborator

PengNi commented Sep 15, 2022

Hi @pterzian, did you run tombo annotate with the option (maybe --summary, I think) that adds the sequencing summary file for the reads in the FASTQs? If you didn't, please try it.

@PengNi
Collaborator

PengNi commented Sep 16, 2022

@pterzian, I checked: it is --sequencing-summary-filenames. Previously I didn't need to specify this parameter, but recently it seems it has to be set for annotate_raw_with_fastqs.
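For reference, the full annotation call including the summary option could be assembled as below. This is a sketch: the paths are placeholders, and apart from `--sequencing-summary-filenames` the flags (`--fast5-basedir`, `--fastq-filenames`, `--processes`) are assumed from tombo's documentation, so please confirm them with `tombo preprocess annotate_raw_with_fastqs --help`.

```python
import shlex
import subprocess
from typing import List, Sequence


def annotate_cmd(fast5_dir: str,
                 fastqs: Sequence[str],
                 summaries: Sequence[str],
                 processes: int = 4) -> List[str]:
    """Assemble the tombo annotate_raw_with_fastqs argument list."""
    return ["tombo", "preprocess", "annotate_raw_with_fastqs",
            "--fast5-basedir", fast5_dir,
            "--fastq-filenames", *fastqs,
            # one or more sequencing summary files for the reads in the FASTQs
            "--sequencing-summary-filenames", *summaries,
            "--processes", str(processes)]


def run_annotate(fast5_dir: str,
                 fastqs: Sequence[str],
                 summaries: Sequence[str],
                 processes: int = 4) -> int:
    """Print the command, run it, and return tombo's exit code."""
    cmd = annotate_cmd(fast5_dir, fastqs, summaries, processes)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Building the argument list separately from running it makes it easy to print and double-check the exact command before touching the FAST5 files.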

@pterzian

Thanks for the help @PengNi. I tried annotating the FAST5 files with this option, but this is the output:

[11:16:47] Getting read filenames.
[11:16:47] Parsing sequencing summary files.
******************** WARNING ********************
	Some FASTQ records from sequencing summaries do not appear to have a matching file.
[11:17:08] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [09:14, ?it/s]
[11:26:22] Added sequences to a total of 0 reads

I know some FASTQ records can't be found in the FAST5 files, because I extracted only the single FAST5 reads mapping to a specific contig, while I am passing the full concatenated FASTQs to the tombo annotate command. I spot-checked some FAST5 read IDs and could find them in both the sequencing summary and the FASTQs, so I am not sure where the issue comes from.
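The cross-check described above can be scripted over all reads rather than a few. The sketch below is hypothetical: it assumes a tab-separated sequencing summary with `read_id` and `filename` (or `filename_fast5`) columns, and single-read FAST5 files named `<read_id>.fast5`, which may not match how the reads were extracted here. Besides read-ID overlap, it counts how many shared reads have a summary filename equal to the actual FAST5 filename on disk; if tombo uses that column to locate files (an assumption, not confirmed in this thread), zero matches would be consistent with the "do not appear to have a matching file" warning.

```python
import csv
from pathlib import Path
from typing import Dict, Set


def fastq_read_ids(fastq_path: str) -> Set[str]:
    """Read IDs from FASTQ headers (first token after '@' on every 4th line)."""
    ids = set()
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0 and line.startswith("@"):
                ids.add(line[1:].split()[0])
    return ids


def summary_filenames(summary_path: str) -> Dict[str, str]:
    """Map read_id -> FAST5 filename from a tab-separated sequencing summary.

    Assumed column names: 'read_id' plus 'filename' or 'filename_fast5'.
    """
    with open(summary_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        fields = reader.fieldnames or []
        fn_col = "filename" if "filename" in fields else "filename_fast5"
        return {row["read_id"]: row[fn_col] for row in reader}


def fast5_read_ids(fast5_dir: str) -> Dict[str, str]:
    """Map read_id -> filename, assuming single-read files are '<read_id>.fast5'."""
    return {p.stem: p.name for p in Path(fast5_dir).rglob("*.fast5")}


def cross_check(fast5_ids: Dict[str, str],
                summary: Dict[str, str],
                fq_ids: Set[str]) -> Dict[str, int]:
    """Count FAST5 reads found in summary and FASTQ, and filename agreement."""
    shared = set(fast5_ids) & set(summary) & fq_ids
    fn_matches = sum(1 for r in shared if summary[r] == fast5_ids[r])
    return {"fast5_reads": len(fast5_ids),
            "in_summary_and_fastq": len(shared),
            "summary_filename_matches": fn_matches}
```

Usage would look like `cross_check(fast5_read_ids("fast5_dir"), summary_filenames("sequencing_summary.txt"), fastq_read_ids("all_reads.fastq"))`, with the placeholder paths swapped for the real ones.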

Looks like I am going to have to dig deeper and do more testing!

Best,

Paul

@PengNi
Collaborator

PengNi commented Sep 16, 2022

For now I can think of two other possible reasons:
(1) multi-read vs. single-read FAST5 format;
(2) a VBZ compression issue.
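For the VBZ point, one way to check is to read the HDF5 filter IDs attached to each `Signal` dataset. This sketch assumes h5py is installed and that 32020 is the filter ID registered for ONT's VBZ compression; as far as I know, the filter metadata can be read even without the VBZ plugin, while decompressing the signal itself needs it. The helper names are hypothetical.

```python
from typing import Dict, List

# Assumption: 32020 is the HDF5 filter ID registered for ONT's VBZ compression.
VBZ_FILTER_ID = 32020


def uses_vbz(filter_ids: List[int]) -> bool:
    """True if a dataset's filter pipeline includes the (assumed) VBZ filter."""
    return VBZ_FILTER_ID in filter_ids


def signal_filters(fast5_path: str) -> Dict[str, List[int]]:
    """Map each 'Signal' dataset path to its HDF5 filter IDs (needs h5py)."""
    import h5py  # imported lazily so uses_vbz works without h5py

    found: Dict[str, List[int]] = {}

    def visit(name, obj):
        # Covers both 'Raw/Reads/Read_N/Signal' (single-read) and
        # 'read_<id>/Raw/Signal' (multi-read) layouts.
        if isinstance(obj, h5py.Dataset) and name.endswith("Signal"):
            plist = obj.id.get_create_plist()
            found[name] = [plist.get_filter(i)[0]
                           for i in range(plist.get_nfilters())]

    with h5py.File(fast5_path, "r") as f:
        f.visititems(visit)
    return found
```

If a `Signal` dataset reports the VBZ ID, HDF5 would need to find ONT's VBZ plugin (for example through the `HDF5_PLUGIN_PATH` environment variable) before tombo can read the raw signal.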

Hope you find the reason soon!

Best,
Peng
