Question

Best practice for quantifying sncRNA

Abhishek • 0

@e3a7a4dc

Last seen 13 months ago

Australia

Hi all,

I have previously worked directly with read count files, but this is my first time trying to generate read counts from fastq files.

The QIAseq miRNA Library prep kit was used for the experiment. (I was not the one who performed the library prep; I just received the fastq files.)

From reading a few tutorials and following http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html, it seems that Salmon is the state-of-the-art method for generating read count files. My project is specifically focused on small non-coding RNAs.

Can someone please help me with these questions?

I want to identify specific sncRNAs as biomarkers (something like "miR-182-5p is increased in cancer patients"). Is it necessary to create read count files with genes as rownames? Can read count files with sncRNAs as rownames be created directly (rather than first creating one with genes and then mapping it to sncRNAs with something like miRBase)?
Is Salmon (following the steps in the link above) recommended for quantifying sncRNAs? https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4869-5 reports that alignment-free tools are not ideal for small RNAs.

Salmon sncRNA RNASeq • 2.9k views

ADD COMMENT • link 3.4 years ago Abhishek • 0

score 0 · Answer 1 · 2021-07-27

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 30 minutes ago

United States

I personally wouldn't use salmon for miRNA transcripts. We have had not completely horrible results using Qiagen's miRNA NGS Seq service, where they incorporate UMI barcodes and then you use GeneGlobe to quantify the UMI counts. Aligning a 21-23 mer to the genome is pretty tough, and you really need to ensure there is no adapter contamination, and you really want to exclude any PCR duplicates (which you cannot do without a UMI).
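To make the point about UMIs concrete, here is a minimal Python sketch of duplicate removal. It assumes the reads have already been adapter-trimmed and split into an insert sequence and a UMI; the function name `umi_counts` and the toy sequences are made up for illustration, and this is not how GeneGlobe is implemented. Reads that share both the insert and the UMI are treated as PCR copies of one molecule and counted once.

```python
from collections import defaultdict


def umi_counts(reads):
    """Count distinct UMIs per insert sequence.

    `reads` is an iterable of (insert_sequence, umi) pairs. Reads with the
    same insert AND the same UMI are assumed to be PCR duplicates.
    """
    umis_per_insert = defaultdict(set)
    for insert, umi in reads:
        umis_per_insert[insert].add(umi)
    return {insert: len(umis) for insert, umis in umis_per_insert.items()}


# Toy data (hypothetical sequences, not real miRNAs or real UMIs).
reads = [
    ("ACGTACGTACGTACGTACGTAC", "AAACCCGGGTTT"),
    ("ACGTACGTACGTACGTACGTAC", "AAACCCGGGTTT"),  # same insert + UMI: duplicate
    ("ACGTACGTACGTACGTACGTAC", "TTTGGGCCCAAA"),  # same insert, new UMI: new molecule
    ("TTGCATTGCATTGCATTGCATT", "AAACCCGGGTTT"),
]
counts = umi_counts(reads)
```

Without the UMI column, the first three reads would be indistinguishable, so there would be no way to tell three molecules from one molecule amplified three times. A real pipeline would typically also tolerate sequencing errors within the UMI, which this sketch ignores.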

And even after all that, miRNA-Seq data seem to be supremely noisy. It's my opinion that miRNA is a mirage intended to make MDs salivate and biostatisticians wail in misery, but what do I know?

ADD COMMENT • link 3.4 years ago James W. MacDonald 67k

It's my opinion that miRNA is a mirage intended to make MDs salivate and biostatisticians wail in misery, but what do I know?

I'll tell you what I know ... I think we need to start a biocfortunes package ...

ADD REPLY • link 3.4 years ago Steve Lianoglou ★ 13k

Thank you for sharing your thoughts.

I used Salmon with an index built from the GENCODE hg38 transcriptome. After the tximport step, I ended up with a read counts file containing a significant number of protein-coding genes. I'm not sure if this is some impurity in the sample or something wrong with the quantification using Salmon (without adapter trimming).

I'm hoping that adapter trimming will give better results.

ADD REPLY • link 3.4 years ago Abhishek • 0

If the sequences are 20-something bp long, then it seems logical to me to throw away any reads that are longer than that after adapter trimming, at least if you want to quantify the mature sequences. That would at least remove the obvious noise.

For salmon, how did you index the transcriptome? You would need to lower the default k-mer size of 31 quite a bit to even be able to map these small sequences. And if so, I think you would need to use the genome as a decoy to get a notable advantage from salmon. You would actually not need tximport, as small RNAs do not have isoforms, do they? Can you add some details?

By the way, is this normal RNA-seq or specifically small RNA-seq? If the former, then these small RNAs will not be properly represented, both because standard RNA extraction kits do not capture them well and because the size selection steps during library prep are optimized for fragments notably longer than 20-something bp (rather in the 200 bp range).
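A small Python sketch of the two points above: why a k-mer size of 31 cannot work for reads of about 22 nt, and what a post-trimming length filter does. The function names and the 18-26 nt window are assumptions chosen for illustration, not recommended values; in practice the length filter would normally be done by the trimming tool itself.

```python
def num_kmers(read_length, k):
    """Number of k-mers contained in a read of the given length."""
    return max(0, read_length - k + 1)


def keep_mature_sized(reads, min_len=18, max_len=26):
    """Keep only reads whose trimmed length falls in [min_len, max_len].

    The default window is a hypothetical choice for mature miRNA-sized reads.
    """
    return [r for r in reads if min_len <= len(r) <= max_len]


# A 22-nt read contains no 31-mer at all, so it cannot seed a match with k=31.
kmers_k31 = num_kmers(22, 31)
kmers_k21 = num_kmers(22, 21)

# Toy trimmed reads: one miRNA-sized, one too long, one too short.
trimmed = ["A" * 22, "A" * 50, "A" * 10]
kept = keep_mature_sized(trimmed)
```

With k=21 a 22-nt read still contributes only two k-mers, which is one reason to treat k-mer based quantification of such short reads with caution.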

ADD REPLY • link 3.4 years ago ATpoint ★ 4.6k

Thank you for your suggestions.

Throwing away larger reads sounds right; I'll try using Cutadapt for that in addition to adapter trimming.

I did realize the default k-mer size is large. I'm rerunning with k values of 10 and 20.

I did not quite understand the reason for not using tximport. I have been wondering whether mapping to genes is necessary, or whether reads can be mapped directly to miRNA or other sncRNA names.

The QIAseq miRNA Library kit was used; I was not the one who prepared the samples. I just received the fastq files.

ADD REPLY • link 3.4 years ago Abhishek • 0

k has to be odd, so I'm trying 9 and 21 instead.

ADD REPLY • link 3.4 years ago Abhishek • 0

I did use the genome as a decoy, and I used the whole transcriptome from GENCODE for indexing. Maybe I should index using just the non-coding transcriptome. However, only the lncRNA transcriptome is available from GENCODE, which is not what I want.
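One possible way to get a small-RNA-only reference out of the full transcript FASTA is to filter records by biotype. The Python sketch below assumes the pipe-delimited GENCODE header layout in which the eighth field is the transcript biotype; that layout, the biotype set, and the example IDs are assumptions to check against the actual file. Note also that such records describe annotated loci (for example miRNA precursors), not the mature miRNA sequences found in miRBase.

```python
# Hypothetical selection of small-RNA biotypes; adjust to the project.
SMALL_RNA_BIOTYPES = {"miRNA", "snoRNA", "snRNA", "scaRNA", "misc_RNA"}


def filter_fasta_by_biotype(lines, keep=SMALL_RNA_BIOTYPES):
    """Return only the FASTA lines belonging to records with a kept biotype.

    Assumes headers like '>tx_id|gene_id|...|biotype|' where the biotype
    is the eighth pipe-delimited field.
    """
    kept = []
    keeping = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].rstrip("\n").rstrip("|").split("|")
            biotype = fields[7] if len(fields) > 7 else None
            keeping = biotype in keep
        if keeping:
            kept.append(line)
    return kept


# Toy records with made-up IDs and sequences.
fasta_lines = [
    ">TX0001|GENE0001|-|-|MIRX-201|MIRX|22|miRNA|\n",
    "ACGTACGTACGTACGTACGTAC\n",
    ">TX0002|GENE0002|-|-|ABC1-201|ABC1|30|protein_coding|\n",
    "TTGCATTGCATTGCATTGCATTGCATTGCA\n",
]
small_rna_lines = filter_fasta_by_biotype(fasta_lines)
```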

ADD REPLY • link 3.4 years ago Abhishek • 0

I wonder what would be a good metric for assessing the quantification quality.

ADD REPLY • link 3.4 years ago Abhishek • 0

You may well get lots of protein-coding genes, depending on what's in the transcripts file. Do note that most miRNA transcripts are complementary to some portion of the mature mRNA of their targets, which is how they work after all, so it's not unexpected that salmon would count reads complementary to a transcript as a hit. There may be some way to tell salmon not to do that, but in general I would not use salmon; instead I would probably use bowtie. You absolutely don't want gapped alignments, and you don't want the aligner to think that you have paired-end reads (you don't, do you? That would be hilarious) because one of the pairs will be on the opposite strand. You could look at the documentation for GeneGlobe to see if they say how they do the alignment, but I am pretty sure they just use bowtie.
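To illustrate what "ungapped" and strand-aware matching mean here, a toy Python sketch follows. It is not how bowtie works internally; the function names, the mismatch limit, and the sequences are all made up for illustration. It slides a read along a reference on the forward strand only, allowing substitutions but no insertions or deletions.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def ungapped_hits(read, reference, max_mismatches=1):
    """Forward-strand, gap-free matches of `read` in `reference`.

    Returns a list of (start, mismatches) for every window of the reference
    that differs from the read by at most `max_mismatches` substitutions.
    """
    hits = []
    n = len(read)
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy reference (hypothetical) containing the read at offset 4.
read = "ACGTTGCATCAGGTCCAATC"
reference = "GGGG" + read + "GGGG"

forward_hits = ungapped_hits(read, reference)
# A read from the opposite strand does not match unless the search
# is also run on its reverse complement.
antisense_hits = ungapped_hits(revcomp(read), reference)
```

A quantifier that also accepts the reverse-complement orientation would report the antisense read as a hit, which is the situation described above for reads complementary to an mRNA.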

ADD REPLY • link 3.4 years ago James W. MacDonald 67k

Thanks, I will try out Bowtie.

Are GeneGlobe and Qiagen the same company? According to https://www.qiagen.com/us/resources/download.aspx?id=bea2dcfa-0a5c-47c5-afd8-8b0fe90a471a&lang=en, Bowtie is used.

ADD REPLY • link 3.4 years ago Abhishek • 0

If you used the QIAseq miRNA library, then you are wasting your time. That library adds UMI barcodes and is intended for you to upload the FASTQ files into GeneGlobe and process them there to get the UMI counts. Doing anything else, unless you are an expert and have some special knowledge that allows you to do better than Qiagen, is counter-productive.

ADD REPLY • link 3.4 years ago James W. MacDonald 67k

Oh, thank you for that.

Do they allow downloading the UMI counts file, or do they only show visualizations/results based on it? And can the UMI counts file be used just like a read count file (i.e. normalized and then used for DE analysis)?

ADD REPLY • link 3.4 years ago Abhishek • 0

That question is about GeneGlobe, which is a Qiagen product. I would recommend going to their site and figuring it out for yourself.