Hello,
I have a few questions about differential expression of miRNA with DESeq2/edgeR/limma.
Background
Mature miRNA differential expression analysis is usually performed by aligning trimmed miRNA reads against the genome or miRBase mature sequences with optimized parameters for aligners such as bwa aln -n 0
, bowtie --best --strata
, bowtie2 --very-sensitive-local
or others such as miRDeep2. When aligned to the genome, one can use e.g. featureCounts
with mature miRNA coordinates from miRBase to assign features and when aligned to miRase sequences directly one can use samtool idxstats
. Established pipelines then use DESeq2, edgeR or limma to perform differential gene expression - treating each miRNA as a 'gene' i.e. the gene DE is essentially the same as transcript level DE.
In either case, we are really counting mature miRNA transcripts (they do not have the same splicing or isomer variants like other mRNA; miRNA can have isomiRs but usually are not relevant for bulk RNA-seq I believe - and most are simply 3' or 5' NTEs).
My dataset is small RNA sequencing, single ended, 51 bp - reads were trimmed and length filtered for 18 - 28 bp reads. For the purposes of this question, I have mapped the reads with Salmon (assume this is 'valid').
Query
When using DESeq2 or edgeR or limma and tximport with Salmon output for miRNA mapping, is it thus OK to use tximport with txOut = TRUE
? Wouldn't this be similar to established pipelines which DESeq2, edgeR or limma to perform differential gene expression - treating each mature miRNA as a 'gene'? OR would Swish be a better method to use the boostraps/Gibbs?
Also, would catchSalmon
in edgeR using boostraps/Gibb or tximport with txOut = TRUE
be the more appropriate method?
The tximport docs say that it can and will import inferential replicates - does DESeq2 at all utilise these inferential replicates from tximport?
I have used R version 4.20, Bioconductor 3.15, tximport 1.24.0, DESeq2 1.36. I have not provided detailed code output or session information as my queries are more of theory questions rather than code-related.
Hi, attached is the plot for the inferential variance (code according to the fishpond vignette). How does this look to you? The first plot should be log10 I believe. Salmon was used to map against the mature miRNA sequences from miRBase (18-27 bp for each reference miRNA).
This is very low and I think you can just use regular count-based methods safely. If you look at the ~3 features above 1 those are probably highly similar sequences, you can add them together to avoid issues with quantification uncertainty.
Also -- you can just use the Salmon counts (e.g. in
assay(se, "counts")
or intxi$counts
if you use tximeta or tximport), you don't need to re-count I think.