Question

miRNA differential expression analysis - treat as transcript level?

0

Entering edit mode

BG • 0

@829ad59b

Last seen 2.7 years ago

Australia

Hello,

I have a few questions about differential expression of miRNA with DESeq2/edgeR/limma.

Background

Mature miRNA differential expression analysis is usually performed by aligning trimmed miRNA reads against the genome or miRBase mature sequences with optimized parameters for aligners such as bwa aln -n 0, bowtie --best --strata, bowtie2 --very-sensitive-local or others such as miRDeep2. When aligned to the genome, one can use e.g. featureCounts with mature miRNA coordinates from miRBase to assign features and when aligned to miRase sequences directly one can use samtool idxstats. Established pipelines then use DESeq2, edgeR or limma to perform differential gene expression - treating each miRNA as a 'gene' i.e. the gene DE is essentially the same as transcript level DE.

In either case, we are really counting mature miRNA transcripts (they do not have the same splicing or isomer variants like other mRNA; miRNA can have isomiRs but usually are not relevant for bulk RNA-seq I believe - and most are simply 3' or 5' NTEs).

My dataset is small RNA sequencing, single ended, 51 bp - reads were trimmed and length filtered for 18 - 28 bp reads. For the purposes of this question, I have mapped the reads with Salmon (assume this is 'valid').

Query

When using DESeq2 or edgeR or limma and tximport with Salmon output for miRNA mapping, is it thus OK to use tximport with txOut = TRUE? Wouldn't this be similar to established pipelines which DESeq2, edgeR or limma to perform differential gene expression - treating each mature miRNA as a 'gene'? OR would Swish be a better method to use the boostraps/Gibbs?

Also, would catchSalmon in edgeR using boostraps/Gibb or tximport with txOut = TRUE be the more appropriate method?

The tximport docs say that it can and will import inferential replicates - does DESeq2 at all utilise these inferential replicates from tximport?

I have used R version 4.20, Bioconductor 3.15, tximport 1.24.0, DESeq2 1.36. I have not provided detailed code output or session information as my queries are more of theory questions rather than code-related.

tximport DESeq2 edgeR swish miRNA • 2.7k views

ADD COMMENT • link updated 2.8 years ago by Michael Love 43k • written 2.9 years ago by BG • 0

score 0 · Answer 1 · 2022-06-24

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 6 days ago

United States

If it's "ok" depends on the amount of inferential uncertainty. We have benchmarked typical gene-level methods and found that we don't gain much in terms of accuracy for gene-level analysis when adding the uncertainty via Swish. However, we do see a benefit in accuracy when there is high uncertainty, as occurs for isoform-level analysis for some genes (see Swish publication for the isoform-level results). Swish can handle most designs that DESeq2 can. But before you deal with uncertainty, do you have a sense for the multi-mapping across miRNA? Or if you want to ignore the multi-mapping, you could just use unique counts from featureCounts?

ADD COMMENT • link 2.8 years ago Michael Love 43k

0

Entering edit mode

Hi, attached is the plot for the inferential variance (code according to the fishpond vignette). How does this look to you? The first plot should be log10 I believe. Salmon was used to map against the mature miRNA sequences from miRBase (18-27 bp for each reference miRNA).

enter image description here

ADD REPLY • link 2.8 years ago BG • 0

1

Entering edit mode

This is very low and I think you can just use regular count-based methods safely. If you look at the ~3 features above 1 those are probably highly similar sequences, you can add them together to avoid issues with quantification uncertainty.

Also -- you can just use the Salmon counts (e.g. in assay(se, "counts") or in txi$counts if you use tximeta or tximport), you don't need to re-count I think.

ADD REPLY • link 2.8 years ago Michael Love 43k