Question

Should I use "read_counts_miRNA" or "reads_per_million_miRNA" for differential expression analysis with DESeq2?


Björn (@bjorn-12199)

Hi, I downloaded harmonized miRNA data from TCGA. The data frame has two columns: 1) reads_per_million_miRNA_mapped_TCGA-HC-7211-01A-11R-2117-13 and 2) raw_count_miRNAs.

My questions:

1. Should I use reads_per_million or raw_counts to find DE miRNAs?

2. I believe "reads_per_million_miRNA" has already been converted to CPM and normalized. If so, can I use the values directly?

3. If I use "reads_per_million_miRNA" directly, should I just remove miRNAs with "0" values?

4. Same questions if I use "raw_counts_miRNAs".

best

B.

Tags: tcgadownload, deseq2, mirna, normalization

Written by Björn; last updated by Lorena Pantano.

score 1 · Answer 1 · 2018-06-15

Michael Love (@mikelove)

If you are asking what kind of input DESeq2 expects: it is count-scale data (i.e. counts with the library size not divided out).
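To make the distinction concrete, here is a minimal Python sketch with toy numbers (not TCGA data) and hypothetical helper names. It converts one sample's raw counts to CPM and checks whether each vector still looks like count-scale input, i.e. non-negative whole numbers. DESeq2's matrix input expects non-negative integer counts, so the CPM vector fails the check.

```python
def to_cpm(counts):
    """Counts per million: divide by the sample's library size, scale to 1e6."""
    library_size = sum(counts)
    return [c / library_size * 1e6 for c in counts]


def looks_like_count_scale(values):
    """Hypothetical check: True if every value is a non-negative whole number."""
    return all(v >= 0 and float(v).is_integer() for v in values)


# Hypothetical raw counts for one sample (toy numbers, not TCGA data).
raw = [1_000_000, 1_000, 5_000, 20_000, 50_000]
cpm = to_cpm(raw)

print(looks_like_count_scale(raw))  # True
print(looks_like_count_scale(cpm))  # False: library size has been divided out
```

Going back from CPM to counts would require the exact per-sample total used as the denominator (count = CPM × total / 1e6), so when a raw count column is available it is simpler to use that directly.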


score 0 · Answer 2 · 2018-06-15

Hi,

I would definitely avoid CPM. In miRNA data there are usually a very few miRNAs that account for millions of reads, but these values can vary a lot between samples, and using CPM can inflate the apparent variation in the data.

For instance, you can have one sample where one miRNA has 1 million counts (out of 4 million), while other samples have only 200,000 (out of 3 million). Maybe that miRNA is the only one that is deregulated, but if you use CPM, its change will affect all the other miRNAs as well (for instance, a miRNA with 1,000 counts in both datasets, i.e. equally expressed, will appear to differ), which is not good. DESeq2 or edgeR should be able to find the correct size factors for the normalization.
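The effect described above can be illustrated with toy numbers. The sketch below (plain Python; the counts and function names are hypothetical, not TCGA data) compares CPM with DESeq-style median-of-ratios size factors, the default size-factor method in DESeq2. edgeR's default (TMM) is a different method and is not shown. Only miR-X changes between the two samples; the other four miRNAs have identical raw counts.

```python
import math
from statistics import median

# Hypothetical toy counts (not TCGA data): columns are sample A and sample B.
# Only miR-X changes; the other four miRNAs have identical raw counts.
counts = {
    "miR-X": [1_000_000, 200_000],
    "miR-a": [1_000, 1_000],
    "miR-b": [5_000, 5_000],
    "miR-c": [20_000, 20_000],
    "miR-d": [50_000, 50_000],
}
n_samples = len(next(iter(counts.values())))


def cpm(table):
    """Divide each sample by its total count and scale to one million."""
    totals = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    return {g: [row[j] / totals[j] * 1e6 for j in range(n_samples)]
            for g, row in table.items()}


def size_factors(table):
    """Median-of-ratios: for each sample, the median over miRNAs of
    count / (geometric mean of that miRNA across samples).
    miRNAs with a zero in any sample are skipped (geometric mean would be 0)."""
    ratios = [[] for _ in range(n_samples)]
    for row in table.values():
        if min(row) == 0:
            continue
        log_gm = sum(math.log(x) for x in row) / len(row)
        for j, x in enumerate(row):
            ratios[j].append(math.exp(math.log(x) - log_gm))
    return [median(r) for r in ratios]


sf = size_factors(counts)
normalized = {g: [row[j] / sf[j] for j in range(n_samples)]
              for g, row in counts.items()}

print("CPM miR-a:", [round(v, 1) for v in cpm(counts)["miR-a"]])       # [929.4, 3623.2]
print("size factors:", sf)                                              # [1.0, 1.0]
print("normalized miR-a:", [round(v, 1) for v in normalized["miR-a"]])  # [1000.0, 1000.0]
```

Because the median ignores the single extreme miRNA, both size factors stay at 1 and miR-a keeps 1,000 normalized counts in each sample, whereas CPM makes it look roughly 3.9-fold different. With real data you would let DESeq2 (e.g. `estimateSizeFactors`) do this on the raw counts; the sketch only illustrates the idea.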

cheers