Whether to use "read_counts_miRNA" or "reads_per_million_miRNA" for differential expression analysis with DESeq2
Björn (@bjorn-12199), CH

Hi, I downloaded harmonized miRNA data from TCGA. The data frame has two columns: (1) reads_per_million_miRNA_mapped_TCGA-HC-7211-01A-11R-2117-13 and (2) raw_count_miRNAs.

My questions:

1. Should I use reads_per_million or raw_counts to identify differentially expressed (DE) miRNAs?

2. I believe "reads_per_million_miRNA" has already been converted to CPM and normalized. If so, can I use the values directly?

3. If I use "reads_per_million_miRNA" directly, should I simply remove miRNAs with values of 0?

4. Do the same answers apply if I use "raw_counts_miRNAs"?

best

B.

Tags: tcgadownload, deseq2, mirna, normalization
@mikelove, United States

If you are asking what kind of input DESeq2 expects: it expects count-scale data, that is, counts from which the library size has not been divided out.
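A minimal sketch of that distinction, written in Python rather than R and using hypothetical values. `is_count_scale` and `reads_per_million` are illustrative helpers, not part of DESeq2, and the RPM formula assumes the TCGA column is the raw count divided by the total number of miRNA-mapped reads, times one million. Once the library size is divided out, the values are generally no longer whole numbers, whereas DESeq2 expects non-negative integer counts.

```python
def is_count_scale(values):
    """True if every value is a non-negative whole number (count-scale input)."""
    return all(v >= 0 and float(v).is_integer() for v in values)


def reads_per_million(counts):
    """Rescale raw counts by library size (assumed definition of the RPM column)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


# Hypothetical raw counts for one sample, one value per miRNA
raw_counts = [1500, 0, 37, 98000, 412]
rpm = reads_per_million(raw_counts)

print(is_count_scale(raw_counts))  # True
print(is_count_scale(rpm))         # False
```

In practice this means handing the raw count column to DESeq2 and letting it estimate its own size factors, instead of feeding it values that have already been scaled.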

@lorena-pantano-6001, Boston

Hi,

I would definitely avoid CPM. In miRNA data there is usually a small handful of miRNAs with counts in the millions, and these values can vary a lot between samples. Because CPM divides every miRNA by the total count, that variation leaks into all the other miRNAs and inflates the apparent variation of the data.

For instance, in one sample a single miRNA might have 1 million counts (out of 4 million in total), while in another sample it has only 200,000 (out of 3 million). That miRNA may be the only one that is deregulated, but with CPM it also shifts the values of all the other miRNAs. A miRNA with 1,000 counts in both samples, and therefore equally expressed, would get 250 CPM in the first sample and about 333 CPM in the second, which is not what you want. DESeq2 or edgeR should be able to estimate appropriate size factors for the normalization.
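The composition effect described in this answer can be reproduced with a small sketch. DESeq2 is an R package; the Python below only re-implements the median-of-ratios idea behind DESeq2's default size-factor estimation, on hypothetical toy counts, so it illustrates the principle and is not a substitute for running DESeq2 (edgeR uses a different method). The function names `cpm` and `median_of_ratios` and all counts are made up for this example.

```python
import math
from statistics import median


def cpm(counts):
    """Counts per million: each count divided by the sample's total, times 1e6."""
    total = sum(counts.values())
    return {mirna: c / total * 1e6 for mirna, c in counts.items()}


def median_of_ratios(samples):
    """Size factor per sample using the median-of-ratios idea.

    samples: dict of sample name -> dict of miRNA -> raw count.
    miRNAs with a zero count in any sample are skipped, because their
    geometric mean is zero and the log-ratio is undefined.
    """
    names = list(samples)
    mirnas = list(samples[names[0]])
    log_geo_mean = {}
    for m in mirnas:
        values = [samples[s][m] for s in names]
        if all(v > 0 for v in values):
            log_geo_mean[m] = sum(math.log(v) for v in values) / len(values)
    if not log_geo_mean:
        raise ValueError("no miRNA is non-zero in every sample")
    # Per sample: median over miRNAs of log(count / geometric mean), back-transformed
    return {
        s: math.exp(median(math.log(samples[s][m]) - lg for m, lg in log_geo_mean.items()))
        for s in names
    }


# Hypothetical toy data: "X" is the only deregulated miRNA; the others,
# including "Y", have identical counts in both samples.
samples = {
    "A": {"X": 1_000_000, "Y": 1_000, "m2": 500, "m3": 20_000, "m4": 300},
    "B": {"X": 200_000, "Y": 1_000, "m2": 500, "m3": 20_000, "m4": 300},
}

size_factors = median_of_ratios(samples)
cpm_a, cpm_b = cpm(samples["A"]), cpm(samples["B"])

print("size factors:", size_factors)
print("Y after CPM:", round(cpm_a["Y"], 1), round(cpm_b["Y"], 1))
print("Y after size factors:",
      samples["A"]["Y"] / size_factors["A"],
      samples["B"]["Y"] / size_factors["B"])
```

With these toy numbers, CPM makes "Y" look roughly 4.6-fold higher in sample B (about 979 vs. about 4,509 CPM) purely because "X" dominates sample A's total, while the median-of-ratios size factors both come out at 1, so "Y" stays equal. The median is what makes the estimate robust: one extreme miRNA moves the total, but not the median ratio.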

cheers
