Question

deseq2 - many differentially expressed genes

Prasad Siddavatam ▴ 150

@prasad-siddavatam-4508

United States

Hi Michael, I am following up on our previous discussion. I ran DESeq2 with minReplicatesForReplace=Inf and cooksCutoff=FALSE, and it actually increased the number of DE genes.

Here is the sample code:

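library(DESeq2)
# Build the dataset from the raw count matrix; 'type' is the condition of interest
dds <- DESeqDataSetFromMatrix(countData = countsMatrix, colData = colData,
                              design = ~ type)
dds <- DESeq(dds)
# Compare ABCD_DIF against ABCD_UND with the Cook's distance outlier filter turned off
Gres <- results(dds, contrast = c("type", "ABCD_DIF", "ABCD_UND"), cooksCutoff = FALSE)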

The big difference between DESeq and DESeq2 is that there are thousands of DE genes, even at an FDR of 0.01. Are there any other criteria for reducing the number of DEGs? The MA plots look fine (most of the genes lie along the x-axis, and the DE genes are colored red).

deseq2 • 2.5k views

updated 10.4 years ago by Steve Lianoglou ★ 13k • written 10.4 years ago by Prasad Siddavatam ▴ 150

score 1 · Answer 1 · 2014-12-15

Michael Love 43k

@mikelove

United States

Hi Prasad,

It is not surprising that the number of DE genes increased, as you have turned off outlier filtering.

Having many genes with a small false discovery rate means that the fold changes between conditions are large, in particular large relative to the within-group dispersion, and that your experiment was sufficiently powered to detect many differences.

I would follow the suggestion in my previous response (A: deseq2 - many differentially expressed genes): "You can reduce the size of the list you are interested in by either lowering the alpha or using the lfcThreshold argument of results()." For example:

results(dds, lfcThreshold=1)
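
As a rough illustration of the two options quoted above, the following sketch applies both to simulated data. The dataset comes from DESeq2's `makeExampleDESeqDataSet()`, so the `condition` factor with levels `A` and `B`, the gene and sample numbers, and the variable names are assumptions made for illustration, not the poster's data.

```r
library(DESeq2)

set.seed(1)
# Simulated counts (an assumption for illustration): 2000 genes, 8 samples,
# two groups "A" and "B" in the 'condition' column, with true fold changes.
dds <- makeExampleDESeqDataSet(n = 2000, m = 8, betaSD = 1)
dds <- DESeq(dds)

# Option 1: lower alpha, the FDR cutoff that results() tunes its
# independent filtering for.
res_alpha <- results(dds, contrast = c("condition", "B", "A"), alpha = 0.01)

# Option 2: additionally test against |log2 fold change| > 1
# instead of the default null hypothesis of no change.
res_lfc <- results(dds, contrast = c("condition", "B", "A"),
                   alpha = 0.01, lfcThreshold = 1)

# Count genes passing the FDR cutoff under each test; padj can be NA
# for filtered genes, hence na.rm = TRUE.
n_alpha <- sum(res_alpha$padj < 0.01, na.rm = TRUE)
n_lfc <- sum(res_lfc$padj < 0.01, na.rm = TRUE)
c(alpha_only = n_alpha, with_lfcThreshold = n_lfc)
```

With `lfcThreshold` set, the reported p-values refer to the hypothesis that the absolute log2 fold change exceeds the threshold, so the shorter list still carries an FDR interpretation, which a post hoc fold-change cut on the default table would not.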

written 10.4 years ago by Michael Love 43k

Hi Michael,

Thank you very much.

Another potential problem could come from the data itself: the gene list contains ~45k entries, many of which are microRNAs and other noncoding RNAs. Since the data come from mRNA, having these genes in the matrix (even with zero counts) would affect the multiple-testing correction.

Is it a good idea to remove these genes beforehand? If so, where can I get a GFF/GTF file without the noncoding genes and pseudogenes?

Greatly appreciate your help.

Prasad

written 10.4 years ago by Prasad Siddavatam ▴ 150

Having features with very small counts won't affect the PCA, for two reasons. First, the transformations we recommend dampen the signal of the log of low counts. Second, plotPCA selects the top 500 features by variance, and these low-count features won't have high variance.
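
A minimal sketch of the PCA workflow described here, assuming simulated data from `makeExampleDESeqDataSet()` (the `condition` column and all sizes are illustrative assumptions) and using `vst()` as one of the recommended transformations:

```r
library(DESeq2)

set.seed(1)
# Simulated data (an assumption): 3000 genes, 8 samples, groups "A" and "B".
dds <- makeExampleDESeqDataSet(n = 3000, m = 8, betaSD = 1)

# Variance-stabilizing transformation; blind = TRUE ignores the design
# when estimating the dispersion trend.
vsd <- vst(dds, blind = TRUE)

# ntop = 500 is plotPCA's default: only the 500 highest-variance features
# enter the PCA. returnData = TRUE returns the coordinates instead of a plot.
pca_data <- plotPCA(vsd, intgroup = "condition", ntop = 500, returnData = TRUE)
head(pca_data)
```

Dropping `returnData = TRUE` draws the usual plot; changing `ntop` is a quick way to check how sensitive the picture is to the number of features used.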

written 10.4 years ago by Michael Love 43k

With respect to the small counts (ncRNAs), I am not talking about the PCA plot but about the number of DE genes.
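
On the multiple-testing concern raised in this exchange: the DESeq2 documentation states that a row whose counts are all zero gets NA for its p-value and adjusted p-value, and `results()` applies independent filtering on the mean count by default, so such rows should not enter the Benjamini-Hochberg adjustment. The base R sketch below illustrates why NA rows are harmless, while rows carried through with p = 1 would not be. The toy matrix, the row names, and the p-values are all made up for illustration.

```r
# Toy count matrix (hypothetical): 6 features x 4 samples. The last three
# rows are all zero, standing in for features an mRNA library does not capture.
countsMatrix <- rbind(
  geneA = c(120, 135, 410, 390),
  geneB = c(15, 22, 18, 20),
  geneC = c(0, 3, 1, 0),
  mirX  = c(0, 0, 0, 0),
  mirY  = c(0, 0, 0, 0),
  mirZ  = c(0, 0, 0, 0)
)

# Explicit pre-filter: keep rows with at least one read in any sample.
keep <- rowSums(countsMatrix) > 0
filtered <- countsMatrix[keep, , drop = FALSE]

# Made-up p-values; all-zero rows carry NA, as DESeq2 reports them.
pvals <- c(geneA = 0.001, geneB = 0.6, geneC = 0.04,
           mirX = NA, mirY = NA, mirZ = NA)

# p.adjust() skips NAs, so adjusting with or without those rows agrees.
padj_all  <- p.adjust(pvals, method = "BH")
padj_kept <- p.adjust(pvals[keep], method = "BH")
all.equal(padj_all[keep], padj_kept)

# If the same rows were instead tested and given p = 1, the adjustment
# for the remaining genes would become more conservative.
padj_if_counted <- p.adjust(c(0.001, 0.6, 0.04, 1, 1, 1), method = "BH")
rbind(kept_only = padj_kept, zeros_counted = padj_if_counted[1:3])
```

Under these assumptions, removing all-zero rows beforehand mainly saves memory and time rather than changing the adjusted p-values; annotation-based removal of expressed noncoding features is a separate decision that this sketch does not address.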

written 10.4 years ago by Prasad Siddavatam ▴ 150