Hi Michael, I am following up on our previous discussion. I ran DESeq2 with minReplicatesForReplace=Inf and cooksCutoff=FALSE, and it actually increased the number of DE genes.
Here is the sample code:
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = countsMatrix, colData = colData, design = ~ type)
dds <- DESeq(dds, minReplicatesForReplace = Inf)  # no outlier replacement
Gres <- results(dds, contrast = c("type", "ABCD_DIF", "ABCD_UND"), cooksCutoff = FALSE)  # no Cook's distance filtering
The big difference between DESeq and DESeq2 is that there are thousands of DE genes, even at 0.01 FDR. Are there any other criteria I can use to filter the list of DEGs? The MA plots look fine (most genes lie along the x-axis, and the DE genes are colored red).
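One criterion that narrows the list without lowering the FDR further is to test against a minimum fold change using the lfcThreshold and altHypothesis arguments of results(), rather than the default null of zero change. The sketch below assumes DESeq2 is installed and uses simulated data from makeExampleDESeqDataSet() in place of countsMatrix/colData; the threshold of 1 (a 2-fold change) and the sample sizes are arbitrary illustrative choices.

```r
# Sketch only: simulated counts stand in for the real countsMatrix/colData.
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
# makeExampleDESeqDataSet() simulates a two-level factor "condition" (A, B);
# betaSD = 1 gives genes nonzero true log2 fold changes.
dds <- makeExampleDESeqDataSet(n = 2000, m = 6, betaSD = 1)
dds <- DESeq(dds, quiet = TRUE)

# Default Wald test: null hypothesis is log2 fold change = 0.
res_std <- results(dds, contrast = c("condition", "B", "A"), alpha = 0.01)

# Thresholded test: null hypothesis is |log2 fold change| <= 1.
res_thr <- results(dds, contrast = c("condition", "B", "A"), alpha = 0.01,
                   lfcThreshold = 1, altHypothesis = "greaterAbs")

n_std <- sum(res_std$padj < 0.01, na.rm = TRUE)
n_thr <- sum(res_thr$padj < 0.01, na.rm = TRUE)
cat("DE genes at FDR 0.01, test of LFC = 0:  ", n_std, "\n")
cat("DE genes at FDR 0.01, test of |LFC| > 1:", n_thr, "\n")
```

Testing against the threshold inside the model keeps the FDR statement valid for the "at least 2-fold" question, which a post hoc cut on the log2FoldChange column of the default results does not.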
Hi Michael,
Thank you very much.
Another potential problem could come from the data itself: the gene list contains ~45k genes, and many of them are microRNAs and other noncoding RNAs. Since the data come from mRNA, having these genes in the matrix (even with zero counts) would affect the multiple testing correction.
Is it a good idea to remove these genes beforehand? If so, where can I get a GFF/GTF file without the noncoding genes and pseudogenes?
I greatly appreciate your help.
Prasad
Having features with very small counts won't affect the PCA, for two reasons. First, the transformations we recommend dampen the noisy signal of the log of low counts. Second, plotPCA selects the top 500 features by variance, and these low-count features won't have high variance.
I am not asking about the PCA plot; my concern about the small counts (ncRNAs) is their effect on the number of DE genes.
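On the number of DE genes: genes that never receive a p-value do not enter the multiple testing correction. According to the DESeq2 documentation, rows with all-zero counts get NA p-values, and independent filtering sets padj to NA for low-mean genes, which excludes them from the Benjamini-Hochberg adjustment (the default pAdjustMethod = "BH"). The base-R sketch below uses hypothetical p-values, not values from this dataset, to show that p.adjust() does not count NA entries as tests, whereas the same genes given p-values of 1 would inflate the correction.

```r
# Hypothetical p-values for four genes that were actually tested.
p_tested <- c(0.001, 0.01, 0.03, 0.04)

# 1000 untested genes (e.g. all-zero rows, reported with NA p-values).
p_with_na <- c(p_tested, rep(NA, 1000))

# The same 1000 genes if they had been tested and given p-values of 1.
p_with_ones <- c(p_tested, rep(1, 1000))

bh_tested    <- p.adjust(p_tested, method = "BH")
bh_with_na   <- p.adjust(p_with_na, method = "BH")[1:4]
bh_with_ones <- p.adjust(p_with_ones, method = "BH")[1:4]

print(bh_tested)     # [1] 0.004 0.020 0.040 0.040
print(bh_with_na)    # same values: NA entries are not counted as tests
print(bh_with_ones)  # [1] 1 1 1 1: 1000 extra tests inflate the correction
```

Whether to drop ncRNA features for biological reasons is a separate question; the sketch only addresses their effect on the adjustment.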