Hi all,
I'm relying on edgeR for differential expression analysis and have so far been using TMM normalization to account for library composition effects, but I'm finding a very large number of DEGs. I'm unsure whether my data set violates any of TMM's assumptions, how robust TMM is to such violations, and what consequences to expect. I'm using edgeR and TMM for two experiments, and both show a large number of DEGs. I've included sample code from one of them, where I'm using exactTest to detect DEGs between two populations, each with 7 replicates. After filtering, ~17,800 genes are included in the model, and 34-38% are significant in each direction at FDR<0.05. Thanks for reading.
library(edgeR)

# Read the count matrix; rows are genes, columns are the 14 samples
data1 <- read.csv("/path/to/counts/genehits_count_matrix.csv", row.names="Gene")
group1 <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,2)
y1 <- DGEList(counts=data1, group=group1)
dim(y1$counts)
#[1] 24848    14

# Keep genes with CPM > 1 in at least 7 samples (the size of one group)
keep <- rowSums(cpm(y1)>1) >= 7
y1 <- y1[keep, , keep.lib.sizes=FALSE]
dim(y1$counts)
#[1] 17872    14

# TMM normalization (the default method of calcNormFactors)
y1 <- calcNormFactors(y1)
y1$samples
#                group lib.size norm.factors
#X9_Sorted.bam       1 23724378    1.0107928
#X15_Sorted.bam      1 22202313    0.8970733
#X17_Sorted.bam      1 22368615    1.0044862
#X43_Sorted.bam      1 22633426    0.8669456
#X45_Sorted.bam      1 23806764    0.9907505
#X79_Sorted.bam      1 23051023    0.9943360
#X149_Sorted.bam     1 21555943    0.9268498
#X110_Sorted.bam     2 26638422    1.0525179
#X139_Sorted.bam     2 27932858    1.0555890
#X23_Sorted.bam      2 22286424    1.0212521
#X63_Sorted.bam      2 21534110    1.0750907
#X69_Sorted.bam      2 24672222    1.0584811
#X87_Sorted.bam      2 21216005    1.0446583
#X94_Sorted.bam      2 23970976    1.0282707

y1 <- estimateDisp(y1)
y1$common.dispersion
#[1] 0.06447124

# Diagnostic plots
plotMDS(y1, col=as.numeric(y1$samples$group))
legend("bottomleft", as.character(unique(y1$samples$group)), col=1:4, pch=20)
plotMDS(y1, method="bcv", col=as.numeric(y1$samples$group))
legend("bottomleft", as.character(unique(y1$samples$group)), col=1:4, pch=20)
plotBCV(y1)

# Exact test between the two groups (group 2 vs group 1)
et <- exactTest(y1)
summary(de <- decideTestsDGE(et, p=0.05))
#         2-1
#Down    6821
#NotSig  4953
#Up      6098
Alright, thank you for the feedback Aaron! I'll switch over to QL methods, as well. Happy holidays.