Dear all,
I have RNAseq data of two groups. After doing differential gene expression using edgeR and DESeq2(the results are similar). I found the distribution of logFC is a little strange, having two peaks in the distribution.
My guess would be that most genes are equally enriched in both goups, i.e. the mode of the logFC distribution (probably Gaussian-like) is around 0.
Perhaps someone can tell you why this is so.
Code can be seen bolow:
P1<-read.table("P1.STAR-counts-stranded.txt",header = TRUE) row.names(P1) <- P1[,1] P2<-read.table("P2.STAR-counts-stranded.txt",header = TRUE) row.names(P2) <- P2[,1] H3<-read.table("H3.STAR-counts-stranded.txt",header = TRUE) row.names(H3) <- H3[,1] H4<-read.table("H4.STAR-counts-stranded.txt",header = TRUE) row.names(H4) <- H4[,1] LIST<-list(P1,P2,H3,H4) COL<-unique(unlist(lapply(LIST, colnames))) ROW<-unique(unlist(lapply(LIST, rownames))) TOTAL<-matrix(data=NA,nrow=length(ROW), ncol = length(COL),dimnames = list(ROW,COL)) for (ls in LIST){ TOTAL[rownames(ls),colnames(ls)]<-as.matrix(ls) } write.csv(TOTAL,file="/home/Claire/Desktop/data/STAR_6/TOTAL-PH-STAR-counts-stranded.csv") #############################edgeR##################################### setwd("/home/Claire/Desktop/data/STAR_6/") library(edgeR) HPcount<-read.csv("TOTAL-PH-STAR-counts-stranded.csv",header = TRUE,sep = ",") HPcounts<-HPcount[,8:11] samplenames<-c("P1","P2","H3","H4") HPgeneid<-HPcount[,2] rownames(HPcounts)<-HPgeneid colnames(HPcounts) <- samplenames HPgenes<-HPcount[,2:7] group <- as.factor(c("P", "P", "H","H")) DGEList <- DGEList(counts=HPcounts, group=group,genes=HPgenes) DGEList$samples keep <- rowSums(cpm(DGEList)>0.5) >= 2 DGEList_keep <- DGEList[keep , keep.lib.sizes=FALSE] DGEList_keep_norm <- calcNormFactors(DGEList_keep) #design matrix design<-model.matrix(~0+group) colnames(design)<-levels(group) PvsH<-makeContrasts(P-H, levels = design) #Estimating the dispersion y<-estimateDisp(DGEList_keep_norm,design) fit<-glmFit(y,design) #to perform likelihood ration tests lrt<-glmLRT(fit,contrast = PvsH) lrt ###############################hist and volcanno plot######### volcanoData <- cbind(lrt$table$logFC, -log10(lrt$table$PValue)) colnames(volcanoData) <- c("logFC", "negLogPval") head(volcanoData) plot(volcanoData, pch=19) hist(lrt$table$logFC,breaks=100)
Where is the code you used to obtain the log-fold changes and to generate the plots?
Thank you for your reminding. The codes have been added in the question.
Are you sure this is right? You construct
design
twice but you have no command to constructPvsH
.Yes, I contructed PvsH, sorry about the negligence. This time it is right
Well, it can't be right, because you have:
... which doesn't make sense, because the column names from
model.matrix
should begroupP
andgroupH
before you rename them. So the call tomakeContrasts
at this location cannot be correct.Please try running your own code in a fresh session, i.e., with no saved workspaces. Given all these errors, it is difficult to trust that the code you have shown is actually generating the output that you claim it does.
I wanted to simplify the codes, which made it worse, sorry about that... This is the code I rerun with no saved workspaces, without any changes. It creates the exact the same outputs.