I'm trying to understand how the mean-variance plot is calculated with the meanVarPlot() of EDASeq package. I tried to reproduce it manually with the code below but the result is very different.
Given the expression matrix (row=genes, columns=samples), I used the logarithm to transform the counts before computing mean and variance. Since there are zero counts, I considered the maximum between 0 and the log of the counts. After that I calculated the row means and the row variances (the mean and variance of each gene across all samples).
Here's the code I used:
rm(list=ls())
library(yeastRNASeq)
library(EDASeq)
data(geneLevelData)
head(geneLevelData)
Data=newSeqExpressionSet(as.matrix(geneLevelData))
meanVarPlot(Data,log=T)
geneLevelDataM=as.matrix(geneLevelData)
means=apply(pmax(log(geneLevelDataM),0),1,mean)
variances=apply(pmax(log(geneLevelDataM),0),1,var)
plot(means,variances)
means=apply(pmax(log2(geneLevelDataM),0),1,mean)
variances=apply(pmax(log2(geneLevelDataM),0),1,var)
plot(means,variances)
The function does indeed do something else than what the manual page says, i.e., it takes the log after computing mean and variance.
Thank you both for spotting this incongruence! I will fix it in devel.
Just to add to Wolfgang's answer re: source code, another way, within R, is to type: