I have a DGEGLM object from edgeR called fit2
fit2 <- glmFit(counts(Data), design, disp2$tagwise.dispersion,offset=-offst(dataOffset))
I want the logFC values so
logFC <- fit2$coefficients[geneA,2] # first column intercept, second column is my comparison fit2$fitted.values[geneA,] # is the vector with the fitted values TableWithFC <- aggregate(log2(fit2$fitted.values[geneA,]) ~ (condition),FUN= mean) logFC2 <- TableWithFC[2,2]-TableWithFC [1,2]
logFC is NOT equal to logFC2
If logFC is the right one (is the same I get if I continue the pipeline and I do glmLRT and topTags), how I can obtain the expression values of geneA for each sample in order to obtain a boxplot like this:
boxplot(log2(fit2$fitted.values["geneA",]) ~ (fit2$design[,2]))
but with the values giving the logFC of the fit2$coefficients[geneA,2] ?
In other words the boxplot with the fitted.values is not representative of the result with coefficients.
I want a boxplot representative of the logFC obtained with coefficients.
Thanks,
P
Hi Aaron,
thanks a lot for your answer, you made me reflect on the many reasons the values can be different.
I expected a minor difference in FC as you state in your answer BUT I find FCs with different sign comparing the results of my analysis and the cpm.
Let me explain that i'm using both EDAseq and RUVSeq prior edgeR to obtain my result.
According to EDASeq manual:
then according to RUVSeq using control genes taken from
Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013
I do:
RUVg does NOT fill the offset slot of the EDASeq object.
To use the offsets I need the result from EDASeq but I still want to Remove Unwanted Variation
so I modify the edgeR paragraph of the EDASeq manual:
then all by manual
lrt <- glmLRT(fit, coef=2) topTags(lrt) contrast_BvsA <- makeContrasts(TvsN=conditionsB-conditionA, levels=design) contrast_BvsA # Contrasts # Levels BvsA # conditionA -1 # conditionsB 1 # W_1 0 # W_1 is a vector to Remove Unwanted Variation lrt2 <- glmLRT(fit2,contrast = contrast_BvsA )
If I do
As you can see it's not a small difference.
Thanks,
P
In my answer, I was assuming that you were performing the simplest possible comparison between group means, i.e., a GLM fitted with a one-way layout. However, if you're using blocking factors that are not balanced between groups, all bets are off. To illustrate with a simple linear model:
So, you can see that the difference in sample means (-0.12) is quite different from the difference returned by the linear model (-0.91). This is because the latter accounts for the effect of the blocking variables on the apparent differences between groups. If you want to represent this in a boxplot, I'd suggest using
removeBatchEffect
on the log-CPMs with your groups indesign=
and all blocking factors incovariates=
.Dear Aaron,
thanks for your answer.
I tried your solution but with no luck.
Maybe it's related to the used of the offset from EDAseq package together with the RUVg function from RUVSeq package.
As you can read in the previous post my design is
so the only block factor i see is W_1 (which is the the estimated factors of unwanted variation).
so:
A difference bigger than before.
I remind you the
gives FC -1
Maybe I'm doing something wrong or there is an easiest way to integrate EDAseq normalization and RUVg results into edgeR pipeline.
Sorry for the many questions but I'd really like to use this pipeline (or a modified working one) in some papers we are preparing.
Best,
P
At low counts, this is less likely to match up due to the effect of the
prior.count
and the fact that the variance relative to the mean is higher (such that the mean of the logs becomes even more different to the log of the means). But otherwise it should be pretty consistent, as you can see from the plot above. If you're not getting this result, something probably went wrong - try working back from the code I've just provided above.