Difference between Fold Changes from EdgeR glmLRTest and directly calculated fold changes.
1
0
Entering edit mode
abf ▴ 30
@abf-14661
Last seen 2.3 years ago
United States

In the process of validating my procedure for using edgeR to calculate differential expression statistics; I've found that there are differences between the fold changes estimated by glmLRTest, and those calculated directly from the matrix of scaled cpm values returned by "cpm(y)". I am aware the the "prior.count" parameter can be be used to shrink log fold changes towards zero. I've tested this myself and examined the impact of different settings; ranging from 0 to 10. However with my data, even when prior.count is set to 0; it looks like there is still some shrinkage being applied to the fold change. My design matrix (bottom of post) does include a batch adjustment factor, and I suspect that this may play a role.

z<-z<-y[filterByExpr(y, design=design), , keep.lib.sizes=FALSE]
z<-estimateCommonDisp(z)
z<-estimateTagwiseDIsp(z)
z<-calcNormFactors(z)

Model is fit as follows:

  fit<-glmFit(z, design, prior.count=0)
  qlf<-glmLRT(fit, coef=2:5)
  deg<-as.data.frame(topTags(qlf, n=Inf))

The estimated fold changes disagree with those calculated from the cpm matrix

Estmated from the CPM matrix:

> gr1<-c(
+   "1_S1","2_S2","4_S3",
+   "WT_0_hr_1_S1","WT_0_hr_2_S2","WT_0_hr_3_S3",
+   "W_0_hr_1_S256","W_0_hr_2_S257","W_0_hr_3_S258"
+ )
> 
> gr2<-c(
+   "WT_6_hr_1_S4","WT_6_hr_2_S5","WT_6_hr_3_S15"
+ )
> aveLogCPM(z[genes, gr1])
[1] 6.211942 5.699160 3.259007 2.478364
> aveLogCPM(z[genes, gr2])
[1] 10.916058 10.616974  7.811504 10.667996



> aveLogCPM(z[genes, gr2]) - aveLogCPM(z[genes, gr1])
[1] 4.704117 4.917814 4.552497 8.189632

Reported by the glmLRT function:

> deg[genes,"logFC.Interval6H"]
[1] 4.553255 4.738738 4.285688 8.025348

For Reference, my design matrix is below:

> design
              (Intercept) Interval6H Interval24H Interval48H Interval120H LabDNA
1_S1                    1          0           0           0            0      0
2_S2                    1          0           0           0            0      0
4_S3                    1          0           0           0            0      0
6_S4                    1          0           1           0            0      0
7_S5                    1          0           1           0            0      0
8_S6                    1          0           1           0            0      0
9_S7                    1          0           0           1            0      0
10_S8                   1          0           0           1            0      0
11_S9                   1          0           0           1            0      0
WT_0_hr_1_S1            1          0           0           0            0      1
WT_0_hr_2_S2            1          0           0           0            0      1
WT_0_hr_3_S3            1          0           0           0            0      1
WT_6_hr_1_S4            1          1           0           0            0      1
WT_6_hr_2_S5            1          1           0           0            0      1
WT_6_hr_3_S15           1          1           0           0            0      1
WT_24_hr_1_S6           1          0           1           0            0      1
WT_24_hr_2_S7           1          0           1           0            0      1
WT_24_hr_3_S8           1          0           1           0            0      1
W_0_hr_1_S256           1          0           0           0            0      1
W_0_hr_2_S257           1          0           0           0            0      1
W_0_hr_3_S258           1          0           0           0            0      1
W_5_D_1_S259            1          0           0           0            1      1
7_S261                  1          0           0           0            1      1
W_5_D_3_S260            1          0           0           0            1      1
attr(,"assign")
[1] 0 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$Interval
[1] "contr.treatment"
edger RNA Seq • 1.1k views
ADD COMMENT
0
Entering edit mode

By dropping the batch column from my design matrix, AND using the "prior.count" parameter with aveLogCPM, the fold change estimates that I calculate become much more similar.

              (Intercept) Interval6H Interval24H Interval48H Interval120H
1_S1                    1          0           0           0            0
2_S2                    1          0           0           0            0
4_S3                    1          0           0           0            0
6_S4                    1          0           1           0            0
7_S5                    1          0           1           0            0
8_S6                    1          0           1           0            0
9_S7                    1          0           0           1            0
10_S8                   1          0           0           1            0
11_S9                   1          0           0           1            0
WT_0_hr_1_S1            1          0           0           0            0
WT_0_hr_2_S2            1          0           0           0            0
WT_0_hr_3_S3            1          0           0           0            0
WT_6_hr_1_S4            1          1           0           0            0
WT_6_hr_2_S5            1          1           0           0            0
WT_6_hr_3_S15           1          1           0           0            0
WT_24_hr_1_S6           1          0           1           0            0
WT_24_hr_2_S7           1          0           1           0            0
WT_24_hr_3_S8           1          0           1           0            0
W_0_hr_1_S256           1          0           0           0            0
W_0_hr_2_S257           1          0           0           0            0
W_0_hr_3_S258           1          0           0           0            0
W_5_D_1_S259            1          0           0           0            1
7_S261                  1          0           0           0            1
W_5_D_3_S260            1          0           0           0            1
attr(,"assign")
[1] 0 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Interval
[1] "contr.treatment"

z<-estimateCommonDisp(z)
z<-estimateTagwiseDisp(z)
z<-calcNormFactors(z)

A<-aveLogCPM(z[genes,gr2], normalized.lib.sizes = T, prior.count = 0) 
B<-aveLogCPM(z[genes,gr1], normalized.lib.sizes = T, prior.count = 0)

> A - B
[1] 4.706790 4.921665 4.573017 8.226730



fit<-glmFit(z, design, prior.count=0)
qlf<-glmLRT(fit, coef=2:5)
deg<-as.data.frame(topTags(qlf, n=Inf))

> deg[genes,"logFC.Interval6H"]
[1] 4.707106 4.922300 4.574113 8.233953
ADD REPLY
0
Entering edit mode

deleted due to duplicate

ADD REPLY
0
Entering edit mode

Do you have question?

glmFit does not shrink the logFCs when prior.count=0. The function does what it is documented to do.

Obviously the glmFit negative binomial generalized linear model fit is much more sophisticated than simple averaging of logCPMs, but you haven't actually tried averaging logCPM values from cpm, so you haven't actually made that comparison.

ADD REPLY
1
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

The coefficients for a GLM don't have a closed form solution, so you can't calculate them by hand using the logCPM values. Since you can't compute the coefficients, it stands to reason that you can't compute the fold changes either.

ADD COMMENT

Login before adding your answer.

Traffic: 537 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6