Question

Problem regarding the explanation of log-fold changes of individual coefficients in limma after eBayes

1

Entering edit mode

svlachavas ▴ 840

@svlachavas-7225

Last seen 2 days ago

Germany/Heidelberg/German Cancer Resear…

Dear Bioconductor Community,

i would like to ask for a specific question(maybe a bit) naive regarding the interprentation of log-fc values of specific contrasts from my object created after the eBayes function.

In detail, i have performed a paired analysis in limma regarding cancer vs adjucent control samples-60 samples and in total 30 patients

condition <- factor(eset.2$Disease, levels=c("Normal","Cancer"))
pairs <- factor(rep(1:30, each = 2))
design <- model.matrix(~condition +pairs)

head(design)
(Intercept) conditionCancer pairs2 pairs3 pairs4 pairs5 pairs6 pairs7 pairs8 pairs9
1 1 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0 0 0 0 0
3 1 0 1 0 0 0 0 0 0 0
4 1 1 1 0 0 0 0 0 0 0
5 1 0 0 1 0 0 0 0 0 0
6 1 1 0 1 0 0 0 0 0 0
pairs10 pairs11 pairs12 pairs13 pairs14 pairs15 pairs16 pairs17 pairs18 pairs19
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0
pairs20 pairs21 pairs22 pairs23 pairs24 pairs25 pairs26 pairs27 pairs28 pairs29
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0
pairs30
1 0
2 0
3 0
4 0
5 0
6 0...

fit <- lmFit(eset.2,design)
keep <- fit$Amean > median(fit$Amean)

fit2 <- eBayes(fit[keep,],trend=TRUE)

So, based on the fact that the rows of my phenoData information have exactly the same row(as they should have regarding the pairs argument), in order to perform the paired analysis, so:

head(fit2$coefficients)
(Intercept) conditionCancer pairs2 pairs3 pairs4 pairs5 .....
10_at 10.702103 -2.59651160 0.02887899 -2.2916191 -0.27313819 -0.9222351
10001_at 7.520263 0.45086268 0.21904745 -0.1089900 0.09669243 0.1641267
10005_at 8.507699 -0.36371671 0.64127830 0.2583321 -0.32163256 0.7301137
10006_at 10.568751 -0.02002545 0.31434534 -0.3029731 -0.27448646 0.1310583
10007_at 8.552246 0.70665240 0.59376462 0.3137518 0.28487314 0.6358412
10010_at 7.899482 -0.09973025 0.54297155 0.2390233 0.26514041 -0.2023462

Thus, except the "conditionCancer" coefficient which is the most important, i would also like to inspect the log-fcs between in patient(pair) to have some further exploration. So, as i can see from above, the "pairs1"-that is the first two samples are used as the intercept, while each of the following pairs-represents the log-fold change between each cancer vs its adjucent control correct ?

head(pData(eset.2))
Disease Location Meta_ factor Study
St_1_WL57.CEL Normal sigmoid_colon 0 hgu133plus2
St_2_WL57.CEL Cancer sigmoid_colon 0 hgu133plus2
St_N_EC59.CEL Normal sigmoid_colon 0 hgu133plus2
St_T_EC59.CEL Cancer sigmoid_colon 0 hgu133plus2
St_N_EJ58.CEL Normal cecum 0 hgu133plus2
St_T_EJ58.CEL Cancer cecum 0 hgu133plus2

But why when i manually used:

eset.3 <- eset.2[keep,] # to have the same probesets with the same order as the fit2 object

and then test manually one probeset-the first :

mean(exprs(eset.3)[1,4]) - mean(exprs(eset.3)[1,3]) = -0.2808175 # for the 4rth and 3rd sample consisting the pairs2

returns a different result from the first row of "pairs2" regarding the 10_at probeset, which is the first line (0.02887899 from above). So why they are different ? or i miss something and it is complete other explanation? these should not be the same for the log-fold change value ?

limma logfoldchange limma design matrix paired samples affymetrix microarrays • 1.6k views

ADD COMMENT • link updated 9.6 years ago by Ryan C. Thompson ★ 7.9k • written 9.6 years ago by svlachavas ▴ 840

score 2 · Answer 1 · 2015-09-21

2

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 6 months ago

Icahn School of Medicine at Mount Sinai…

> Thus, except the "conditionCancer" coefficient which is the most important, i would also like to inspect the log-fcs between in patient(pair) to have some further exploration. So, as i can see from above, the "pairs1"-that is the first two samples are used as the intercept, while each of the following pairs-represents the log-fold change between each cancer vs its adjucent control correct ?

No, you've interpreted the coefficients incorrectly. Your design is ~condition + pairs, which is an additive model. This means that there is only one coefficient for the condition, and it represents the average logFC of cancer vs normal across all patients. The remaining coefficients (the pairs ones) represent the logFC between patient 1 and each other patient, and are likely of little interest to you.

ADD COMMENT • link 9.6 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Dear Ryan C. Tompson,

thank you for your recommendation. i wrongfully assumed that because the analysis i perform is actually a paired(moderated) t-test, the resulting pairs will represent the log-fold changes between the values !!

So, if i understand well , the first level of the pairs factor used as the intercept, is also used for the logFCs you mention above correctly ? That is the difference between each of the other groups of patients vs the first group (i.e for the above mentioned log fold change, is produced: mean(exprs(eset.2)[1,1:2]) - mean(exprs(eset.2)[1,3:4])

Thus, the factor pairs is actually used for the addictive model you mention to include patient-pair effects in the linear modeling, while the coefficients returned is something complete different and represents the correct difference you mention !!

Finally, i only have one final question(which i also checked but to be certain): So with or without the use of pairs in the design matrix(paired or non-paired), the actual important coefficient of cancer vs normal log fc remains the same("conditionCancer" )-but the other statistics change correct(i.e. t statistic, p-values, B statistic) ?

ADD REPLY • link 9.6 years ago svlachavas ▴ 840

0

Entering edit mode

Yes, I think you've got it more or less correct. Your formula for the inter-patient logFC is probably slightly different from the actual fitted coefficient, because the linear model fits all the coefficients simultaneously, rather than compute each one individually using formulas like the one you give.

ADD REPLY • link 9.6 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

I see- i followed the limmas vignette guide on page 42(9.4.1) (and with some valuable help from the specialists on the field on previous posts on paired samples), where also the samples in the phenoData object belong to the same patient with the same order.

ADD REPLY • link 9.6 years ago svlachavas ▴ 840