Correct implementation of multifactorial paired analysis in limma software package for DE expression
2
1
Entering edit mode
svlachavas ▴ 840
@svlachavas-7225
Last seen 13 months ago
Germany/Heidelberg/German Cancer Resear…

Dear Bioconductor Community,

in a previous thread( https://support.bioconductor.org/p/64372/#64413) i have asked about the implementation of a paired analysis, and Mr. James W. MacDonald kindly provided me an suggestion of a paired limma analysis, which i used on the code below :

library(limma)
conditions <- data.trusted.eset$condition
condition <- factor(conditions, levels(condition)[c(2,1)])
pairs <- factor(rep(1:13, each = 2))
design <- model.matrix(~condition+pairs)
fit <- lmFit(data.trusted.eset, design)
fit2 <- eBayes(fit)

In the next step i wanted to implement a multifactorial analysis in limma to evaluate the interaction between two factors, condition(control & cancer tissue) & meta_factor(metastatic & non-metastatic cancer), in order to evaluate possible DE genes between metastatic cancer and non-metastatic cancer samples:

f <- paste(data.trusted.eset$condition, data.trusted.eset$Meta_factor, sep=".")
f <- factor(f)
pairs <- factor(rep(1:13, each = 2))
design <- model.matrix(~0 +f +pairs)

fit <- lmFit(data.trusted.eset, design)
head(names$coefficients)
contrast.matrix <- makeContrasts(M.CvsC="fCancer.1-fCancer.0", M.CvsControl="fCancer.1-fNormal.0", levels=design)
fitb <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fitb)

My first question is if it is wrong for the statistical model that i didnt use an intercept term(because with the intercept term i couldnt use the coefficient "fCancer.0" for the contrast.matrix) ? And secondly, is it also important as with the first analysis with just one factor, to include the term pairs to get meaningful results ? My last question is that with topTable, i get only 4 genes with adj.p,Val < 0.05. Is that necessarily wrong or it might illustrate possible small differences between metastatic and non-metastatic tumors ?

Thank you for your patience(& please excuse me for any "begginer" questions, as i have only using R & Bioconductor for a few months).

 

limma microarray multiple factor design paired • 3.5k views
ADD COMMENT
0
Entering edit mode

any help or suggestions ?

ADD REPLY
0
Entering edit mode

What is data.trusted.eset$condition? What is data.trusted.eset$Meta_factor? We'd be able to give more help if you define it here, rather than us having to dig through an old thread to figure out what you're talking about.

ADD REPLY
0
Entering edit mode

Please excuse me, because i forgot to mention it in the start when i was refering to the variables. Data.trusted.eset is my ExpressionSet, and the two other variables are in the phenoData object as above

 

pData(data.trusted.eset) :
                                   condition         Meta_factor
7_Tesch.CEL                    Normal           0
8_Tesch.CEL                   Cancer            0
0687_04_Blauth.CEL        Normal            0
0687_04_Blauth_1.CEL      Cancer           0
0863_03_Schmidt.CEL       Normal           0
0863_03_Schmidt_1.CEL     Cancer          0
0948_04_Leiber.CEL        Normal              0
0948_04_Leiber_1.CEL      Cancer            0
1043_04_Nagel.CEL         Normal             0
1043_04_Nagel_1.CEL      Cancer             0
1103_03_Braun.CEL         Normal             1
1103_03_Braun_1.CEL      Cancer             1
1234_06_Floersch.CEL      Normal             0
1234_06_Floersch_1.CEL    Cancer           0
1235_06_Hey.CEL           Normal              0
1235_06_Hey_1.CEL         Cancer             0
1236_06_Liebich.CEL       Normal              1
1236_06_Liebich_1.CEL     Cancer           1
1410_03_Urbaniak.CEL      Normal           1
1410_03_Urbaniak_1.CEL    Cancer          1
1430_04_Patschke.CEL      Normal           0
1430_04_Patschke_1.CEL    Cancer          0
1518_03_Dege.CEL          Normal              0
1518_03_Dege_1.CEL        Cancer             0
1554_03_Gemmer.CEL        Normal            1
1554_03_Gemmer_1.CEL      Cancer          1

ADD REPLY
2
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

You can no longer use a paired design, because Meta_factor is a between-subject factor and paired analyses only deal with within-subject comparisons. Instead you have to treat subject (pair) as a random effect, which is done in limma using the duplicateCorrelation function.

The limma User's Guide calls your experiment a "Multi-level experiment". Just look up Section 9.7 in the User's Guide and follow the instructions.

ADD COMMENT
0
Entering edit mode

Thank you for your valuable information and direction to solve this problem. As i checked the vignette in usersguide regarding the specific topic, i would like to ask one more question that is very important: as i want to block on the patients like you mentioned above, the corresponding variable in the vignette was referred as subject. So i suppose i have to create a similar variable called pairs(or subject) like this: 

pairs <- factor(rep(1:13, each = 2)) 

and then enter it into the phenoData of my eset, which is described similarly as "targets" in the section 9.7 ??

ADD REPLY
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 22 minutes ago
The city by the bay

Your current design matrix is a bit tricky to interpret. I'd suggest doing something like this:

> f <- relevel(f, ref="Normal.0")
> design <- model.matrix(~0 + pairs + f)
> design <- design[,-ncol(design)]
> colnames(design)
 [1] "pairs1"    "pairs2"    "pairs3"    "pairs4"    "pairs5"    "pairs6"   
 [7] "pairs7"    "pairs8"    "pairs9"    "pairs10"   "pairs11"   "pairs12"  
[13] "pairs13"   "fCancer.0" "fCancer.1"

The first 13 coefficients describe the patient blocking effect. Coefficient 14 describes the fold change of Cancer.0 over Normal.0. Coefficient 15 describes the fold change of Cancer.1 over Normal.1.

Now, you can't directly compare between Cancer.1 and Cancer.0, because any differences will be absorbed into the patient blocking factors (this is also true of your original design). Instead, you can ask whether the log-fold change  between metastatic cancer and paired normal samples is equal to that between non-metastatic cancer and their normal samples, by using:

contrast.matrix <- makeContrasts(M.CvsC="fCancer.1-fCancer.0", level=design)

Of course, you can also compare between each cancer and its normal sample directly.

If you don't choose to include the pairing in design, you're going to be treating different patients as biological replicates. This will likely result in large dispersion estimates, which will confound detection of DE genes. On the other hand, it means that your dataset simplifies to a three-group design, so you can compare between metastatic and non-metastatic cancer samples directly.

As to the small number of DE genes; that depends on the variance of your data, the size of the fold-changes between normal and cancer samples for each patient, and the parametrization of your design and contrast matrices. For example, the contrast between Cancer.1 and Cancer.0 is unlikely to pick up any DE genes, as any changes will be absorbed by the blocking factors.

ADD COMMENT
0
Entering edit mode

thank you also for your comment and useful explanation in how to deal with this specific approach

ADD REPLY

Login before adding your answer.

Traffic: 719 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6