block design to test two factors with limma
1
1
Entering edit mode
jfertaj ▴ 30
@jfertaj-8566
Last seen 2.2 years ago
United Kingdom

Hi, I have posted a similar question few months ago applying voom + limma to a block factor design in RNA-seq experiment, but now I have a doubt if the no differences observed between cases and controls are real or caused by my design or by my misuse of block and duplicateCorrelation 

My metadata looks like this one:

   sample_ID location sex polyp_type age  status     files
1   INTP_103 proximal   1          1  31    ctrl INTP_1031
2   INTP_103   distal   1          1  31    ctrl INTP_1032
3   INTP_103   rectum   1          1  31    ctrl INTP_1033
4   INTP_105 proximal   2          3  66 disease INTP_1051
5   INTP_105   distal   2          3  66 disease INTP_1052
6   INTP_105   rectum   2          3  66 disease INTP_1053
7   INTP_106 proximal   1          2  64 disease INTP_1061
8   INTP_106   distal   1          2  64 disease INTP_1062
9   INTP_106   rectum   1          2  64 disease INTP_1063
10  INTP_111 proximal   2          1  49    ctrl INTP_1111

Basically I have samples from controls and individuals with a disease, each of the individuals (both control and diseases) have 3 samples from the same tissue but different location. I want to test differences between locations, between cases and control, and between interactions (disease status and location), here is my code

y <- DGEList(txi_longScaled$counts) # the counts come from a kallisto experiment
y <- calcNormFactors(y)
age <- metadata$age
sex <- factor(metadata$sex)
Treat <- factor(paste(metadata$status, metadata$location, sep="."))
design <- model.matrix(~0+Treat + sex + age)

design2 <- model.matrix(~1+status + sex + age)

colnames(design)[seq_len(nlevels(Treat))] <- levels(Treat)

colnames(design2) <- c("Ctrl", "Disease", "sex", "age")

# Apply voom
v <- voom(y, design)
v2 <- voom(y, design2)

#Then we estimate the correlation between measurements made on the same subject 
corfit <- duplicateCorrelation(v, design, block=metadata$sample_ID)
corfit2 <- duplicateCorrelation(v2, design2, block=metadata$sample_ID)

# Apply voom taking in account block effect ### remove correlation from the models
v <- voom(y, design, block = metadata$sample_ID, correlation = corfit$consensus)
v2 <- voom(y, design2, block = metadata$sample_ID, correlation = corfit2$consensus)

vfit <- lmFit(v, design, block = metadata$sample_ID, correlation = corfit$consensus)
vfit2 <- lmFit(v2, design2, block = metadata$sample_ID, correlation = corfit2$consensus)

## Specifigy contrasts
cm <- makeContrasts( 
          CtrlvsCasesProximal = ctrl.proximal - disease.proximal,
          CtrlvsCasesDistal = ctrl.distal - disease.distal,
          CtrlvsCasesRectum = ctrl.rectum - disease.rectum,
          Ctrl.proximalvsCtrl.rectum = ctrl.proximal - ctrl.rectum,
          Ctrl.distalvsCtrl.proximal = ctrl.distal - ctrl.proximal,
          Ctrl.distalvsCtrl.rectum = ctrl.distal - ctrl.rectum,
          Dis.proximalvsDis.distal = disease.proximal - disease.distal,
          Dis.proximalvsDis.rectum = disease.proximal - disease.rectum,
          Dis.distalvsDis.rectum = disease.distal - disease.rectum,
levels=design)

## Apply limma eBayes to voom objects
vfit <- contrasts.fit(vfit, cm)
efit <- eBayes(vfit)
efit2 <- eBayes(vfit2, robust = TRUE)

Are these designs correct? Or should I use just one single contrast and then extract the coefficients, I mean, something like this:?

design3 <- model.matrix(~0+location + status + sex + age + metadata$sample_ID)
v3 <- voom(y, design3)
vfit3 <- lmFit(v3, design3)
efit3 <- eBayes(efit3, robust = TRUE)
limma voom • 2.6k views
ADD COMMENT
0
Entering edit mode

What is Treat? Don't rely on information defined in previous posts.

ADD REPLY
0
Entering edit mode

Post updated @Aaron Lun

ADD REPLY
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 19 minutes ago
The city by the bay

This data set looks exactly the same as the one discussed in applying voom + limma to a block factor design in RNA-seq experiment. You should have just raised a comment there if you wanted more clarification on the parametrization that I proposed in my answer to that post.

Anyway, if you want to test the interaction terms, design is the only parametrization you can use. Moreover, if you believe that there is a non-zero interaction between location and disease status, then design is the only correct parametrization - not just for testing the interaction terms, but also for testing the effect of location and disease status, given that those effects depend on each other. design2 is inappropriate as it does not consider the effect of location at all, while design3 assumes that the effects are strictly additive.

Assuming that you are using design, the downstream code looks correct. Note that you should have filtered out low-abundance genes prior to calling calcNormFactors. It also wouldn't hurt to run eBayes with robust=TRUE for vfit.

As for why you don't have any DE genes... well, maybe that's because there are no DE genes between the disease and control conditions. The other possibility is that you don't have enough power. Try to think of some positive control genes and see how they behave. I also assume that you have many more samples than the 10 you've shown above, otherwise power will be a serious issue.

ADD COMMENT
0
Entering edit mode

Thanks Aaron Lun, yes I have filtered out low-abundance genes and I have 135 samples, although only 56 controls but I think is enough to get an idea of DEG, and last question, there is no way to test all control vs all disease samples? could I modify somehow design2 to correct for location and test all controls vs all patients?

ADD REPLY
0
Entering edit mode

You can easily test for the average of all control-related groups to the average of all disease-related groups:

con <- makeContrasts((ctrl.proximal + ctrl.distal + ctrl.rectum)/3 -
    (disease.proximal + disease.distal + disease.rectum)/3, levels=design)

... though any non-zero interaction effect will make it difficult to interpret the results of this contrast. This difficulty will be present regardless of what design you use. An alternative approach would be to do an ANODEV:

cm <- makeContrasts( 
          CtrlvsCasesProximal = ctrl.proximal - disease.proximal,
          CtrlvsCasesDistal = ctrl.distal - disease.distal,
          CtrlvsCasesRectum = ctrl.rectum - disease.rectum, levels=design)

.. with coef=NULL in topTable. Then you can figure out if DE genes are being driven by changes between case/control in one location or all locations, based on the size of the log-fold changes. A further step-up in rigour would involve testing each contrast in cm separately, and then combining the statistics with classifyTestsF, but this relies on the presence of DE genes between pairwise comparisons, and you weren't getting any of those in the first place.

ADD REPLY
0
Entering edit mode

Thanks a lot Aaron for the detailed answer!!

ADD REPLY

Login before adding your answer.

Traffic: 731 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6