edgeR and batch effects
1
0
Entering edit mode
@chatterjeearumoy-23195
Last seen 4.3 years ago

Hi,

I am trying to perform differential gene expression analysis with edgeR. I have four conditions T1, T2, T3 and T4 and two batches. I am running the following codes:

Count1=read.table("GENESFORANALYSIS.txt", sep="\t", header=TRUE)

 colnames(Count1) = paste(c("Genename", "T1R1","T1R2","T1R3",  
                                                                      "T2R1","T2R2","T2R3", 
                                                                      "T3R1","T3R2","T3R3", 
                                                                      "T4R1","T4R2","T4R3"))

 conditions = c(rep("T1", 3), rep("T2", 3), rep("T3", 3), rep("T4",3))

 batch=c("A", "B", "A", 
    "A", "A", "B",
    "A", "A", "B",
    "A", "A", "B")

   y=DGEList(counts=Count1[,-1], genes=Count1[,1],group=conditions)

   cpm=10/(min(y$samples$lib.size)/10^6)
   keep = rowSums(cpm(y) >= 1.105529) >= 3
   y = y[keep,,keep.lib.sizes=FALSE]

  design=model.matrix(~ batch + conditions, data=y$samples)
  rownames(design) = colnames(y)

    z = calcNormFactors(y)
    z = estimateGLMCommonDisp(z, design, verbose=TRUE)
    z = estimateGLMTrendedDisp(z, design)
    z = estimateGLMTagwiseDisp(z, design)

   fit1 = glmFit(z, design)
    lrt1 = glmLRT(fit1, coef=3)
     tt=topTags(lrt1, n=nrow(z))
       head(tt$table)
         summary(decideTests(lrt1))

My model matrix is: (1)

 design=model.matrix(~ batch + conditions, data=y$samples)
           (Intercept) batchB conditionsT2 conditionsT3 conditionsT4
 T1R1           1      0            0            0            0
 T1R2           1      1            0            0            0
 T1R3           1      0            0            0            0
 T2R1           1      0            1            0            0
 T2R2           1      0            1            0            0
 T2R3           1      1            1            0            0
 T3R1           1      0            0            1            0
 T3R2           1      0            0            1            0
 T3R3           1      1            0            1            0
 T4R1           1      0            0            0            1
 T4R2           1      0            0            0            1
 T4R3           1      1            0            0            1
attr(,"assign")
[1] 0 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$batch
[1] "contr.treatment"

attr(,"contrasts")$conditions
 [1] "contr.treatment"

(2)

design=model.matrix(~ 0+ batch + conditions, data=y$samples)
     batchA batchB conditionsT2 conditionsT3 conditionsT4
T1R1      1      0            0            0            0
T1R2      0      1            0            0            0
T1R3      1      0            0            0            0
T2R1      1      0            1            0            0
T2R2      1      0            1            0            0
T2R3      0      1            1            0            0
T3R1      1      0            0            1            0
T3R2      1      0            0            1            0
T3R3      0      1            0            1            0
T4R1      1      0            0            0            1
T4R2      1      0            0            0            1
T4R3      0      1            0            0            1
attr(,"assign")
[1] 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$batch
[1] "contr.treatment"

attr(,"contrasts")$conditions
[1] "contr.treatment"

Now, what I need to know. 1) I need to know how to compare T1 vs T2, T1vs T3, T1vs T4, T2vsT3, T2vs T4, T3vs T4 using the commands (a) contrast= c(0, 0, 0, 0, 0) as well as coef=3/4/5/3:4/etc. 2) I need to know if my models (with or without intercepts) will include the batch effects while giving me the names of the differentially expressed genes?

I would like someone to kindly reply to me. I have checked the edgeR manual and not able to understand it. Thanks a lot. Arumoy.

edger • 1.1k views
ADD COMMENT
1
Entering edit mode

It would be easiest to write the design as 0 + conditions + batch so switching position of batch and condition. With this design you can make all your desired contrasts while properly accounting for the batch. The edgeR maintainers will probably add their expert opinions on that, but isn't the QLF framework the preferred pipeline these days rather than the LRT test?

ADD REPLY
5
Entering edit mode
@steve-lianoglou-2771
Last seen 22 months ago
United States

Both models (with and without the intercept) account for batch, but setting up the proper contrasts will be more straightforward using the no-intercept model.

To make our lives easier, put your group of interest up front and the batch last ie.

design <- model.matrix(~ 0 + conditions + batch, data = y$samples)

Let's model our analysis on the edgeR quasi-likelihood workflow, and it will look something like this:

library(edgeR)
# ... data munging ...
y.all <- calcNormFactors(DGEList(...))
design <- model.matrix(~ 0+ conditions + batch, data=y.all$samples)

# remove lowly expressed genes
y <- calcNormFactors(y.all[filterByExpr(y.all, design),,keep.lib.sizes = FALSE])
y <- estimateDisp(y, design, robust=TRUE)

contrasts <- makeContrasts(
  T1vsT2 = conditionsT1 - conditionsT2,
  T1vsT3 = conditionsT1 - conditionsT3,
  # ...
  T3vsT4 = conditionsT3 - conditionsT4,
  levels = design)

fit <- glmQLFit(y, design, robust=TRUE)

# Here is **one way** to get list of dge statistics for each contrast 
# we have defined.
results <- sapply(colnames(contrasts), function(cname) {
  res <- glmQLFTest(fit, contrast = contrasts[, cname])
  topTags(res, n = Inf, sort.by = "none")
}, simplify = FALSE)
ADD COMMENT
0
Entering edit mode

Thanks a lot Steve. Arumoy

ADD REPLY
0
Entering edit mode

If the answer solve the problem please upvote and accept it (the checkmark symbol on the left).

ADD REPLY

Login before adding your answer.

Traffic: 506 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6