Apply different cutoffs with edgeR pipeline and glmTreat for RNA-Seq DEG analysis
1
0
Entering edit mode
svlachavas ▴ 830
@svlachavas-7225
Last seen 13 months ago
Germany/Heidelberg/German Cancer Resear…


Dear Bioconductor community,

based on a recent analysis of an RNA-Seq gene expression analysis concerning different cancer cell lines, i was trying to apply the relative Treat methodology for various biological contrasts of interest, in order to narrow my DEGlists, as significant differences have already be identified in the relative MDS plots and EDA analysis. Based on the downstream code chunk:

condition <- as.factor(dge$samples$Biological_Condition)
batch <- as.character(dge$samples$Cell_Line)
condition
[1] Differentiated.Q Differentiated.W Stem.W           Stem.P         
[5] Stem.W           Stem.P           Differentiated.P Differentiated.Q
[9] Stem.Q           Stem.Q           Differentiated.W Differentiated.P
6 Levels: Differentiated.P Differentiated.Q Differentiated.W Stem.P ... Stem.W
batch
[1] "RADH87" "RADH87" "RNS87"  "RNS85"  "RNS85"  "RNS87"  "RADH87" "RADH85"
[9] "RNS87"  "RNS85"  "RADH85" "RADH85"
design <- model.matrix(~0 +condition + batch)
head(design)
  conditionDifferentiated.P conditionDifferentiated.Q conditionDifferentiated.W
1                         0                         1                         0
2                         0                         0                         1
3                         0                         0                         0
4                         0                         0                         0
5                         0                         0                         0
6                         0                         0                         0
  conditionStem.P conditionStem.Q conditionStem.W batchRADH87 batchRNS85
1               0               0               0           1          0
2               0               0               0           1          0
3               0               0               1           0          0
4               1               0               0           0          1
5               0               0               1           0          1
6               1               0               0           0          0
  batchRNS87
1          0
2          0
3          1
4          0
5          0
6          1

design1 <- design[,-grep("batchRNS87", colnames(design))]

y2 <- estimateDisp(dge, design1, robust=TRUE)
fit <- glmQLFit(y2, design1, robust=TRUE)
Q.D.S <- makeContrasts(conditionDifferentiated.Q -conditionStem.Q, levels=design1)

W.D.S <- makeContrasts(conditionDifferentiated.W  - conditionStem.W,levels=design1)

P.D.S <- makeContrasts(conditionDifferentiated.P- conditionStem.P, levels=design1)

tr <- glmTreat(fit, contrast=Q.D.S, lfc=log2(1.5))

de.QDS <- topTags(tr,n=nrow(tr),p.value=0.05)

dat.QDS <- de.QDS$table
dim(dat.QDS)
[1] 5032    5 # initial number of DEGs

# second cutoff with glmTreat
tr <- glmTreat(fit, contrast=Q.D.S, lfc=1.5)

de.QDS <- topTags(tr,n=nrow(tr),p.value=0.001)

dat.QDS <- de.QDS$table
dim(dat.QDS)
[1] 1516    5 # reduced number

Overall, as with the initial lfc threshold, with and adjusted p-value, the number of DEGs is still high, could the final subsequent cutoffs-both on the lfc of Treat and relative p-value-could be used to narrow the DElist ? in order also to be able to perform pathway analysis ?

Thank you in advance,

Efstathios

 

edger differential gene expression glmtreat • 1.4k views
ADD COMMENT
1
Entering edit mode

I am guessing that there is typo in your code, that the second cutoff should have lfc=log2(1.5) instead of lfc=1.5. Setting lfc=1.5 corresponds to a fold change of 2.83 = 2^1.5, which would be a bit strange and probably not what you intended.

ADD REPLY
2
Entering edit mode
@gordon-smyth
Last seen 20 minutes ago
WEHI, Melbourne, Australia

The batch correction and design matrix in your analysis are not correct. The three contrasts that you have tested for are all between "Differentiated" and "Stem" conditions. But the Differentiated and Stem data were collected from non-overlapping batches, as can be seen from the following table:

> table(condition,batch)
                  batch
condition          RADH85 RADH87 RNS85 RNS87
  Differentiated.P      1      1     0     0
  Differentiated.Q      1      1     0     0
  Differentiated.W      1      1     0     0
  Stem.P                0      0     1     1
  Stem.Q                0      0     1     1
  Stem.W                0      0     1     1

So you cannot test for DE between Differentiated and Stem conditions if you wish to adjust for batch effects.

I see that you have tried to "fix" the above problem by dropping one of the columns of the design matrix, but that is not correct -- it will give results that don't have any meaning.

ADD COMMENT
0
Entering edit mode

Dear Gordon, even after a long time for a weird reason no notification came for your answer !! Even though I realized this after some time of inspecting the results, thanks a million for this important note !!

ADD REPLY

Login before adding your answer.

Traffic: 674 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6