Question

is it possible and reasonable for edgeR to derive DEGs from a complicated contrast or contrast of contrast?

0

Entering edit mode

Mike • -60

@1df4ed7f

Last seen 6 months ago

United States

Hi, There:

We have RNAseq data with 4 cell lines with different knockout genotypes (WT, A, G, GA), and for each cell lines, we have two treatment types: treated (H) and untreated (C) with 5 replicates for each condition. we are interested in the treatment effect on each of these cell lines: WT.H_C, A.H_C, G.H_C, or GA.H_C (H_C mean treatment vs untreat). These are easily to be done in edgeR (see the code below). But we are also interested to see if there is difference between the treatment effects between different cell lines. for example, is there any difference between the treatment effect on cell line A (A.H_C) and the treatment effect on cell line GA (GA.H_C)? also, is there difference between the treatment effect on cell line WT (WT.H_C) and thetreatment effect on cell line G(G.H_C)? Although we can directly compare the derived DEGs of A.H_C and DEGs of GA.H_C for the 1st question, and directly compare the derived DEGs of WT.H_C and DEGs of G.H_C for the 2nd question, but I wonder if we can do this using edgeR directly and if the DEGs derived from such complicated contrast are what we are interested in terms of biological question that I asked earlier: if there is difference between the treatment effects between different cell lines, please see the code below for the contrast settings for intended complicated contrasts G_WT.H_C and GA_A.H_C:

G_WT.H_C=(H.G-C.G)-(H.WT-C.WT)

GA_A.H_C=(H.GA-C.GA)-(H.A-C.A)

any advice/suggestion would be greatly appreciated!!

Thx in advance and regards!

Mike

Code should be placed in three backticks as shown below

##the critical part of the codes are below
> show(tar5)
       Types KO TypesLines Types.KOStat
C.A.1      C  A      C.A.1          C.A
C.A.2      C  A      C.A.2          C.A
C.A.3      C  A      C.A.3          C.A
C.A.4      C  A      C.A.4          C.A
C.A.5      C  A      C.A.5          C.A
C.G.1      C  G      C.G.1          C.G
C.G.2      C  G      C.G.2          C.G
C.G.3      C  G      C.G.3          C.G
C.G.4      C  G      C.G.4          C.G
C.G.5      C  G      C.G.5          C.G
C.GA.1     C GA     C.GA.1         C.GA
C.GA.2     C GA     C.GA.2         C.GA
C.GA.3     C GA     C.GA.3         C.GA
C.GA.4     C GA     C.GA.4         C.GA
C.GA.5     C GA     C.GA.5         C.GA
C.WT.1     C WT     C.WT.1         C.WT
C.WT.2     C WT     C.WT.2         C.WT
C.WT.3     C WT     C.WT.3         C.WT
C.WT.4     C WT     C.WT.4         C.WT
C.WT.5     C WT     C.WT.5         C.WT
H.A.1      H  A      H.A.1          H.A
H.A.2      H  A      H.A.2          H.A
H.A.3      H  A      H.A.3          H.A
H.A.4      H  A      H.A.4          H.A
H.A.5      H  A      H.A.5          H.A
H.G.1      H  G      H.G.1          H.G
H.G.2      H  G      H.G.2          H.G
H.G.3      H  G      H.G.3          H.G
H.G.4      H  G      H.G.4          H.G
H.G.5      H  G      H.G.5          H.G
H.GA.1     H GA     H.GA.1         H.GA
H.GA.2     H GA     H.GA.2         H.GA
H.GA.3     H GA     H.GA.3         H.GA
H.GA.4     H GA     H.GA.4         H.GA
H.GA.5     H GA     H.GA.5         H.GA
H.WT.1     H WT     H.WT.1         H.WT
H.WT.2     H WT     H.WT.2         H.WT
H.WT.3     H WT     H.WT.3         H.WT
H.WT.4     H WT     H.WT.4         H.WT
H.WT.5     H WT     H.WT.5         H.WT

> show(rawdata[1:5,1:10]);
                            SYMBOL                 ENSID C.A.1 C.A.2 C.A.3
ENSMUSG00000000001.4_Gnai3   Gnai3  ENSMUSG00000000001.4  6227  5210  5183
ENSMUSG00000000003.15_Pbsn    Pbsn ENSMUSG00000000003.15     0     0     0
ENSMUSG00000000028.15_Cdc45  Cdc45 ENSMUSG00000000028.15   655   541   571
ENSMUSG00000000031.16_H19      H19 ENSMUSG00000000031.16   607  1074   792
ENSMUSG00000000037.16_Scml2  Scml2 ENSMUSG00000000037.16    74    62    45
                            C.A.4 C.A.5 C.G.1 C.G.2 C.G.3
ENSMUSG00000000001.4_Gnai3   5187 13509  4944  4885  5650
ENSMUSG00000000003.15_Pbsn      0     0     0     0     0
ENSMUSG00000000028.15_Cdc45   538  1402   287   305   236
ENSMUSG00000000031.16_H19    1459  3263  3757  9408  9656
ENSMUSG00000000037.16_Scml2    72   164    77    72    50
> library(edgeR)
> Group <- factor(tar5$Types.KOStat)
> y <- DGEList(counts=rawdata[,3:ncol(rawdata)], genes=rawdata[,1:2], group=Group)
>keep <- filterByExpr(y,group=Group)
>y <- y[keep, , keep.lib.sizes=FALSE]
>y <- calcNormFactors(y)
> design <- model.matrix(~0+Group)
> colnames(design) <- levels(Group)
> rownames(design) <- colnames(y)
>y <- estimateDisp(y,design, robust=TRUE)
> fit <- glmQLFit(y,design,robust=TRUE);
>my.contrasts <- makeContrasts(
+ ##basic contrast
+ GA.H_C=H.GA-C.GA, A.H_C=H.A-C.A,
+ G.H_C=H.G-C.G,WT.H_C=H.WT-C.WT,
+ ###complicated contrasts
+ G_WT.H_C=(H.G-C.G)-(H.WT-C.WT),
+ GA_A.H_C=(H.GA-C.GA)-(H.A-C.A),
+  levels=design)

for(i in 1:length(colnames(my.contrasts))){
+ currContrast<-colnames(my.contrasts)[i];
+ qlf <- glmQLFTest(fit, contrast=my.contrasts[,currContrast])
+ AllTags<-topTags(qlf, n=Inf);
+ res<-as.data.frame(AllTags);
}


sessionInfo( )
> show(sessionInfo( ));
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /mnt/nasapps/development/R/4.0.2/lib64/R/lib/libRblas.so
LAPACK: /mnt/nasapps/development/R/4.0.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] org.Mm.eg.db_3.12.0  AnnotationDbi_1.52.0 IRanges_2.26.0      
 [4] S4Vectors_0.30.0     Biobase_2.50.0       BiocGenerics_0.38.0 
 [7] pheatmap_1.0.12      edgeR_3.32.1         limma_3.46.0        
[10] reshape_0.8.9        reshape2_1.4.4       ggrepel_0.9.1       
[13] ggplot2_3.3.6       

loaded via a namespace (and not attached):
 [1] statmod_1.4.36     tidyselect_1.1.2   locfit_1.5-9.4     purrr_0.3.4       
 [5] splines_4.0.2      lattice_0.20-45    colorspace_2.0-3   vctrs_0.4.1       
 [9] generics_0.1.2     utf8_1.2.2         blob_1.2.3         rlang_1.0.2       
[13] pillar_1.7.0       glue_1.6.2         withr_2.5.0        DBI_1.1.2         
[17] bit64_4.0.5        RColorBrewer_1.1-3 lifecycle_1.0.1    plyr_1.8.7        
[21] stringr_1.4.0      munsell_0.5.0      gtable_0.3.0       memoise_2.0.1     
[25] fastmap_1.1.0      fansi_1.0.3        Rcpp_1.0.8.3       scales_1.2.0      
[29] cachem_1.0.6       bit_4.0.4          stringi_1.7.6      dplyr_1.0.9       
[33] grid_4.0.2         cli_3.3.0          tools_4.0.2        magrittr_2.0.3    
[37] tibble_3.1.7       RSQLite_2.2.13     crayon_1.5.1       pkgconfig_2.0.3   
[41] ellipsis_0.3.2     assertthat_0.2.1   R6_2.5.1           compiler_4.0.2

complicated.contrast edgeR DEGs • 1.0k views

ADD COMMENT • link 2.3 years ago Mike • -60

score 0 · Answer 1 · 2022-09-05

0

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 58 minutes ago

WEHI, Melbourne, Australia

The "complicated contrasts" you have defined are called interactions. G_WT.H_C is the interaction between treatment and genotype (for WT vs C). GA_A.H.C is the interaction between treatment and genotype (for GA vs C). The interaction does indeed test whether the treatment effect is the same for the two genotypes.

The use of interaction contrasts is a standard procedure in edgeR and in statistical practice more generally.

ADD COMMENT • link 2.3 years ago Gordon Smyth 52k

0

Entering edit mode

Thx a lot for the input, Dr. Gordon Smyth! Yes, it is interaction contrast that we are interested. I am asking if my setting is correct or not. we did see your paper (URL: https://f1000research.com/articles/5-1438) with an example at section of "Complicated contrasts":

con <- makeContrasts((L.lactating-L.pregnant)-(B.lactating-B.pregnant), levels=design)

which is very similar to what I did here. But since one of the contrasts we got almost just 0 DEGs at FDR<0.05 and FC>1.5 in one of the complicated contrasts, although we expected there shall be some DEGs here, which made me wonder if anything is wrong with my model or contrast setting, or my code has issue, and so Just want confirmation from expertise' eyes. Or instead of using combined string as "Group" here (Group <- factor(tar5$Types.KOStat)), I shall use syntax like design <- model.matrix(~0+Types:KOStat)? Your advice would be really helpful! Thx, Mike

ADD REPLY • link 2.3 years ago Mike • -60

0

Entering edit mode

Yes, the settings in your question seem correct and I think I already answered your original question.

On the other hand, the additional setting you mention in your comment is not what I recommend. I strongly, strongly advise against using FC cutoffs for assessing statistical significance. I understand that it is very common practice in the literature, but it is very bad practice. The F1000 article you link to says:

"A commonly used approach is to apply FDR and logFC cutoffs simultaneously. However this tends to favor lowly expressed genes, and also fails to control the FDR correctly."

The only possible motivation for using additional criteria on top of edgeR's FDR to select DE genes would be if you have too many DE genes to interpret (and in that case we recommend treat) but applying a FC cutoff and throwing away the DE genes that you wanted doesn't make any sense to me.

Even without the FC cutoff, it is more difficult to achieve statistical significance for contrasts than for simple comparisons because interactions involve more terms and hence the standard errors are larger. There's nothing that can be done about that. Testing more complex questions is just intrinsically more difficult. Changing the design matrix cannot possibly help.

ADD REPLY • link 2.3 years ago Gordon Smyth 52k

0

Entering edit mode

Thx a lot for the info and confirmation! also thx a lot for your input and detailed info against using FC cutoff as part of criteria for DEGs. Indeed, I used to use FDR alone, but my wet lab colaborators preferred and insisted on with FC cutoff even at 1.5 on basis of their experience that derived DEGs with FC cutoff (1.5) seems relatively easy to be validated from their experience. Also in some contrasts, we do have lot of DEGs, and having additional FC cutoffs indeed help reducing the size of DEGs as you mentioned as one possile motivation. sometimes in practice, it is hard to say which way is better indeed especillay on validation. Thx again for your great input and info, very helpful!! regards, Mike.

ADD REPLY • link 2.3 years ago Mike • -60