Question

code syntax for DESeq2 to derive DEGs from a complicated contrast or interaction contrast?

0

Entering edit mode

Mike • -60

@1df4ed7f

Last seen 6 months ago

United States

We have RNAseq data with 4 cell lines with different knockout genotypes (WT, A, G, GA), and for each cell lines, we have two treatment types: treated (H) and untreated (C) with 5 replicates for each condition. we are interested in the treatment effect on each of these cell lines: WT.H_C, A.H_C, G.H_C, or GA.H_C (H_C means treatment vs untreat comparison). These are easily to be done in DESeq2 based on user guide (see the code below, such as:

myConL<-list();

k<-1;

myConL[[k]]<-c("Types.KOStat","H.GA", "C.GA");

names(myConL)[k]<-"GA.H_C";

...

But we are also interested to see if there is difference between the treatment effects between different cell lines. for example, is there any difference between the treatment effect on cell line A (A.H_C) and the treatment effect on cell line GA (GA.H_C)? also, is there any difference between the treatment effect on cell line G (G.H_C) and the treatment effect on cell line WT(WT.H_C)?It is indeed the interaction between the lines and treatment, which I annotated as contrasts: GA_A.H_C and G_WT.H_C respectively. But from DESeq2 user guide, did not see example code for this level of complicated contrast, not sure how this can be done in syntax although in theory it shall be able to be done. I did try the following way as shown in the code section as well:

k<-k+1;

myConL[[k]]<-c(0,0,-1,0,-1,0,1,0);

names(myConL)[k]<-"GA_A.H_C";

k<-k+1;

myConL[[k]]<-c(0,-1,0,1,0,1,0,-1);

names(myConL)[k]<-"G_WT.H_C";

this is based on the output as below and also other related posts:

show(resultsNames(dds)) # lists the coefficients

[1] "Intercept" "Types.KOStat_C.G_vs_C.A"

[3] "Types.KOStat_C.GA_vs_C.A" "Types.KOStat_C.WT_vs_C.A"

[5] "Types.KOStat_H.A_vs_C.A" "Types.KOStat_H.G_vs_C.A"

[7] "Types.KOStat_H.GA_vs_C.A" "Types.KOStat_H.WT_vs_C.A"

want to confirm with you if this is correct way to be done in DESeq2, or you have more simple way to do it? any advice/suggestion would be greatly appreciated!

Thx in advance and regards!

Mike

Code should be placed in three backticks as shown below

##the critical part of the codes are below
> show(tar5)
       Types KO TypesLines Types.KOStat
C.A.1      C  A      C.A.1          C.A
C.A.2      C  A      C.A.2          C.A
C.A.3      C  A      C.A.3          C.A
C.A.4      C  A      C.A.4          C.A
C.A.5      C  A      C.A.5          C.A
C.G.1      C  G      C.G.1          C.G
C.G.2      C  G      C.G.2          C.G
C.G.3      C  G      C.G.3          C.G
C.G.4      C  G      C.G.4          C.G
C.G.5      C  G      C.G.5          C.G
C.GA.1     C GA     C.GA.1         C.GA
C.GA.2     C GA     C.GA.2         C.GA
C.GA.3     C GA     C.GA.3         C.GA
C.GA.4     C GA     C.GA.4         C.GA
C.GA.5     C GA     C.GA.5         C.GA
C.WT.1     C WT     C.WT.1         C.WT
C.WT.2     C WT     C.WT.2         C.WT
C.WT.3     C WT     C.WT.3         C.WT
C.WT.4     C WT     C.WT.4         C.WT
C.WT.5     C WT     C.WT.5         C.WT
H.A.1      H  A      H.A.1          H.A
H.A.2      H  A      H.A.2          H.A
H.A.3      H  A      H.A.3          H.A
H.A.4      H  A      H.A.4          H.A
H.A.5      H  A      H.A.5          H.A
H.G.1      H  G      H.G.1          H.G
H.G.2      H  G      H.G.2          H.G
H.G.3      H  G      H.G.3          H.G
H.G.4      H  G      H.G.4          H.G
H.G.5      H  G      H.G.5          H.G
H.GA.1     H GA     H.GA.1         H.GA
H.GA.2     H GA     H.GA.2         H.GA
H.GA.3     H GA     H.GA.3         H.GA
H.GA.4     H GA     H.GA.4         H.GA
H.GA.5     H GA     H.GA.5         H.GA
H.WT.1     H WT     H.WT.1         H.WT
H.WT.2     H WT     H.WT.2         H.WT
H.WT.3     H WT     H.WT.3         H.WT
H.WT.4     H WT     H.WT.4         H.WT
H.WT.5     H WT     H.WT.5         H.WT

> show(mydata5[1:5,1:8]);
                            C.A.1 C.A.2 C.A.3 C.A.4 C.A.5 C.G.1 C.G.2 C.G.3
ENSMUSG00000000001.4_Gnai3   6227  5210  5183  5187 13509  4944  4885  5650
ENSMUSG00000000003.15_Pbsn      0     0     0     0     0     0     0     0
ENSMUSG00000000028.15_Cdc45   655   541   571   538  1402   287   305   236
ENSMUSG00000000031.16_H19     607  1074   792  1459  3263  3757  9408  9656
ENSMUSG00000000037.16_Scml2    74    62    45    72   164    77    72    50

> library("DESeq2")
> dds <- DESeqDataSetFromMatrix(countData = mydata5,
+                               colData = tar5,
+                               design = ~Types.KOStat)

> dds$Types.KOStat <- relevel(dds$Types.KOStat, ref = "C.A")
> dds<- DESeq(dds)
> show(resultsNames(dds)) # lists the coefficients
[1] "Intercept"                "Types.KOStat_C.G_vs_C.A" 
[3] "Types.KOStat_C.GA_vs_C.A" "Types.KOStat_C.WT_vs_C.A"
[5] "Types.KOStat_H.A_vs_C.A"  "Types.KOStat_H.G_vs_C.A" 
[7] "Types.KOStat_H.GA_vs_C.A" "Types.KOStat_H.WT_vs_C.A"

> myConL<-list();
> k<-1;
> myConL[[k]]<-c("Types.KOStat","H.GA", "C.GA");
> names(myConL)[k]<-"GA.H_C";
> k<-k+1;
> myConL[[k]]<-c("Types.KOStat","H.A", "C.A");
> names(myConL)[k]<-"A.H_C";
> k<-k+1;
> myConL[[k]]<-c("Types.KOStat","H.G", "C.G");
> names(myConL)[k]<-"G.H_C";
> k<-k+1;
> myConL[[k]]<-c("Types.KOStat","H.WT", "C.WT");
> names(myConL)[k]<-"WT.H_C";

> k<-k+1;
> myConL[[k]]<-c(0,0,-1,0,-1,0,1,0);
> names(myConL)[k]<-"GA_A.H_C";
> k<-k+1;
> myConL[[k]]<-c(0,-1,0,1,0,1,0,-1);
> names(myConL)[k]<-"G_WT.H_C";

> for(i in 1:length(myConL)){       
+resLFC <- lfcShrink(dds,  contrast = myConL[[i]], type="ashr")
+ resLFC<-as.data.frame(resLFC[order(resLFC$pvalue),]);

+ ###LFC shrink results#######################################################33
+ NAG<-resLFC[is.na(resLFC$padj),]
+ resLFC<-resLFC[!(rownames(resLFC) %in% rownames(NAG)),];
+ resRpt<-resLFC;
+ resRpt<-data.frame(SYMBOL=sapply(strsplit(rownames(resRpt), split="\\_"), function(x) x[2]),
+ ENSID=sapply(strsplit(rownames(resRpt), split="\\_"), function(x) x[1]), IDs=rownames(resRpt),resRpt,stringsAsFactors = FALSE);
+ resRpt2<-resRpt[resRpt$padj<=0.05 & resRpt$log2FoldChange!=0,];
+ filename1<-paste("DESeq2_",names(myConL)[i],"_adjP0_05LFC.csv",sep="")
+ filename1<-paste(outPath,filename1,sep="/")
+ write.table(resRpt2, file=filename1,sep=",", eol="\r\n",quote = FALSE, row.names=FALSE);
+}




# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /mnt/nasapps/development/R/4.0.2/lib64/R/lib/libRblas.so
LAPACK: /mnt/nasapps/development/R/4.0.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] pheatmap_1.0.12             DESeq2_1.30.1              
 [3] SummarizedExperiment_1.22.0 Biobase_2.50.0             
 [5] MatrixGenerics_1.2.1        matrixStats_0.62.0         
 [7] GenomicRanges_1.42.0        GenomeInfoDb_1.28.1        
 [9] IRanges_2.26.0              S4Vectors_0.30.0           
[11] BiocGenerics_0.38.0         limma_3.46.0               
[13] reshape_0.8.9               reshape2_1.4.4             
[15] ggrepel_0.9.1               ggplot2_3.3.6              

loaded via a namespace (and not attached):
 [1] httr_1.4.2             bit64_4.0.5            splines_4.0.2         
 [4] assertthat_0.2.1       mixsqp_0.3-43          blob_1.2.3            
 [7] GenomeInfoDbData_1.2.4 pillar_1.7.0           RSQLite_2.2.13        
[10] lattice_0.20-45        glue_1.6.2             RColorBrewer_1.1-3    
[13] XVector_0.30.0         colorspace_2.0-3       Matrix_1.4-1          
[16] plyr_1.8.7             XML_3.99-0.9           pkgconfig_2.0.3       
[19] invgamma_1.1           genefilter_1.72.1      zlibbioc_1.36.0       
[22] purrr_0.3.4            xtable_1.8-4           scales_1.2.0          
[25] BiocParallel_1.24.1    tibble_3.1.7           annotate_1.68.0       
[28] generics_0.1.2         ellipsis_0.3.2         cachem_1.0.6          
[31] withr_2.5.0            ashr_2.2-54            cli_3.3.0             
[34] survival_3.3-1         magrittr_2.0.3         crayon_1.5.1          
[37] memoise_2.0.1          fansi_1.0.3            truncnorm_1.0-8       
[40] tools_4.0.2            lifecycle_1.0.1        stringr_1.4.0         
[43] munsell_0.5.0          locfit_1.5-9.4         DelayedArray_0.18.0   
[46] irlba_2.3.5            AnnotationDbi_1.52.0   compiler_4.0.2        
[49] rlang_1.0.2            grid_4.0.2             RCurl_1.98-1.6        
[52] bitops_1.0-7           gtable_0.3.0           DBI_1.1.2             
[55] R6_2.5.1               dplyr_1.0.9            fastmap_1.1.0         
[58] bit_4.0.4              utf8_1.2.2             stringi_1.7.6         
[61] SQUAREM_2021.1         Rcpp_1.0.8.3           vctrs_0.4.1           
[64] geneplotter_1.68.0     tidyselect_1.1.2

DESeq2 Complicated.contrast DEGs • 822 views

ADD COMMENT • link updated 2.3 years ago by Michael Love 43k • written 2.3 years ago by Mike • -60

score 0 · Answer 1 · 2022-09-06

0

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 2 days ago

San Diego

is there any difference between the treatment effect on cell line G (G.H_C) and the treatment effect on cell line WT(WT.H_C)?

I'm not sure why the interaction section isn't what you want.

ADD COMMENT • link 2.3 years ago swbarnes2 ★ 1.4k

0

Entering edit mode

Thx for asking. Indeed, this is not what I am asking. I was wondering the code syntax for interaction contrast in DESeq2 and just want to see expertise' way of doing it in DESeq2 and if the way I did is correct or not. Thx Mike

ADD REPLY • link 2.3 years ago Mike • -60

0

Entering edit mode

I unfortunately don't have time to go over user analysis scripts, I instead recommend users settle on experimental design and analysis choices by working with a local statistician or someone familiar with linear models in R.

ADD REPLY • link 2.3 years ago Michael Love 43k