Batch-effect: batch in design and corrected batch with sva- DESeq2
1
0
Entering edit mode
bioinfo • 0
@bioinfo-19580
Last seen 5.9 years ago

Hi All,

I did DESeq2 on my set of samples and I got only two DE genes. When I checked I found that my samples were collected at different time points and in PCA a prominent batch effect is shown. So, I included a batch in my design

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                   directory = directory,
                                   design= ~ batch + condition)

where batch is batch [1] 1 2 3 1 2 3 Levels: 1 2 3

I get 15 DE genes after including design. So far all good.

But then I read another post where we can adjust the batch effect using 'SVA' and I tried it also. See below:

       dat  <- counts(dds, normalized = TRUE)
        idx  <- rowMeans(dat) > 1
        dat  <- dat[idx, ]
        mod  <- model.matrix(~ condition, colData(dds))
        mod0 <- model.matrix(~   1, colData(dds))

        n.sv <- num.sv(dat,mod, method = 'leek')
        n.sv

     svseq <- svaseq(dat, mod, mod0, n.sv = 3)

    ddssva <- dds
    ddssva$SV1 <- svseq$sv[,1]
    ddssva$SV2 <- svseq$sv[,2]
    ddssva$SV3 <- svseq$sv[,3]
    design(ddssva) <- ~ SV1 + SV2 + SV3 + condition

 ddssva <- DESeq(ddssva) 
 resSV <- results(ddssva)

n.sv was 4 but while using 4 it throws the below error so I used 3 and it worked fine

 svseq <- svaseq(dat, mod, mod0, n.sv = 4)
    Number of significant surrogate variables is:  4 
    Iteration (out of 5 ):Error in density.default(x, adjust = adj) : 'x' contains missing values
    In addition: Warning message:
    In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced

Using this method (sva) I am getting 51 DE genes and only 5 genes are common between this and if using batch in design. My question is that why there is a difference between the results of the two methods and which method should I use form my downstream analysis.

 sessionInfo()
    R version 3.5.1 (2018-07-02)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)

    Matrix products: default

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

    attached base packages:
    [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

    other attached packages:
     [1] sva_3.30.1                  genefilter_1.64.0           mgcv_1.8-24                 nlme_3.1-137               
     [5] openxlsx_4.1.0              XLConnect_0.2-15            XLConnectJars_0.2-15        xlsx_0.6.1                 
     [9] tibble_2.0.1                biomaRt_2.38.0              gplots_3.0.1                ggplot2_3.1.0              
    [13] DESeq2_1.22.2               SummarizedExperiment_1.12.0 DelayedArray_0.8.0          BiocParallel_1.16.5        
    [17] matrixStats_0.54.0          Biobase_2.42.0              GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
    [21] IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0        

    loaded via a namespace (and not attached):
     [1] bitops_1.0-6           bit64_0.9-7            RColorBrewer_1.1-2     progress_1.2.0         httr_1.4.0            
     [6] tools_3.5.1            backports_1.1.3        R6_2.3.0               rpart_4.1-13           KernSmooth_2.23-15    
    [11] Hmisc_4.1-1            DBI_1.0.0              lazyeval_0.2.1         colorspace_1.4-0       nnet_7.3-12           
    [16] withr_2.1.2            tidyselect_0.2.5       gridExtra_2.3          prettyunits_1.0.2      bit_1.1-14            
    [21] curl_3.3               compiler_3.5.1         htmlTable_1.13.1       labeling_0.3           caTools_1.17.1.1      
    [26] scales_1.0.0           checkmate_1.9.1        stringr_1.3.1          digest_0.6.18          foreign_0.8-70        
    [31] XVector_0.22.0         base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6        limma_3.38.3          
    [36] htmlwidgets_1.3        rlang_0.3.1            rstudioapi_0.9.0       RSQLite_2.1.1          bindr_0.1.1           
    [41] gtools_3.8.1           zip_1.0.0              acepack_1.4.1          dplyr_0.7.8            RCurl_1.95-4.11       
    [46] magrittr_1.5           GenomeInfoDbData_1.2.0 Formula_1.2-3          Matrix_1.2-14          Rcpp_1.0.0            
    [51] munsell_0.5.0          stringi_1.2.4          yaml_2.2.0             zlibbioc_1.28.0        plyr_1.8.4            
    [56] grid_3.5.1             blob_1.1.1             gdata_2.18.0           crayon_1.3.4           lattice_0.20-35       
    [61] splines_3.5.1          xlsxjars_0.6.1         annotate_1.60.0        hms_0.4.2              locfit_1.5-9.1        
    [66] knitr_1.21             pillar_1.3.1           geneplotter_1.60.0     XML_3.98-1.16          glue_1.3.0            
    [71] latticeExtra_0.6-28    data.table_1.12.0      BiocManager_1.30.4     gtable_0.2.0           purrr_0.2.5           
    [76] assertthat_0.2.0       xfun_0.4               xtable_1.8-3           survival_2.42-3        rJava_0.9-10          
    [81] AnnotationDbi_1.44.0   memoise_1.1.0          bindrcpp_0.2.2         cluster_2.0.7-1

Any help would be appreciated! Thanks!

deseq2 svaseq • 1.9k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 5 days ago
United States

With known batches, I tend to favor ~batch + condition. I'm not sure why you would get such a small overlap.

ADD COMMENT
0
Entering edit mode

Thanks, Michael for the answer.

I was not sure why I am getting such different numbers. I would go ahead with ~ batch + condition.

Also when I use n.sv = 2 and use only two SVs I only get 4 DE genes.

Is it recommended to use high n.sv or low n.sv when running svaseq. Any suggestion?

ADD REPLY
0
Entering edit mode

I’ll let the SVA devels handle that one.

ADD REPLY

Login before adding your answer.

Traffic: 446 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6