I'm getting different p-values and fold changes from DESeq2 simply by changing the order of the columns in the count table, and I can't understand why. I made sure the columns of the count table match the rows of colData in both runs.
The differences with `betaPrior=TRUE` are small ([-0.003, 0.003] for p-value and [-0.002, 0.001] for FC). But if I use `betaPrior=FALSE` and then shrink the FC with `apeglm` in `lfcShrink`, the difference in log2FC can be up to 0.8 (and the baseMean for that gene is 300), which is concerning. The difference in log2FoldChange before shrinking is much smaller: [-4.5e-05, 4.5e-05].
Column `Genotype` in colData is the only variable tested in the model. It is a factor with levels `WT` and `Mutant`, with `WT` as the first level.
Interestingly, if I change column `Genotype` to character type and rerun DESeq2 (so DESeq2 converts it to a factor itself, with a warning message), I get identical results regardless of the column/row order of the inputs. So the difference seems somehow related to the factorization of the columns in colData, but I don't know why, or which results I should trust.
The input data (data.rds) can be downloaded here: https://github.com/weishwu/test_data/blob/main/data.rds?raw=true
data <- readRDS('data.rds')
dds1 <- DESeqDataSetFromMatrix(countData = data[['counts']], colData = data[['colData']], design = ~ Genotype)
dds1 <- dds1[rowSums(counts(dds1)) > 1, ]
dds1 <- DESeq(dds1, betaPrior = FALSE)
res1 <- as.data.frame(lfcShrink(dds1, coef='Genotype_Mutant_vs_WT', type='apeglm'))
# only reorder the inputs, but the columns of counts still match the rows of colData:
dds2 <- DESeqDataSetFromMatrix(countData = data[['counts']][,c(4:6,1:3)], colData = data[['colData']][c(4:6,1:3),], design = ~ Genotype)
dds2 <- dds2[rowSums(counts(dds2)) > 1, ]
dds2 <- DESeq(dds2, betaPrior = FALSE)
res2 <- as.data.frame(lfcShrink(dds2, coef='Genotype_Mutant_vs_WT', type='apeglm'))
# difference of log2FC after shrinking
range(res1$log2FoldChange-res2$log2FoldChange)
# difference of log2FC before shrinking
range(results(dds1)$log2FoldChange-results(dds2)$log2FoldChange)
sessionInfo()
[1] -0.82441013 0.01714274
[1] -4.515867e-05 4.509163e-05
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)
Matrix products: default
BLAS/LAPACK: /opt/conda/envs/WAT_diffex/lib/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0
[3] dplyr_1.0.9 purrr_0.3.4
[5] readr_2.1.2 tidyr_1.2.0
[7] tibble_3.1.7 ggplot2_3.3.6
[9] tidyverse_1.3.1 DESeq2_1.34.0
[11] SummarizedExperiment_1.24.0 Biobase_2.54.0
[13] MatrixGenerics_1.6.0 matrixStats_0.62.0
[15] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0
[17] IRanges_2.28.0 S4Vectors_0.32.3
[19] BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 fs_1.5.2 lubridate_1.8.0
[4] bit64_4.0.5 RColorBrewer_1.1-3 httr_1.4.3
[7] numDeriv_2016.8-1.1 tools_4.1.3 backports_1.4.1
[10] utf8_1.2.2 R6_2.5.1 DBI_1.1.2
[13] colorspace_2.0-3 apeglm_1.16.0 withr_2.5.0
[16] tidyselect_1.1.2 bit_4.0.4 compiler_4.1.3
[19] cli_3.3.0 rvest_1.0.2 xml2_1.3.3
[22] DelayedArray_0.20.0 scales_1.2.0 mvtnorm_1.1-3
[25] genefilter_1.76.0 XVector_0.34.0 pkgconfig_2.0.3
[28] bbmle_1.0.25 dbplyr_2.2.0 fastmap_1.1.0
[31] rlang_1.0.2 readxl_1.4.0 rstudioapi_0.13
[34] RSQLite_2.2.8 generics_0.1.2 jsonlite_1.8.0
[37] BiocParallel_1.28.3 RCurl_1.98-1.7 magrittr_2.0.3
[40] GenomeInfoDbData_1.2.7 Matrix_1.4-1 Rcpp_1.0.8.3
[43] munsell_0.5.0 fansi_1.0.3 lifecycle_1.0.1
[46] stringi_1.7.6 MASS_7.3-57 zlibbioc_1.40.0
[49] plyr_1.8.7 grid_4.1.3 blob_1.2.3
[52] parallel_4.1.3 bdsmatrix_1.3-6 crayon_1.5.1
[55] lattice_0.20-45 Biostrings_2.62.0 haven_2.5.0
[58] splines_4.1.3 annotate_1.72.0 hms_1.1.1
[61] KEGGREST_1.34.0 locfit_1.5-9.5 pillar_1.7.0
[64] geneplotter_1.72.0 reprex_2.0.1 XML_3.99-0.9
[67] glue_1.6.2 modelr_0.1.8 png_0.1-7
[70] vctrs_0.4.1 tzdb_0.3.0 cellranger_1.1.0
[73] gtable_0.3.0 assertthat_0.2.1 cachem_1.0.6
[76] emdbook_1.3.12 xtable_1.8-4 broom_0.8.0
[79] coda_0.19-4 survival_3.3-1 AnnotationDbi_1.56.1
[82] memoise_2.0.1 ellipsis_0.3.2
ATpoint Thanks a lot for your reply. I cannot reproduce the problem using `makeExampleDESeqDataSet` either. However, if I replace the count table with my real data, I get the differences again. Also, if you look at the difference in `log2FoldChange` between `dds1` and `dds2` even for the data produced by `makeExampleDESeqDataSet`, you will still see a tiny difference. The difference is on the scale of 1e-15 with `n=1000` (the default) in `makeExampleDESeqDataSet`, but goes up to 1e-9 with `n=50000` (which is closer to my real data). Somehow this difference gets much larger (1e-3) with my input data. Would it be possible for me to send you a count table?
By the way, I have other columns in addition to `Genotype` in my colData, so I didn't have to use `drop=F`. But I only used `Genotype` in the model.
I am not the DESeq2 maintainer, just a guy who answers questions to improve myself. But yes, please feel free to upload a reproducible example so the developer or others can have a look.
I edited my post to include the link to my input data (de-identified). I also changed the code to use `betaPrior=F` and then `lfcShrink(type='apeglm')` to shrink the FC. This produced up to a 0.8 difference in the shrunken log2FC between `dds1` and `dds2`. Thanks for your time.
I get only a small numerical difference on the shrunken LFCs with your code and data:
Michael Love Thanks a lot for helping me troubleshoot. I can see that we are both using version `1.34.0`, so I'm really puzzled about where the discrepancy could come from. I reinstalled DESeq2 and apeglm using conda (https://anaconda.org/bioconda/bioconductor-deseq2 and https://anaconda.org/bioconda/bioconductor-apeglm), ran my code with the same input, and got the same results as before. This is so weird.
What does the delta LFC ~0.8 gene look like in terms of its counts? Aside from the fact that we get different numbers, which I don't understand, can you just remove genes like this (maybe mostly 0 counts) at the top of your script?
It has normal counts:
I'm also using the same apeglm. No idea.
FWIW I get:
If it's not too troublesome, I wonder if you could make a docker/singularity image file containing your DESeq2 (if it's not from the same conda source) and share it with me?
I'm a little limited on bandwidth right now, so don't think I can put an image together. I did just run on a Linux machine and also I get low differences in LFC from convergence:
Why do the results differ in different OS?
Could you post the content of
to get more info on that gene? Maybe the OS dependency comes from different BLAS/LAPACK versions?
I tried the above (with `betaPrior=FALSE` and apeglm) on macOS, on two different Linux machines (Ubuntu and CentOS), both inside and outside of Docker (Ubuntu), and via conda, and the maximum differences I get are somewhere in the range of 1e-8 to 1e-13. I also tried disabling the implicit BLAS multithreading that is often the default on Linux machines with `RhpcBLASctl::blas_set_num_threads(1)` (one thread = disabled), which makes no difference; nor does setting seeds before executing all steps. Could this be some kind of machine-precision issue specific to your machine?
Thanks a lot for spending time trying it in so many ways. I also tried it on a local PC and got a 1e-8 scale difference. It seems I can only reproduce the issue on our server (RedHat 7.9, 3.10.0-1160.62.1.el7.x86_64). No matter which version of DESeq2 (1.26, 1.34, or 1.36) or which source (conda/Docker) I tried, I got the same up-to-0.8 difference as long as I ran it on our server. I had one of my colleagues run the same code with the same input, and she observed basically the same pattern. How this can be specific to the machine/OS is mysterious.
apeglm uses `optim_lbfgs` from RcppNumerical for optimization, and that could converge differently on different backends, but I'm still surprised that it seems limited to this one setup. Can you try something else: can you remove low count genes before running DESeq()?
Just to see if it has to do with sensitivity to these genes (the prior should not be sensitive to them, as we've constructed it...).
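A typical way to apply such a pre-filter (a sketch; the exact cutoff was not specified above, so the threshold of 10 here is illustrative):

```r
# Illustrative pre-filter: keep only genes with at least 10 total
# counts across samples, applied before running DESeq()
keep <- rowSums(counts(dds1)) >= 10
dds1 <- dds1[keep, ]
dds1 <- DESeq(dds1, betaPrior = FALSE)
```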
I added the filtering as you suggested, and the differences became smaller (both before and after shrinking using apeglm):
I would recommend this step. Ideally it wouldn't be necessary, but anyway it's a common first step for many statistical routines to remove the very low counts that have no power.
Thanks. I'm incorporating this filter in all my comparisons. Thanks for your help!