Question

deseq2 strange error: high number of diffferential genes

0

Entering edit mode

jarod_v6@libero.it ▴ 40

@jarod_v6liberoit-6654

Last seen 6.2 years ago

Italy

I have 14 samples divide in two groups ( 11 vs 3 samples).

After normal deseq2 work-flow I found this results:

out of 22492 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 20748, 92%
LFC < 0 (down)   : 100, 0.44%
outliers [1]     : 0, 0%
low counts [2]   : 21, 0.093%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Whi all the genes are differential expressed...i

> head(res)
log2 fold change (MLE): Intercept
Wald test p-value: Intercept
DataFrame with 6 rows and 6 columns
                  baseMean log2FoldChange      lfcSE      stat        pvalue
                 <numeric>      <numeric> <numeric> <numeric>     <numeric>
ENSG00000000003 223.53931       7.804385 0.29525254 26.43291 5.736351e-154
ENSG00000000419 106.27997       6.731726 0.10858462 61.99520 0.000000e+00
ENSG00000000457 127.99856       6.999984 0.09604285 72.88396 0.000000e+00
ENSG00000000460   59.39814       5.892346 0.14531761 40.54805 0.000000e+00
ENSG00000000938   20.36792       4.348227 0.35805062 12.14417 6.160687e-34
ENSG00000000971 3296.12479      11.686555 0.34137287 34.23399 7.549703e-257
                         padj
                    <numeric>
ENSG00000000003 1.339792e-153
ENSG00000000419 0.000000e+00
ENSG00000000457 0.000000e+00
ENSG00000000460 0.000000e+00
ENSG00000000938 9.749256e-34
ENSG00000000971 2.170346e-256

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] pheatmap_1.0.8             ggplot2_2.2.1              genefilter_1.54.2         
 [4] biomaRt_2.28.0             DESeq2_1.12.4              SummarizedExperiment_1.2.3
 [7] Biobase_2.32.0             GenomicRanges_1.24.3       GenomeInfoDb_1.8.7        
[10] IRanges_2.6.1              S4Vectors_0.10.3           BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1       splines_3.3.2        lattice_0.20-34      colorspace_1.3-2    
 [5] htmltools_0.3.5      base64enc_0.1-3      survival_2.40-1      XML_3.98-1.5        
 [9] foreign_0.8-67       DBI_0.5-1            BiocParallel_1.6.6   RColorBrewer_1.1-2  
[13] plyr_1.8.4           stringr_1.1.0        zlibbioc_1.18.0      munsell_0.4.3       
[17] gtable_0.2.0         htmlwidgets_0.8      memoise_1.0.0        labeling_0.3        
[21] latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.50.0   AnnotationDbi_1.34.4
[25] htmlTable_1.9        Rcpp_0.12.9          acepack_1.4.1        xtable_1.8-2        
[29] backports_1.0.5      scales_0.4.1         checkmate_1.8.2      limma_3.28.21       
[33] Hmisc_4.0-2          annotate_1.50.1      XVector_0.12.1       gridExtra_2.2.1     
[37] digest_0.6.12        stringi_1.1.2        grid_3.3.2           tools_3.3.2         
[41] bitops_1.0-6         magrittr_1.5         lazyeval_0.2.0       RCurl_1.95-4.8      
[45] tibble_1.2           RSQLite_1.1-2        Formula_1.2-1        cluster_2.0.5       
[49] Matrix_1.2-8         data.table_1.10.0    assertthat_0.1       rpart_4.1-10        
[53] nnet_7.3-12

deseq2 • 1.7k views

ADD COMMENT • link updated 8.2 years ago by Michael Love 43k • written 8.2 years ago by jarod_v6@libero.it ▴ 40

0

Entering edit mode

The sample are two type of tumors. I want to understand if there is a difference on expression profile between this groups.

The sample size are 53ML of unique reads. I have some sample with low number of reads as you can see on the picture http://imgur.com/a/zFrpn .

SampleTable <-data.frame(SampleName=metadata$ID,fileName=metadata$NOMIFILE,condition=metadata$CONDITION,ID_clinico =metadata$ID_CLINICO,Corsa=metadata$Corsa)
ddHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=SampleTable,directory="Count/", design= ~ 1) # evels: und<-droplevels(dds$condition)
dds <- estimateSizeFactors(ddHTSeq)

filterCount2<-table(dds1$condition)[2]*0.3
filterCount1<-table(dds1$condition)[1]*0.3

filterCount<-min(filterCount1,filterCount2)

if (filterCount <1.5) {
 
  filterCount = 1.5
 
 
}

idx <- rowSums( counts(dds) >= 10 ) >= filterCount

dds <- dds[idx,]
dds$condition <- relevel(dds$condition, "A")#
dds$condition
dds <- DESeq(dds)

res<-results(dds)
head(res)
summary(res)

ADD REPLY • link 8.2 years ago jarod_v6@libero.it ▴ 40

0

Entering edit mode

How about the sequencing depth across condition. Is it confounded?

ADD REPLY • link 8.2 years ago Michael Love 43k

0

Entering edit mode

mean(depth[depth$condition == "A",]$Count)
[1] 4.86
>
> mean(depth[depth$condition == "B",]$Count)
[1] 9.87

summary(depth[depth$condition == "A",]$Count)
   Min. 1st Qu. Median    Mean 3rd Qu.    Max.
   7.40    7.70    8.00    9.87   11.10   14.20
> summary(depth[depth$condition != "B",]$Count)
   Min. 1st Qu. Median    Mean 3rd Qu.    Max.
   0.80    2.25    3.10    4.86    5.70   16.80

ADD REPLY • link 8.2 years ago jarod_v6@libero.it ▴ 40

score 0 · Answer 1 · 2017-02-14

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

Can you provide all your code, and tell us more about the samples. Also what about the sequencing depth across the groups. Is it equal or imbalanced?

ADD COMMENT • link 8.2 years ago Michael Love 43k

score 0 · Answer 2 · 2017-02-17

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

While the sequencing depth is confounded, which is not ideal but not the biggest problem, looking over your code I saw the real problem: your design = ~ 1

So you are testing whether the intercept = 0.

ADD COMMENT • link 8.2 years ago Michael Love 43k

0

Entering edit mode

Sorry I put the wrong part I use this for the comparison calculation I use ~1 for unsupervised analisys:

dds1<-DESeqDataSetFromMatrix(countRawCoding,SampleTable, design= ~ condition )

ADD REPLY • link 8.2 years ago jarod_v6@libero.it ▴ 40

0

Entering edit mode

Can you confirm that when you rerun DESeq() and results() with the intended design the results are not as expected? Because the results you report are just what would be expected with ~1.

ADD REPLY • link 8.2 years ago Michael Love 43k

0

Entering edit mode

I check And I found what do you suggest. it was an error on the script. No I have this situation:

out of 20134 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 0, 0%
LFC < 0 (down)   : 1, 0.005%
outliers [1]     : 484, 2.4%
low counts [2]   : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Have you any suggestion?

ADD REPLY • link 8.2 years ago jarod_v6@libero.it ▴ 40

0

Entering edit mode

To me this indicates that there are not large differences across condition (not large enough to be found at the sample size you have).

ADD REPLY • link 8.2 years ago Michael Love 43k