Question

Cook´s distance on DESeq2

0

Entering edit mode

David ▴ 860

@david-3335

Last seen 7.0 years ago

Hi,

I´m using the DESseq package and trying the cook´s distance to filter genes.

boxplot(log10(assays(dds)[["cooks"]]), range=0, las=2,main="Cook's distance")

My sample "control_1" seems to be quite different from the others ??

thanks

Cooks distance

deseq2 • 3.7k views

ADD COMMENT • link 9.2 years ago David ▴ 860

score 0 · Answer 1 · 2016-02-05

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 6 weeks ago

EMBL European Molecular Biology Laborat…

You could explore your data further to figure out what is going on:

heatmap of VST-transformed data matrix
PCA plot
heatmap of sample-sample correlation matrix
pairwise scatterplots

Is the number of sequenced or aligned reads very different for that library? Etc., ...

ADD COMMENT • link 9.2 years ago Wolfgang Huber ★ 13k

score 0 · Answer 2 · 2016-02-05

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

Can you provide your sessionInfo() as well?

ADD COMMENT • link 9.2 years ago Michael Love 43k

score 0 · Answer 3 · 2016-02-06

Hi,

I haven´t seen anything strange with that sample in my PCA, neither my clustering. (see PCA attached and hclust )(see images below).

In the meantime disease_2 has become control_6. Otherwise don´t understand the cook´s values for control 1? How should i interpret it ?

PCA

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=ca_AD.UTF-8       LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=ca_AD.UTF-8          LC_NAME=ca_AD.UTF-8
[9] LC_ADDRESS=ca_AD.UTF-8        LC_TELEPHONE=ca_AD.UTF-8
[11] LC_MEASUREMENT=ca_AD.UTF-8    LC_IDENTIFICATION=ca_AD.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] psych_1.5.8                randomForest_4.6-12
[3] genefilter_1.52.1          ggdendro_0.1-17
[5] pheatmap_1.0.8             vsn_3.38.0
[7] gridExtra_2.0.0            xlsx_0.5.7
[9] xlsxjars_0.6.1             rJava_0.9-8
[11] RColorBrewer_1.1-2         DESeq2_1.10.1
[13] RcppArmadillo_0.6.500.4.0 Rcpp_0.12.3
[15] SummarizedExperiment_1.0.2 Biobase_2.30.0
[17] GenomicRanges_1.22.4       GenomeInfoDb_1.6.3
[19] IRanges_2.4.6              S4Vectors_0.8.11
[21] BiocGenerics_0.16.1        gplots_2.17.0
[23] reshape2_1.4.1             klaR_0.6-12
[25] MASS_7.3-45                caret_6.0-64
[27] ggplot2_2.0.0              lattice_0.20-33

loaded via a namespace (and not attached):
[1] splines_3.2.3         foreach_1.4.3         gtools_3.5.0
[4] Formula_1.2-1         affy_1.48.0           latticeExtra_0.6-26
[7] RSQLite_1.0.0         limma_3.26.7          quantreg_5.19
[10] digest_0.6.9          XVector_0.10.0        minqa_1.2.4
[13] colorspace_1.2-6      preprocessCore_1.32.0 Matrix_1.2-3
[16] plyr_1.8.3            XML_3.98-1.3          SparseM_1.7
[19] zlibbioc_1.16.0       xtable_1.8-0          scales_0.3.0
[22] gdata_2.17.0          affyio_1.40.0         BiocParallel_1.4.3
[25] lme4_1.1-10           MatrixModels_0.4-1    combinat_0.0-8
[28] annotate_1.48.0       mgcv_1.8-11           car_2.1-1
[31] nnet_7.3-12           mnormt_1.5-3          pbkrtest_0.4-6
[34] survival_2.38-3       magrittr_1.5          nlme_3.1-124
[37] foreign_0.8-66        class_7.3-14          BiocInstaller_1.20.1
[40] tools_3.2.3           stringr_1.0.0         munsell_0.4.2
[43] locfit_1.5-9.1        cluster_2.0.3         AnnotationDbi_1.32.3
[46] lambda.r_1.1.7        compiler_3.2.3        e1071_1.6-7
[49] caTools_1.17.1        futile.logger_1.4.1   grid_3.2.3
[52] nloptr_1.0.4          iterators_1.0.8       labeling_0.3
[55] bitops_1.0-6          gtable_0.1.2          codetools_0.2-14
[58] DBI_0.3.1             Hmisc_3.17-1          futile.options_1.0.0
[61] KernSmooth_2.23-15    stringi_1.0-1         geneplotter_1.48.0
[64] rpart_4.1-10          acepack_1.3-3.3

score 0 · Answer 4 · 2016-02-07

0

Entering edit mode

David ▴ 860

@david-3335

Last seen 7.0 years ago

Hi Michael,

I must have done something wrong. I have regenerated all my data cleaning my code and things look much better now !!!!!

I have one question anyway. What the differences between my disease pattients and controls mean when looking at the cook´s distance ?? Do that mean that the sample effect is much higher in my disease samples ??

ADD COMMENT • link 9.2 years ago David ▴ 860

0

Entering edit mode

You can't interpret (and DESeq2 does not filter on) the Cook's distances for groups with a single sample. This is because the definition of Cook's distance is the distance the LFC for the group would move if the sample were removed.

So I wouldn't worry about the Cook's distances here. Everything looks ok.

ADD REPLY • link 9.2 years ago Michael Love 43k