Dear all,
I used DESeq2
to perform a DE analysis on mRNA and, when extracting the contrasts I found that one of them has strange values on log2FoldChange
.
Examples of this values are:
baseMean log2FoldChange lfcSE stat pvalue padj control.norm control.norm.sd case.norm case.norm.sd control.raw control.raw.sd case.raw case.raw.sd entrezgene ensembl_gene_id hgnc_symbol 0.451583505545774 48.8957821441089 3.06977055428295 15.9281553065552 4.0410373668378e-57 9.75021495870625e-53 0 0 1.5176109951821 1.52380894399315 0 0 1.33333333333333 1.211060141639 26863 ENSG00000206737 RNVU1-18 0.228079869975656 40.318409610448 3.40710380743093 11.8336311099513 2.61576228336841e-32 3.15565561865565e-28 0 0 0.331882123975104 0.541343418415764 0 0 0.333333333333333 0.516397779494322 149620 ENSG00000203878 CHIAP2 0.295106535209405 -43.035323935704 3.76355502327969 -11.4347534895881 2.80331740594844e-30 2.2546147456908e-26 0.149214083505201 0.394783357063187 0 0 0.285714285714286 0.755928946018454 0 0 400696 ENSG00000226025 0.345263598177344 -31.756499075573 3.03938372161554 -10.4483349205717 1.49120227027424e-25 8.99493209429424e-22 0.822645190253885 0.878232249766697 0 0 0.714285714285714 0.755928946018454 0 0 2169 ENSG00000145384 FABP2 0.212013349024225 35.5888959495018 3.51136109756479 10.1353563363744 3.84967405244237e-24 1.85769871074659e-20 0 0 0.390411030164701 0.604849718887933 0 0 0.333333333333333 0.516397779494322 100422864 ENSG00000265981 MIR544B 0.257772697263073 -42.1836002941776 4.19204965429255 -10.0627625560167 8.07014201858527e-24 3.24527311040709e-20 0.308565055995893 0.550841610282403 0 0 0.285714285714286 0.487950036474267 0 0 100129027 ENSG00000205424
While other contrast is as expected:
baseMean log2FoldChange lfcSE stat pvalue padj control.norm control.norm.sd case.norm case.norm.sd control.raw control.raw.sd case.raw case.raw.sd entrezgene ensembl_gene_id hgnc_symbol
2483.71966540303 1.32485131527691 0.162630879037524 8.14637000745246 3.75011481830193e-16 6.7940830163176e-12 1413.81550878397 276.670790081588 3606.37780200197 1133.20438162628 1565.14285714286 594.397293783495 3475.5 1384.34255153845 9221 ENSG00000166197 NOLC1
591.896723229054 0.941463488371378 0.121775692478411 7.7311281850299 1.06597617839736e-14 9.65614521201244e-11 386.342696553717 28.4571984287309 745.386206238067 189.435252833034 418.428571428571 104.716851784319 730 315.487559184194 8882 ENSG00000109917 ZPR1
73.0950010099952 3.22858351861627 0.431964208272553 7.47419220570967 7.76795340911659e-14 4.69106706376551e-10 15.6194970329938 7.57155853894966 172.907613085039 155.016375713218 17.4285714285714 10.0806273425342 154.833333333333 133.553609710358 374 ENSG00000109321 AREG
341.11782393837 1.7595707621299 0.240333215700595 7.32137984756112 2.45434021151137e-13 1.11163204029879e-09 141.413691168911 24.3840654277537 534.422275484452 371.071571595906 154.428571428571 50.1459773822293 491.5 321.692244233522 8793 ENSG00000173530 TNFRSF10D
58.9101837146771 1.68002413777924 0.231785040329491 7.24819917364394 4.22349239481266e-13 1.53034023433642e-09 29.0758522070694 4.29691808349143 89.2536631356094 22.0780236896495 31.5714285714286 8.77225062070054 85.5 27.5154502052938 84541 ENSG00000163376 KBTBD8
The process I used to generate those tables follows:
desc$f1 <- relevel(desc$f1, "24.0")
dds.skin <- DESeqDataSetFromMatrix(
countData = counts,
colData = desc,
design = ~ plate_batch + flowcell + id + f1
)
dds.skin <- DESeq(dds.skin, betaPrior = FALSE)
save(dds.skin, file="dds_t24.rda")
## EXTRACT CONTRAST FOR EACH COMPARISON
res <- results(dds.skin, contrast=c("f1", "24.3", "24.0"), pAdjustMethod = "fdr")
res <- res[order(res$padj), ]
export(res, desc, counts, norm, "24.3", "24.0", "f1", "genes_24_0_3")
res <- results(dds.skin, contrast=c("f1", "24.6", "24.0"), pAdjustMethod = "fdr")
res <- res[order(res$padj), ]
export(res, desc, counts, norm, "24.6", "24.0", "f1", "genes_24_0_6")
res <- results(dds.skin, contrast=c("f1", "24.6", "24.3"), pAdjustMethod = "fdr")
res <- res[order(res$padj), ]
export(res, desc, counts, norm, "24.6", "24.3", "f1", "genes_24_3_6")
Being the table genes_24_3_6
the one with strange logFC values and both genes_24_0_3
and genes_24_0_6
with expected ones. So, are those values at logFC correct or I did something wrong?
More info.: The variable f1
has the values 6.0
, 24.0
, 6.3
, 24.3
, 6.6
and 24.6
.
R> sessionInfo() R version 3.2.0 (2015-04-16) Platform: x86_64-unknown-linux-gnu (64-bit) Running under: CentOS release 6.6 (Final) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=es_ES.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] biomaRt_2.24.1 DESeq2_1.8.2 RcppArmadillo_0.6.100.0.0 Rcpp_0.12.1 [5] GenomicRanges_1.20.8 GenomeInfoDb_1.4.3 IRanges_2.2.7 S4Vectors_0.6.6 [9] BiocGenerics_0.14.0 BiocInstaller_1.18.4 loaded via a namespace (and not attached): [1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3 XVector_0.8.0 bitops_1.0-6 futile.options_1.0.0 [7] tools_3.2.0 rpart_4.1-10 digest_0.6.8 annotate_1.46.1 lattice_0.20-33 RSQLite_1.0.0 [13] gtable_0.1.2 DBI_0.3.1 proto_0.3-10 gridExtra_2.0.0 genefilter_1.50.0 cluster_2.0.3 [19] stringr_1.0.0 locfit_1.5-9.1 nnet_7.3-11 grid_3.2.0 Biobase_2.28.0 AnnotationDbi_1.30.1 [25] XML_3.98-1.3 survival_2.38-3 BiocParallel_1.2.21 foreign_0.8-66 latticeExtra_0.6-26 Formula_1.2-1 [31] geneplotter_1.46.0 ggplot2_1.0.1 reshape2_1.4.1 lambda.r_1.1.7 magrittr_1.5 splines_3.2.0 [37] scales_0.3.0 Hmisc_3.17-0 MASS_7.3-44 xtable_1.7-4 colorspace_1.2-6 stringi_0.5-5 [43] acepack_1.3-3.3 RCurl_1.95-4.7 munsell_0.4.2