I am rather new to Bioconductor and RNA-Seq analysis and tried to go over several papers and workflow to follow DESeq2 pipeline. Apologies if this post will be quite long.
I have a 2x2 factorial reproductive-stage drought stress experiment in rice. I have 2 genotypes (NIL-tolerant, Swarna-sensitive) and 2 conditions (Drought and Control). I have 4 biological replicates. We sampled two different tissues; flag leaf and panicle at maximum booting. I'm trying to analyze the data per tissue first.
I used the tximport package to import my transcript-level dataset using the following codes:
tx.all <- tximport(myfiles, type = "salmon", txOut = TRUE, reader = read_tsv)
I created the DESeqDataset for use with DESeq2 as follows:
colData <- data.frame(geno=rep(c("NIL","Swarna"),each=8),condition=rep(rep(c("Control","Drought"),each=4),times=2))
rownames(colData) <- colnames(tx.all$counts)
dds <- DESeqDataSetFromTximport(tx.all, colData, formula(~geno+condition+geno:condition))
I grouped them:
dds$group<-factor(paste0(dds$geno, dds$condition))
design(dds) <- ~group
I did pre-filtering to filter out samples that have no counts, or only a single count across all samples.
nrow(dds)
dds <- dds[ rowSums(counts(dds)) > 1, ]
nrow(dds)
I ran the DESeq2 and generated the multiple condition effects.
dds<-DESeq(dds)
> resultsNames(dds)
[1] "Intercept" "groupNILControl" "groupNILDrought" "groupSwarnaControl"
[5] "groupSwarnaDrought"
Since we're interested in identifying gene/s in NIL that confers drought tolerance at reproductive growth stage of rice. I set-up the result using contrast. The following is an example:
res<-results(dds, contrast=c("group","NILDrought","SwarnaDrought"))
res
mcols(res, use.names=TRUE)
summary(res)
I got this nice table:
out of 49722 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 2538, 5.1%
LFC < 0 (down) : 2292, 4.6%
outliers [1] : 892, 1.8%
low counts [2] : 10260, 21%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
One thing I noticed is that even though i did not specified the log fold change, it does calculate for all the transcripts.
Then, I did multiple testing as follows:
resLFC1 <- results(dds, contrast=c("group","NILDrought","SwarnaDrought"), lfcThreshold = 1, alpha = 0.05)
table(resLFC1$padj < 0.05)
summary(resLFC1)
sum( resLFC1$padj < 0.05, na.rm=TRUE )
And the results:
out of 49722 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 166, 0.33%
LFC < 0 (down) : 99, 0.2%
outliers [1] : 892, 1.8%
low counts [2] : 4819, 9.7%
(mean count < 0)
I extracted all the genes that were differentially expressed based from this result which brings me to my question:
First, i observed that the log fold change for the default (res) generation and from the multiple testing (resLFC1) is the same as well as the base means and the lfcSE. But the padj is TOTALLY DIFFERENT!? I may totally not get it but why the padj value is different from res and resLFC1 after the addition of lfcThreshold=1 and lowering the padj to 0.05?
This is an example from res table:
FeatureID | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj |
LOC_Os01g02700.1 | 300.8324 | 2.433062 | 0.384283 | 6.331434 | 2.43E-10 | 2.52E-08 |
This same gene is identified in resLFC1;
FeatureID | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj |
LOC_Os01g02700.1 | 300.8324 | 2.433062 | 0.384283 | 3.729184 | 0.000192 | 0.023987 |
I know DESeq2 uses the so-called Benjamini-Hochberg (BH) adjustment that calculates for each gene and adjusted p-value but please and kindly enlighten me what's happening during the addition of lfcThreshold and lowering the padj to 0.05 in the DESeq2 package? Why are the padj values differs but the log2FoldChange remains the same?
Which then brings me to my other question. Since I used the extracted genes from res to generate Venn diagram for our four defined contrasts using an example subsetting below;
#Set up thresholds#
Upregulated = 1
Downregulated = -1
NILD_NILC_upregulated <- subset(NILD_NILC, (log2FoldChange > Upregulated & padj < 0.05))
I'm getting a different number of upregulated or downregulated genes based on these parameters for Venn diagram than that generated from resLFC1. Well, that is because the padj is different from res and resLFC1.
Should I still use those genes generated from res for generation of Venn diagram?
Please and kindly advise. Any comments on the workflow that I'm following is much appreciated.
Thanks and best regards.
Asher
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] readr_1.1.0 tximport_1.2.0 genefilter_1.56.0
[4] pheatmap_1.0.8 RColorBrewer_1.1-2 ggplot2_2.2.1
[7] gplots_3.0.1 DESeq2_1.14.1 SummarizedExperiment_1.4.0
[10] Biobase_2.34.0 GenomicRanges_1.26.4 GenomeInfoDb_1.10.3
[13] IRanges_2.8.1 S4Vectors_0.12.1 BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 locfit_1.5-9.1 lattice_0.20-34 gtools_3.5.0
[5] assertthat_0.1 digest_0.6.12 R6_2.2.0 plyr_1.8.4
[9] backports_1.0.5 acepack_1.4.1 RSQLite_1.1-2 zlibbioc_1.20.0
[13] lazyeval_0.2.0 data.table_1.10.4 annotate_1.52.1 gdata_2.17.0
[17] rpart_4.1-10 Matrix_1.2-7.1 checkmate_1.8.2 splines_3.3.2
[21] BiocParallel_1.8.1 geneplotter_1.52.0 stringr_1.2.0 foreign_0.8-67
[25] htmlwidgets_0.8 RCurl_1.95-4.8 munsell_0.4.3 base64enc_0.1-3
[29] htmltools_0.3.5 nnet_7.3-12 tibble_1.2 gridExtra_2.2.1
[33] htmlTable_1.9 Hmisc_4.0-2 XML_3.98-1.5 bitops_1.0-6
[37] grid_3.3.2 xtable_1.8-2 gtable_0.2.0 DBI_0.6
[41] magrittr_1.5 scales_0.4.1 KernSmooth_2.23-15 stringi_1.1.3
[45] XVector_0.14.1 latticeExtra_0.6-28 Formula_1.2-1 tools_3.3.2
[49] hms_0.3 survival_2.39-5 AnnotationDbi_1.36.2 colorspace_1.3-2
[53] cluster_2.0.5 caTools_1.17.1 memoise_1.0.0 knitr_1.15.1