DESeq2 : interactions ?
2
0
Entering edit mode
Yvan Wenger ▴ 50
@yvan-wenger-5608
Last seen 6.7 years ago
Dear list members, There must be something I missed. I have an RNA-seq timecourse experiment with triplicates (see below), there are two conditions: H5 and H8, they can be thought of control/treatment. I am trying to assess if some genes react differently over time between the two conditions. I use the code below with: full model: design = ~ time + condition + time:condition reduced model: reduced = ~ time + condition Out of 33'000 genes, I find 732 with interaction between time and condition (FDR < 0.1). Nice. The problem, some genes that I was expecting to appear in this dataset do not (see the attached pdf for example, I am only concerned with differences in H5 and H8). Furthermore, when looking at the FDR of this gene, I can see that it is very far from being significant, the FDR is 0.97 ! So I think I am doing something wrong in the way I compare the full model and the reduce model, probably I am not analyzing the data the way I want. To recapitulate, what I would like to select are different behaviours over time between the two conditions. Many thanks for your inputs ! Yvan library(DESeq2) countData <- read.table("counts_all.txt",row.names=1,header=T) countData_H58 <- countData[,31:90] ### data from table 1 to 30 are from another condition not considered here. colData_H58 <- as.data.frame(read.table(file='conditions_H58')) ### content of the conditions_H58 file pasted below dds_H58 <- DESeqDataSetFromMatrix(countData = countData_H58,colData = colData_H58, design = ~ time + condition + time:condition) ### Full model dds_H58 <- estimateSizeFactors(dds_H58) dds_H58 <- estimateDispersions(dds_H58) write.table(file="normacounts_H58.csv", counts(dds_H58,normalized=TRUE)) ddsLRT_H58 <- nbinomLRT(dds_H58, reduced = ~ time + condition) ### reduced model res_H58 <- results(ddsLRT_H58,cooksCutoff=FALSE) write.table(res_H58,file='results_H58.csv',sep="\t",quote=F) condition time 031_H5_000C H5 0 032_H5_000A H5 0 033_H5_000B H5 0 034_H5_005C H5 0.5 035_H5_005A H5 0.5 036_H5_005B H5 0.5 037_H5_010C H5 1 038_H5_010A H5 1 039_H5_010B H5 1 040_H5_020C H5 2 041_H5_020A H5 2 042_H5_020B H5 2 043_H5_040C H5 4 044_H5_040A H5 4 045_H5_040B H5 4 046_H5_080C H5 8 047_H5_080A H5 8 048_H5_080B H5 8 049_H5_160C H5 16 050_H5_160A H5 16 051_H5_160B H5 16 052_H5_240C H5 24 053_H5_240A H5 24 054_H5_240B H5 24 055_H5_360C H5 36 056_H5_360A H5 36 057_H5_360A H5 36 058_H5_480B H5 48 059_H5_480C H5 48 060_H5_480A H5 48 061_H8_000B H8 0 062_H8_000C H8 0 063_H8_000A H8 0 064_H8_005B H8 0.5 065_H8_005C H8 0.5 066_H8_005A H8 0.5 067_H8_010B H8 1 068_H8_010C H8 1 069_H8_010A H8 1 070_H8_020B H8 2 071_H8_020C H8 2 072_H8_020A H8 2 073_H8_040B H8 4 074_H8_040C H8 4 075_H8_040A H8 4 076_H8_080B H8 8 077_H8_080C H8 8 078_H8_080A H8 8 079_H8_160B H8 16 080_H8_160C H8 16 081_H8_160A H8 16 082_H8_240B H8 24 083_H8_240C H8 24 084_H8_240A H8 24 085_H8_360A H8 36 086_H8_360B H8 36 087_H8_360C H8 36 088_H8_480A H8 48 089_H8_480B H8 48 090_H8_480C H8 48 > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] DESeq2_1.4.0 RcppArmadillo_0.4.100.2.1 [3] Rcpp_0.11.1 GenomicRanges_1.16.1 [5] GenomeInfoDb_1.0.2 IRanges_1.22.3 [7] BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] annotate_1.42.0 AnnotationDbi_1.26.0 Biobase_2.24.0 [4] DBI_0.2-7 genefilter_1.46.0 geneplotter_1.42.0 [7] grid_3.1.0 lattice_0.20-29 locfit_1.5-9.1 [10] RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 [16] XML_3.98-1.1 xtable_1.7-3 XVector_0.4.0 -------------- next part -------------- A non-text attachment was scrubbed... Name: seq68770.pdf Type: application/pdf Size: 6450 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20140501="" ea132f21="" attachment.pdf="">
• 1.3k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 2 days ago
United States
hi Yvan, Are you specifying time as a numeric? If so, then you are not asking for genes which show any change over time specific to condition, but for genes which have consistent fold change by hour specific to condition (i.e. genes with counts which triple every hour, or halve every hour, etc.). Try instead specifying time as a factor. Mike On Thu, May 1, 2014 at 5:57 AM, Yvan Wenger <yvan.wenger at="" unige.ch=""> wrote: > Dear list members, > > There must be something I missed. I have an RNA-seq timecourse > experiment with triplicates (see below), there are two conditions: H5 > and H8, they can be thought of control/treatment. > > I am trying to assess if some genes react differently over time > between the two conditions. I use the code below with: > > full model: design = ~ time + condition + time:condition > reduced model: reduced = ~ time + condition > > Out of 33'000 genes, I find 732 with interaction between time and > condition (FDR < 0.1). Nice. > > The problem, some genes that I was expecting to appear in this dataset > do not (see the attached pdf for example, I am only concerned with > differences in H5 and H8). Furthermore, when looking at the FDR of > this gene, I can see that it is very far from being significant, the > FDR is 0.97 ! > > So I think I am doing something wrong in the way I compare the full > model and the reduce model, probably I am not analyzing the data the > way I want. To recapitulate, what I would like to select are different > behaviours over time between the two conditions. > > Many thanks for your inputs ! > > Yvan > > > > > library(DESeq2) > > countData <- read.table("counts_all.txt",row.names=1,header=T) > countData_H58 <- countData[,31:90] ### data from table 1 to 30 are > from another condition not considered here. > colData_H58 <- as.data.frame(read.table(file='conditions_H58')) ### > content of the conditions_H58 file pasted below > dds_H58 <- DESeqDataSetFromMatrix(countData = countData_H58,colData = > colData_H58, design = ~ time + condition + time:condition) ### Full > model > dds_H58 <- estimateSizeFactors(dds_H58) > dds_H58 <- estimateDispersions(dds_H58) > write.table(file="normacounts_H58.csv", counts(dds_H58,normalized=TRUE)) > ddsLRT_H58 <- nbinomLRT(dds_H58, reduced = ~ time + condition) ### reduced model > res_H58 <- results(ddsLRT_H58,cooksCutoff=FALSE) > write.table(res_H58,file='results_H58.csv',sep="\t",quote=F) > > > > condition time > 031_H5_000C H5 0 > 032_H5_000A H5 0 > 033_H5_000B H5 0 > 034_H5_005C H5 0.5 > 035_H5_005A H5 0.5 > 036_H5_005B H5 0.5 > 037_H5_010C H5 1 > 038_H5_010A H5 1 > 039_H5_010B H5 1 > 040_H5_020C H5 2 > 041_H5_020A H5 2 > 042_H5_020B H5 2 > 043_H5_040C H5 4 > 044_H5_040A H5 4 > 045_H5_040B H5 4 > 046_H5_080C H5 8 > 047_H5_080A H5 8 > 048_H5_080B H5 8 > 049_H5_160C H5 16 > 050_H5_160A H5 16 > 051_H5_160B H5 16 > 052_H5_240C H5 24 > 053_H5_240A H5 24 > 054_H5_240B H5 24 > 055_H5_360C H5 36 > 056_H5_360A H5 36 > 057_H5_360A H5 36 > 058_H5_480B H5 48 > 059_H5_480C H5 48 > 060_H5_480A H5 48 > 061_H8_000B H8 0 > 062_H8_000C H8 0 > 063_H8_000A H8 0 > 064_H8_005B H8 0.5 > 065_H8_005C H8 0.5 > 066_H8_005A H8 0.5 > 067_H8_010B H8 1 > 068_H8_010C H8 1 > 069_H8_010A H8 1 > 070_H8_020B H8 2 > 071_H8_020C H8 2 > 072_H8_020A H8 2 > 073_H8_040B H8 4 > 074_H8_040C H8 4 > 075_H8_040A H8 4 > 076_H8_080B H8 8 > 077_H8_080C H8 8 > 078_H8_080A H8 8 > 079_H8_160B H8 16 > 080_H8_160C H8 16 > 081_H8_160A H8 16 > 082_H8_240B H8 24 > 083_H8_240C H8 24 > 084_H8_240A H8 24 > 085_H8_360A H8 36 > 086_H8_360B H8 36 > 087_H8_360C H8 36 > 088_H8_480A H8 48 > 089_H8_480B H8 48 > 090_H8_480C H8 48 > > > > >> sessionInfo() > R version 3.1.0 (2014-04-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] DESeq2_1.4.0 RcppArmadillo_0.4.100.2.1 > [3] Rcpp_0.11.1 GenomicRanges_1.16.1 > [5] GenomeInfoDb_1.0.2 IRanges_1.22.3 > [7] BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] annotate_1.42.0 AnnotationDbi_1.26.0 Biobase_2.24.0 > [4] DBI_0.2-7 genefilter_1.46.0 geneplotter_1.42.0 > [7] grid_3.1.0 lattice_0.20-29 locfit_1.5-9.1 > [10] RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 > [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 > [16] XML_3.98-1.1 xtable_1.7-3 XVector_0.4.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 days ago
United States
hi Yvan, It's good to keep replies on the Bioc list, so that other users can track down answers. On Thu, May 1, 2014 at 8:21 AM, Yvan Wenger <yvan.wenger at="" unige.ch=""> wrote: > Do you know if there is a simple way to specify that time is a factor? > after the line with DESeqDataSetFromMatrix, you can do: colData(dds_H58)$time <- factor(colData(dds_H58)$time, levels=unique(colData(dds_H58)$time)) Mike > > On Thu, May 1, 2014 at 1:52 PM, Michael Love > <michaelisaiahlove at="" gmail.com=""> wrote: >> hi Yvan, >> >> Are you specifying time as a numeric? If so, then you are not asking >> for genes which show any change over time specific to condition, but >> for genes which have consistent fold change by hour specific to >> condition (i.e. genes with counts which triple every hour, or halve >> every hour, etc.). Try instead specifying time as a factor. >> >> Mike >> >> On Thu, May 1, 2014 at 5:57 AM, Yvan Wenger <yvan.wenger at="" unige.ch=""> wrote: >>> Dear list members, >>> >>> There must be something I missed. I have an RNA-seq timecourse >>> experiment with triplicates (see below), there are two conditions: H5 >>> and H8, they can be thought of control/treatment. >>> >>> I am trying to assess if some genes react differently over time >>> between the two conditions. I use the code below with: >>> >>> full model: design = ~ time + condition + time:condition >>> reduced model: reduced = ~ time + condition >>> >>> Out of 33'000 genes, I find 732 with interaction between time and >>> condition (FDR < 0.1). Nice. >>> >>> The problem, some genes that I was expecting to appear in this dataset >>> do not (see the attached pdf for example, I am only concerned with >>> differences in H5 and H8). Furthermore, when looking at the FDR of >>> this gene, I can see that it is very far from being significant, the >>> FDR is 0.97 ! >>> >>> So I think I am doing something wrong in the way I compare the full >>> model and the reduce model, probably I am not analyzing the data the >>> way I want. To recapitulate, what I would like to select are different >>> behaviours over time between the two conditions. >>> >>> Many thanks for your inputs ! >>> >>> Yvan >>> >>> >>> >>> >>> library(DESeq2) >>> >>> countData <- read.table("counts_all.txt",row.names=1,header=T) >>> countData_H58 <- countData[,31:90] ### data from table 1 to 30 are >>> from another condition not considered here. >>> colData_H58 <- as.data.frame(read.table(file='conditions_H58')) ### >>> content of the conditions_H58 file pasted below >>> dds_H58 <- DESeqDataSetFromMatrix(countData = countData_H58,colData = >>> colData_H58, design = ~ time + condition + time:condition) ### Full >>> model >>> dds_H58 <- estimateSizeFactors(dds_H58) >>> dds_H58 <- estimateDispersions(dds_H58) >>> write.table(file="normacounts_H58.csv", counts(dds_H58,normalized=TRUE)) >>> ddsLRT_H58 <- nbinomLRT(dds_H58, reduced = ~ time + condition) ### reduced model >>> res_H58 <- results(ddsLRT_H58,cooksCutoff=FALSE) >>> write.table(res_H58,file='results_H58.csv',sep="\t",quote=F) >>> >>> >>> >>> condition time >>> 031_H5_000C H5 0 >>> 032_H5_000A H5 0 >>> 033_H5_000B H5 0 >>> 034_H5_005C H5 0.5 >>> 035_H5_005A H5 0.5 >>> 036_H5_005B H5 0.5 >>> 037_H5_010C H5 1 >>> 038_H5_010A H5 1 >>> 039_H5_010B H5 1 >>> 040_H5_020C H5 2 >>> 041_H5_020A H5 2 >>> 042_H5_020B H5 2 >>> 043_H5_040C H5 4 >>> 044_H5_040A H5 4 >>> 045_H5_040B H5 4 >>> 046_H5_080C H5 8 >>> 047_H5_080A H5 8 >>> 048_H5_080B H5 8 >>> 049_H5_160C H5 16 >>> 050_H5_160A H5 16 >>> 051_H5_160B H5 16 >>> 052_H5_240C H5 24 >>> 053_H5_240A H5 24 >>> 054_H5_240B H5 24 >>> 055_H5_360C H5 36 >>> 056_H5_360A H5 36 >>> 057_H5_360A H5 36 >>> 058_H5_480B H5 48 >>> 059_H5_480C H5 48 >>> 060_H5_480A H5 48 >>> 061_H8_000B H8 0 >>> 062_H8_000C H8 0 >>> 063_H8_000A H8 0 >>> 064_H8_005B H8 0.5 >>> 065_H8_005C H8 0.5 >>> 066_H8_005A H8 0.5 >>> 067_H8_010B H8 1 >>> 068_H8_010C H8 1 >>> 069_H8_010A H8 1 >>> 070_H8_020B H8 2 >>> 071_H8_020C H8 2 >>> 072_H8_020A H8 2 >>> 073_H8_040B H8 4 >>> 074_H8_040C H8 4 >>> 075_H8_040A H8 4 >>> 076_H8_080B H8 8 >>> 077_H8_080C H8 8 >>> 078_H8_080A H8 8 >>> 079_H8_160B H8 16 >>> 080_H8_160C H8 16 >>> 081_H8_160A H8 16 >>> 082_H8_240B H8 24 >>> 083_H8_240C H8 24 >>> 084_H8_240A H8 24 >>> 085_H8_360A H8 36 >>> 086_H8_360B H8 36 >>> 087_H8_360C H8 36 >>> 088_H8_480A H8 48 >>> 089_H8_480B H8 48 >>> 090_H8_480C H8 48 >>> >>> >>> >>> >>>> sessionInfo() >>> R version 3.1.0 (2014-04-10) >>> Platform: x86_64-pc-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] DESeq2_1.4.0 RcppArmadillo_0.4.100.2.1 >>> [3] Rcpp_0.11.1 GenomicRanges_1.16.1 >>> [5] GenomeInfoDb_1.0.2 IRanges_1.22.3 >>> [7] BiocGenerics_0.10.0 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.42.0 AnnotationDbi_1.26.0 Biobase_2.24.0 >>> [4] DBI_0.2-7 genefilter_1.46.0 geneplotter_1.42.0 >>> [7] grid_3.1.0 lattice_0.20-29 locfit_1.5-9.1 >>> [10] RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.1.0 >>> [13] stats4_3.1.0 survival_2.37-7 tools_3.1.0 >>> [16] XML_3.98-1.1 xtable_1.7-3 XVector_0.4.0 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 1085 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6