DESeq2 Time course analysis design - only time points WITHOUT treatments
1
1
Entering edit mode
@maithe-barros-13840
Last seen 4.1 years ago

Hello,

I am doing RNA-seq analysis of differential gene expression from mare's endometrial biopsies. I do not have any treatments though, and this is the reason I am having a hard time trying to figure out the design I should use as all examples I found so far had treatments + time points.

I have the first time point of 0h which is my control (endometrial biopsy from a mare) then 24h, 48h and 72h. All I want to do is to understand what happens over the time points, without any treatment. I have 8 animals and for each animal I have the 4 timepoints. My metadata has "sample_id" column with my bam files,"timepoint" column and "horse_id" column.

My coding looks like this:

txdb <- makeTxDbFromGFF("Equus_caballus.EquCab2.89.gtf", format="gtf")
ebg <- exonsBy(txdb, by="gene")
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=FALSE,
                        ignore.strand=TRUE,
                        fragments=TRUE )
dds <- DESeqDataSet(se, design = ~ timepoint)
dds <- DESeq(dds)
res_0hand72h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "control_0h", "timepoint_72h"))
res_0hand24h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "control_0h", "timepoint_24h"))
res_24hand72h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "timepoint_24h", "timepoint_72h"))
res_48hand72h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "timepoint_48h", "timepoint_72h"))
res_24hand48h_LFC1 <- results(dds, lfcThreshold = 1, contrast=c("timepoint", "timepoint_24h", "timepoint_48h"))
table(res_0hand72h_LFC1$padj < 0.1)
table(res_0hand24h_LFC1$padj < 0.1)
table(res_24hand72h_LFC1$padj < 0.1)
table(res_48hand72h_LFC1$padj < 0.1)
table(res_24hand48h_LFC1$padj < 0.1)

Running the above I got results comparing two time points at a time (false rate discovery set for 0.1% with a >2 fold change). Results look ok to me. E.g. When comparing control (0h) to 72h time point a total of 22542 genes were expressed, 824 of which were shown to be statistically significantly differentially up expressed and 785 shown to be statistically significantly differentially down expressed, both at a false discovery rate (FDR) of 0.1% with a fold change greater or equal to 2.

BUT I wanted to have one result comparing all time points, but I am not sure if that's doable/reasonable?

first doubt about my design:

I am not sure whether I need to add my horse_id to the design, maybe
dds <- DESeqDataSet(se, design = ~ treatment + horse_id)
or
dds <- DESeqDataSet(se, design = ~ treatment + horse_id + treatment:horse_id) 
I do not know whether or not to use the interaction between treatment and horse_id.

second doubt about the time series analysis:

Should I use nbimLRT() function to test the significance of multiple coefficients at once?

I tried:

ddsLRT <- DESeqDataSet(se, design = ~ treatment*horse_id)
design(ddsLRT) <- formula(~ treatment*horse_id)
ddsLRT <- estimateSizeFactors(ddsLRT)
ddsLRT <- estimateDispersions(ddsLRT)
ddsLRT <- nbinomLRT(ddsLRT, reduced = formula(~ treatment))
resLRT <- results(ddsLRT)
table(resLRT$padj < 0.1)

and the result was FALSE 22542 genes, meaning that none of the expressed genes were statistically significantly expressed, which is a results different from the pairwise analysis of time points separatedely. 

I hope I made myself clear and I really look forward to receiving any help/thoughts on this.

Many thanks,

Maithe Barros

rna-seq deseq2 multiple time points • 5.7k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 5 days ago
United States

I'd recommend a LRT with a full design of ~horse + time and a reduced design of ~horse.

A couple notes here: I like to use short variable names without "_", this will reduce errors potentially (e.g. R doesn't allow "_" in column names, in case you ever want to store the results as a data.frame or the like).

The design above controls for differences between horse at time=0, and then tests to see if there are any differences at any time point. So you get a single p-value (although there are multiple coefficients fitted, one for each time point other than t=0). See the LRT section of the vignette and ?results which has a specific section on the LRT.

ADD COMMENT
0
Entering edit mode

Thanks a lot for your reply, Michael. I have renamed my variables.I ran:

ddsLRT <- DESeqDataSet(se, design = ~ horse + timepoint)
ddsLRT <- estimateSizeFactors(ddsLRT)
ddsLRT <- estimateDispersions(ddsLRT)
ddsLRT <- nbinomLRT(ddsLRT, reduced = ~ horse)
resLRT <- results(ddsLRT)
table(resLRT$padj < 0.1)

and my result was 

FALSE  TRUE 
 6812 11837

log2 fold change (MLE): timepoint time72h vs control0h 
LRT p-value: '~ horse + timepoint' vs '~ horse' 
DataFrame with 26991 rows and 6 columns
                     baseMean log2FoldChange     lfcSE       stat       pvalue         padj
                    <numeric>      <numeric> <numeric>  <numeric>    <numeric>    <numeric>
ENSECAG00000000001 222.153134     -0.2119773 0.1507054   5.857054 0.1187757742  0.174482033
ENSECAG00000000002 227.264520      0.2389474 0.1314716   4.081881 0.2527571757  0.336834970
ENSECAG00000000003  43.521031      0.5022090 0.2916049   8.196995 0.0421111066  0.069049381
ENSECAG00000000004 123.625756     -0.2415447 0.1482973  16.996629 0.0007078716  0.001628157
ENSECAG00000000005   8.405233     -0.7764470 0.4533185   4.009984 0.2603880344  0.345326159
...                       ...            ...       ...        ...          ...          ...
ENSECAG00000027698   137.2158      -1.056968  3.548143  0.2138203 9.753280e-01 9.856341e-01
ENSECAG00000027699  5487.5252       3.261664  0.400470 61.2228495 3.220709e-13 2.356336e-12

but when I try  resultsNames(ddsLRTthe output is character(0). Therefore I think there is something wrog.

I also tried:

ddsTC <- DESeqDataSet(se, design = ~ horse + timepoint)
ddsTC <- DESeq(ddsTC, test="LRT", full= ~ horse + timepoint,  reduced = ~ horse)
resTC <- results (ddsTC)

and the result was the same: 
FALSE  TRUE 
 6812 11837 

log2 fold change (MLE): timepoint time72h vs control0h 
LRT p-value: '~ horse + timepoint' vs '~ horse' 
DataFrame with 26991 rows and 6 columns
                     baseMean log2FoldChange     lfcSE       stat       pvalue         padj
                    <numeric>      <numeric> <numeric>  <numeric>    <numeric>    <numeric>
ENSECAG00000000001 222.153134     -0.2119773 0.1507054   5.857054 0.1187757742  0.174482033
ENSECAG00000000002 227.264520      0.2389474 0.1314716   4.081881 0.2527571757  0.336834970
ENSECAG00000000003  43.521031      0.5022090 0.2916049   8.196995 0.0421111066  0.069049381
ENSECAG00000000004 123.625756     -0.2415447 0.1482973  16.996629 0.0007078716  0.001628157
ENSECAG00000000005   8.405233     -0.7764470 0.4533185   4.009984 0.2603880344  0.345326159
...                       ...            ...       ...        ...          ...          ...
ENSECAG00000027698   137.2158      -1.056968  3.548143  0.2138203 9.753280e-01 9.856341e-01
ENSECAG00000027699  5487.5252       3.261664  0.400470 61.2228495 3.220709e-13 2.356336e-12

but again with resultsNames(resTC) the output is character(0).

I did read the LRT section and ?results but I still do not understand why I do not have any info from resultsNames, to then generate results tables for individual effects, which must be individual elements of resultsNames. Is this indeed incorrect or am I missing something? 

ADD REPLY
0
Entering edit mode

You need to copy paste exactly the code you ran if you obtained an unexpected output. resultsNames() should be run on a DESeqDataSet (dds) not on a DESeqResults object (res). See ?resultsNames for the usage and example code.

ADD REPLY
0
Entering edit mode

Hi Michael, following on from this post and the LRT section of your DeSeq2 vignette, I was wondering if you can help answer my query here: https://support.bioconductor.org/p/118722/

Appreciate your time Michael.

ADD REPLY

Login before adding your answer.

Traffic: 685 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6