Dear Bioconductor community,
I have timecourse RNA-seq data.
Animals received 4 differents treatments (group = GpA, GpB, GpCD, GpE).
There are 6 animals in each group.
For each animal,blood were collacted before any treatment (D0) and at 3 time points after treatment: D3, D7 and D28.
So, i have a design matrix looking like :
Animal Time Treatment Rep M657 0 GpA 1 M657 3 GpA 1 M657 7 GpA 1 M657 28 GpA 1 M658 0 GpA 2 M658 3 GpA 2 M658 7 GpA 2 M658 28 GpA 2 . . . . D627 0 GpB 1 D627 3 GpB 1 D627 7 GpB 1 D627 28 GpB 1 D628 0 GpB 2 D628 3 GpB 2 D628 7 GpB 2 D628 28 GpB 2 . . . .
With rep being a factor value for Animal-Group which distinguishes the individual nested within a group.
I want to determine :
1/ Which genes respond at either the Day3, Day7 or day28 in the different group
for exemple :
GpA : Day3vsDay0 Day7vsDay0, Day28vsDay0
do I have to analyse each treatment separatly ? Or I can analyse them all together ?
Because each subject yields a value for each time point, can I take "Animal" effect into my model ?
2/ Which genes respond differently over time in the different groups ie :
Which genes respond differently over time in the GpA relative to the GpB?
Here is what I did :
Design_clean <- data.frame(Treatment=Group,Time=factor(Time), ind.n=factor(order$Rep),row.names=row.names(order)) dds <- DESeqDataSetFromMatrix(countData = Count, colData = Design_clean, design=~Treatment+Treatment:ind.n+Treatment:Time) dds <- DESeq(dds, minReplicatesForReplace=Inf, modelMatrixType="standard", parallel=TRUE)
Is the design correct ?
Here are a subset of resultsNames(dds) output :
If I understant well, the following comparisons answer to my first question ie, which genes respond at either the Day3, Day7 or Day28 in the different group :
"GpA.Time3" "GpA.Time7" "GpA.Time28"
It's not indicated if the comparison was made against baseline (ie Day0), is it ? Is the within subject variability taken into account in these comparison ?
I also have :
"GpA.ind.n1" "GpA.ind.n2" "GpA.ind.n3" "GpA.ind.n4" "GpA.ind.n5" "GpA.ind.n6"
Is it within subject variability ?
"GpA_vs_GpB" "GpA_vs_GpCD" "GpA_vs_GpE"
Here, the comparisons are made between the Group, but Time information is not take into account.
How can I do, to have, for example, genes that respond differently over time in the GpA relative to the GpB at day 3 ie GpA_vs_GpB.day3 = (GpA.3-GpA.0)-(GpB.3-GpB.0)
Thanks in advance for your help,
C.
Is there any option in DESeq to protect against outliers ?
Yes there are automatic procedures to filter or replace outliers, and these have unit tests to make sure they are working. However even saying what is an "outlier" is difficult, and users sometimes want different behavior. If there are 5 samples in a group and 2 have high expression compared to another group, are those 2 samples oultiers? Or is that important biological signal? What if it's 2 out of 6, or 3 out of 6? You can read about the outlier routines in the vignette and in the DESeq2 paper. The short version is that an individual count is labelled as an outlier if the LFC would be very different without that count (as defined by a standard regression diagnostic).
I have another question.
I have the same kind of RNA-seq data to analyse, but this time, each subject doesn't yields a value at each time point. I have animal with missing data at some time points
Can I still set this design ?
I moved your post to a "comment", instead of an "answer".
It's not necessarily possible to fit the same model if there is missing data. If you have some un-paired samples, you can instead use limma-voom with the duplicateCorrelation() function. See the limma user guide and if necessary, make a new post.
How can I analyse these data with DESeq2 ?
Am I correct :
TS <- paste(Design_clean$Treatment, Design_clean$Time, sep=".")
Design_clean_simple <-data.frame(Group=TS,Animal=factor(Design_clean$Animal),row.names=row.names(Design_clean))
dds <- DESeqDataSetFromMatrix(countData = Count, colData = Design_clean, design=~Group)
If you want to test for the effect of treatment over time while controlling for individual animals, and some measurements are missing, you can't do this with a fixed effects model (you can't use DESeq2).
To be more specific: some animals are missing values along the time series.