Hello,
In my experiment, we have mice treated with drug and saline at 2 week stage. We have transcriptome data for both these treatments across five development time points - 2w, 3w, 4w, 6w and 10w. My objective is to detect the effect of drug at each time point, where the Sal accounts for changes that happen during normal development. I guess my situation is similar to the example case given in the user manual section 3.3.2.
I tried the formula:
targets$Treat <- relevel(factor(targets$Treat), ref="Placeb")
design <- model.matrix(~Treat + Treat:Time, data=targets)
but, depending on how I label my dataset, I get different number of DE genes. I guess it is because the first term gets used as a baseline. For eg., when I label my data as 2w, 3w, 4w, 6w, 10w, the coefficient names are:
> colnames(fit) [1] "(Intercept)" "TreatDrug" "TreatPlaceb:Time2w" "TreatDrug:Time2w" "TreatPlaceb:Time3w" "TreatDrug:Time3w" "TreatPlaceb:Time4w" "TreatDrug:Time4w" "TreatPlaceb:Time6w" [10] "TreatDrug:Time6w"
and,
the drug effect at 2w - lrt_Drug_2w <- glmLRT(fit, coef=c(4))
- and the resulting DE genes are 7912
the drug effect at 3w - lrt_Drug_3w <- glmLRT(fit, coef=c(6))
- and the resulting DE genes are 3925
the drug effect at 4w - lrt_Drug_4w <- glmLRT(fit, coef=c(8))
- and the resulting DE genes are 961
the drug effect at 6w - lrt_Drug_6w <- glmLRT(fit, coef=c(10))
- and the resulting DE genes are 15
But, if I rename my 10w sample to 9w [changes the alphabetical order], I get the following coefficient names:
> colnames(fit) [1] "(Intercept)" "TreatDrug" "TreatPlaceb:Time3w" "TreatDrug:Time3w" "TreatPlaceb:Time4w" "TreatDrug:Time4w" "TreatPlaceb:Time6w" "TreatDrug:Time6w" "TreatPlaceb:Time9w" [10] "TreatDrug:Time9w"
and
the drug effect at 3w - lrt_Drug_3w <- glmLRT(fit, coef=c(4))
- and the resulting DE genes are 2887
the drug effect at 4w - lrt_Drug_4w <- glmLRT(fit, coef=c(6))
- and the resulting DE genes are 5338
the drug effect at 6w - lrt_Drug_6w <- glmLRT(fit, coef=c(8))
- and the resulting DE genes are 6394
the drug effect at 9w - lrt_Drug_9w <- glmLRT(fit, coef=c(10))
- and the resulting DE genes are 7907
As you can see, the numbers are vastly different. I'm not sure if I've done anything wrong. I've pasting all steps (code) below.
In my experiment there is no 0h (as in the example in the case study of section 3.3). So I cannot define any one time point as a base line. As I'd mentioned earlier, my objective is to identify the genes that change due to drug treatment at 2w, across development. There are normal, expected changes that happen during development. We wish to bring out the long-term effect of the drug treatment at an early time point (2w), taking into consideration the changes in development.
I'd be glad if you could help me with this problem. Thanks very much.
Manoj.
Here's the code I use:
targets <- readTargets("AllCountData_Info.txt") x <- read.delim("HTSqRvrse_Counts_ReOrdrd" , row.names=1) y <- DGEList(counts=x[,1:20], group=targets$Treat) colnames(y) <- targets$Label y$samples$lib.size <- colSums(y$counts) y$samples y <- calcNormFactors(y) Treat <- factor(targets$Treat, levels=c("Drug","Placeb")) Group <- factor(paste(targets$Treat,targets$Time,sep=".")) cbind(targets,Group=Group) targets$Treat <- relevel(factor(targets$Treat), ref="Placeb") design <- model.matrix(~Treat + Treat:Time, data=targets) # consider all the levels of time for each treatment drug separately. The second term is a nested interaction, the interaction of Time within Treat. y <- estimateGLMCommonDisp(y,design, verbose=TRUE) y <- estimateGLMTrendedDisp(y,design) y <- estimateGLMTagwiseDisp(y,design) fit <- glmFit(y, design) colnames(fit) lrt_Drug_2w <- glmLRT(fit, coef=c(4)) FDR_Drug_2w <- p.adjust(lrt_Drug_2w$table$PValue, method="BH") sum(FDR_Drug_2w < 0.05) summary(de_Drug_2w <- decideTestsDGE(lrt_Drug_2w)) detags_Drug_2w <- rownames(y)[as.logical(de_Drug_2w)]
---- end of sample code ---
Input file:
AllCountData_Info.txt Lane Treat Time Label 2S1 Placeb 2w 2S1 2S2 Placeb 2w 2S2 3S1 Placeb 3w 3S1 3S2 Placeb 3w 3S2 4S1 Placeb 4w 4S1 4S2 Placeb 4w 4S2 6S1 Placeb 6w 6S1 6S2 Placeb 6w 6S2 10S1 Placeb 10w 10S1 10S2 Placeb 10w 10S2 2K1 Drug 2w 2K1 2K2 Drug 2w 2K2 3K1 Drug 3w 3K1 3K2 Drug 3w 3K2 4K1 Drug 4w 4K1 4K2 Drug 4w 4K2 6K1 Drug 6w 6K1 6K2 Drug 6w 6K2 10K1 Drug 10w 10K1 10K2 Drug 10w 10K2
---- end of input file ---
> sessionInfo() R version 3.1.1 (2014-07-10) Platform: x86_64-apple-darwin13.1.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] BiocInstaller_1.16.1 RColorBrewer_1.1-2 gplots_2.16.0 statmod_1.4.20 locfit_1.5-9.1 edgeR_3.8.5 limma_3.22.6 loaded via a namespace (and not attached): [1] bitops_1.0-6 caTools_1.17.1 gdata_2.13.3 grid_3.1.1 gtools_3.4.1 KernSmooth_2.23-14 lattice_0.20-30 tools_3.1.1 >
---- end of message ---
Thanks for your comment Aaron.
Below is the design matrix that was generated:
I'm trying to get a baseline effect from the Placebo across development and then obtain treatment effects at each time point. I did do the classic ET at specific time points (eg., for drug vs placebo at 2w I get 24 DE genes).
I'll interpret "baseline effect from Placebo across development" as the effect of development time on the placebo samples. I'll assume that you want to compare adjacent time points, in which case:
You can then look at the contrasts and see which genes are consistently up/down-regulated across development, or get turned on/off within a specific time-frame, etc. This can be done by running each contrast separately and intersecting the results, or by running all contrasts together in an ANOVA-like comparison and sorting genes by the sign of the log-fold change in each contrast.
If, as in your original post, you were to drop the
TreatPlaceb:Time
coefficient for each time point, you'd only be comparing each placebo sample to the 10-week placebo. This also explains why you get different behaviour in the two design matrices in your original post; in your second design, dropping the coefficients compares each other placebo sample to the 2-week placebo, not the 10-week placebo.As for your comparisons between treatment and placebo at each time point, this is also easy to formulate:
Thanks Aaron.
Part 1: I'm not really trying to get the effect of placebo -- I want to use this (placebo) to account for changes during development, and get the effect of treatment.
Part 2: I understand that the problem is because only one of the Placebo time points is getting considered (10w or 2w, depending on which label comes first (alphabetically). How could I get an average of all placebo (across development) as a baseline, instead of just one specific time point?
Part 3: For the TvP comparison at each time point as you have mentioned, I could also just use the classic ET test, right? I prefer to get the whole developmental change also accounted for.
With regards to Part 2, I'm not sure what you want to compare to the placebo baseline. It doesn't make a lot of sense to compare treatment samples to this baseline, if you already have matched placebo samples at the same timepoint.
For part 3, I don't know what you mean by the "classic ET test". The closest I can think of is classic edgeR, but this isn't relevant for the GLM framework. I'm also unclear on what you mean when you talk about accounting for the "whole developmental change".
Thanks Aaron for your inputs. I agree, when you take out the phrase "whole developmental change", it doesn't make sense.
As explained in my original post:
"my objective is to identify the genes that change due to drug treatment at 2w, across development. There are normal, expected changes that happen during development. We wish to bring out the long-term effect of the drug treatment at an early time point (2w), taking into consideration the changes in development."
Regarding Part 2: I'm trying the nested interaction formula. As I understand, this can be used to get the effect of "Time within Treat" (as per edgeR manual).
My problem is that I do not have a 0h to make as a baseline.
Okay, one problem at a time.
You say that "my objective is to identify the genes that change due to drug treatment at 2w, across development". Which one is it? At 2 weeks, or across development?
Let's simplify the discussion. Say we have treated and normal samples for both 2 and 10 week timepoints. What kind of gene would you hope to detect?