Hi Bioconductors,
I am working on analyzing data from an Illumina beadchip microarray with samples from the following experimental set-up:
7 tissue donors x 3 treatments x 3 RNA collection time points
Treatments: Mock infection, virus A infection, virus B infection
RNA collected from tissue at time points 1,2 and 3
At least initially, I am planning to analyze the data separately for each time point, so I am not including time as a factor in the design matrices that I am making. I would like to do an analysis where virus A treatment and virus B treatment are paired with the Mock treatment of their common tissue donor and look at mock vs A and mock vs B and A vs B
Problem: Some of the samples failed on the array, including two of the mock treatments, so for those donors, I am left with just virusA and virusB infection.
So far I have been looking at pages 43 and 44 of the limma manual to figure out how to make appropriate design matrices, but was wondering how limma works when a comparison is indicated in the design or contrast matrix and a sample is missing one of the replicates necessary to make that comparison.
Thanks in advance!
Claire Levy
University of Washington OBGYN
Fred Hutchinson Cancer Research Center VIDD
I meant statistically, but it is obviously better from an interpretation perspective as well.
You don't even need to worry about the special features of time course experiments. You can just view your study as 9 different treatments on each donor, where each treatment is an infection by time point combination.
Fantastic, I have been following the "Multilevel Experiments" section (pg 51) and it more or less makes sense but I am not sure about the duplicateCorrelation part. The difference between my experiment and the example is that each of my subjects is exposed to all conditions, while in the example, some subjects have diseased tissues and others have normal (rather than all subjects having both diseased and normal).
I went through, following the example like so:
#In my exp, Treatment is analogous
#to "condition" in the example, Time is analogous to Tissue
# and TissueID is to subject.
#combine the Treatment and Time parameters into one
Treat <-factor(paste(targets$Treatment,targets$Time, sep="."))
design<-model.matrix (~0 + Treat)
#Then we estimate the correlation between
#measurements made on the same subject
corfit<- duplicateCorrelation(expressedProbes.lumi,design,
block=targets$TissueID)
corfit$consensus
#[1] 0.03368044
#I guess this means the TissueIDs are not significantly correlated?
#Then this inter-subject correlation is
#input into the linear model fit
fit <- lmFit(expressedProbes.lumi,design,block=targets$TissueID,
correlation=corfit$consensus)
#Now we can make any comparisons
#between the experimental conditions
cm<-makeContrasts(
SD90.3vsMock.3 = TreatA.3-TreatMock.3,
V186.3vsMock.3 = TreatB.3-TreatMock.3,
levels = design
)
fit2<-contrasts.fit(fit,cm)
fit2 <-eBayes(fit2)
topTable(fit2,coef="TreatA.3vsMock.3")
But, the final paragraph of the section on pg 52 mentions that
"Here the comparison between tissues can be made within subjects [true for my experiment], and hence should be more precise than the comparison between diseased and normal, which must be made between subjects [not necessarily true for my experiment, since all subjects have all treatments]."
So maybe I need to do something different?
However, even though I can do all my comparisons within subjects, I still want to be able to say something about differences in gene expression between the different treatments in general, not just about these particular tissue donors. If my googling of the definition of random effects has given me the right answer, it seems more right to keep the analysis as is.
You don't have a multilevel experiment. You just have an ordinary blocked experiment. You can just follow the suggestion of Section 9.4.2 with
It's one of most common and standard experimental designs and I think you might be seeing more complications than there actually are.
Some people like to rearrange the model formula:
This is equivalent to the previous formula, but allows you use use the same contrast definitions as you are using now.
Alternatively you could use duplicateCorrelation to handle the donors, as you seem to have done above (I can't really tell, is TissueID the same as Donor?), instead of putting them in the design matrix. If the donors are not spectacularly different, then that may be a good approach. It can give more information for an unbalanced experiment, when not all treatments are measured on all subjects.
Great, I'm glad things are not always as complicated as they seem. I'll have another go at it. TissueID is the same thing as donor by the way. Thanks for your help!
Great, I'm glad things are not always as complicated as they seem. I'll have another go at it. TissueID is the same thing as donor by the way. Thanks for your help!