Hi Karl,
Karl Brand wrote:
> Dear BioC,
>
> Using limma, when fitting the model:
> model.matrix(~Tissue * Pperiod + Time + Animal)
>
> I get this warning:
> > fit <- lmFit(rma.pp, design)
> Coefficients not estimable: Animal32 Animal33 Animal34 Animal35
Animal36
> Animal37 Animal38 Animal39 Animal40 Animal41 Animal42 Animal43
Animal44
> Animal45 Animal46 Animal47 Animal48
> Warning message:
> Partial NA coefficients for 45101 probe(s)
>
> In addition, the reuslting number or DE genes for my contrasts of
> interest (which are different than the 'not estimable' ones listed
in
> teh warning above) are mcuh lower than expected; & furthermore, the
> contrast-coefficents (log2FCs) and simply wrong.
>
> When fitting a similar model, merely lacking the 'pairing' factor
> ("Animal"):
> model.matrix(~Tissue * Pperiod + Time)
>
> I don't get this error. My question:
>
> Is it me? Or am i attempting the impossible, ie., by including a
factor
> for pairing (Animal) trying to fit more factors than my measurements
can
> support and this is limma's way of telling me? Raw script and
targets
> file below.
You may be attempting the impossible, or you may just be doing
something
incorrectly. You are certainly trying to estimate more parameters than
you have data with which to do so.
It looks like you have a fairly complex experimental design, so I
would
recommend finding a local statistician who can help you with the
analysis.
>
> I really hope an experienced limma user can enlighten me on this, or
> point me to a resource suitable for a biologists level of
understanding.
Pretty much any basic linear modeling textbook would be helpful.
However, it looks like you might have a timecourse experiment with
perhaps repeated measures, which may require a non-trivial analysis
method. As a Biologist, you might have jumped into the deep end of the
pool, so finding somebody local to help is not a bad idea.
Best,
Jim
>
> Thanks in advance,
>
> Karl
>
>
> > targets <- read.delim("RNA_Targets.txt")
>
> > Tissue <- factor(targets$Tissue, levels = c("R", "C"))
>
> > Pperiod <- factor(targets$Pperiod, levels = c("E", "L", "S"))
>
> > Time <- factor(targets$Time, levels = c("1", "2", "3", "4",
> + "5", "6", "7", "8",
> + .... [TRUNCATED]
>
> > Animal <- factor(targets$Animal, levels = c("1", "2", "3", "4",
> + "5", "6", "7", "8",
> + .... [TRUNCATED]
>
> > design <- model.matrix(~Tissue * Pperiod + Time + Animal)
>
> > colnames(design)
> [1] "(Intercept)" "TissueC" "PperiodL" "PperiodS"
> "Time2" "Time3" "Time4" "Time5"
> "Time6" "Time7"
> [11] "Time8" "Time9" "Time10"
"Time11"
> "Time12" "Time13" "Time14"
> "Time15" "Time16" "Animal2"
> [21] "Animal3" "Animal4" "Animal5"
"Animal6"
> "Animal7" "Animal8" "Animal9"
> "Animal10" "Animal11" "Animal12"
> [31] "Animal13" "Animal14" "Animal15"
"Animal16"
> "Animal17" "Animal18" "Animal19"
> "Animal20" "Animal21" "Animal22"
> [41] "Animal23" "Animal24" "Animal25"
"Animal26"
> "Animal27" "Animal28" "Animal29"
> "Animal30" "Animal31" "Animal32"
> [51] "Animal33" "Animal34" "Animal35"
"Animal36"
> "Animal37" "Animal38" "Animal39"
> "Animal40" "Animal41" "Animal42"
> [61] "Animal43" "Animal44" "Animal45"
"Animal46"
> "Animal47" "Animal48" "TissueC:PperiodL"
> "TissueC:PperiodS"
> > source(.trPaths[5], echo=TRUE, max.deparse.length=150)
>
> > fit <- lmFit(rma.pp, design)
> Coefficients not estimable: Animal32 Animal33 Animal34 Animal35
Animal36
> Animal37 Animal38 Animal39 Animal40 Animal41 Animal42 Animal43
Animal44
> Animal45 Animal46 Animal47 Animal48
> Warning message:
> Partial NA coefficients for 45101 probe(s)
> >
>
>
> FileName Tissue Pperiod Time Animal
> 01-PPL3-sample02.CEL R S 1 1
> 02-PPL3-sample03.CEL C S 1 1
> 03-PPL5-sample02.CEL R S 2 2
> 04-PPL5-sample03.CEL C S 2 2
> 05-PPL3-sample04.CEL R S 3 3
> 06-PPL3-sample05.CEL C S 3 3
> 07-PPL5-sample04.CEL R S 4 4
> 08-PPL5-sample05.CEL C S 4 4
> 09-PPL3-sample06.CEL R S 5 5
> 10-PPL3-sample07.CEL C S 5 5
> 11-PPL5-sample06.CEL R S 6 6
> 12-PPL5-sample07.CEL C S 6 6
> 13-PPL3-sample08.CEL R S 7 7
> 14-PPL3-sample09.CEL C S 7 7
> 15-PPL5-sample08.CEL R S 8 8
> 16-PPL5-sample09.CEL C S 8 8
> 17-PPL3-sample10.CEL R S 9 9
> 18-PPL3-sample11.CEL C S 9 9
> 19-PPL5-sample10.CEL R S 10 10
> 20-PPL5-sample11.CEL C S 10 10
> 21-PPL3-sample12.CEL R S 11 11
> 22-PPL3-sample13.CEL C S 11 11
> 23-PPL5-sample12.CEL R S 12 12
> 24-PPL5-sample13.CEL C S 12 12
> 25-PPL3-sample14.CEL R S 13 13
> 26-PPL3-sample15.CEL C S 13 13
> 27-PPL5-sample14.CEL R S 14 14
> 28-PPL5-sample15.CEL C S 14 14
> 29-PPL3-sample16.CEL R S 15 15
> 30-PPL3-sample17.CEL C S 15 15
> 31-PPL5-sample16.CEL R S 16 16
> 32-PPL5-sample17.CEL C S 16 16
> 33-PPL1-sample02.CEL R E 1 17
> 34-PPL1-sample03.CEL C E 1 17
> 35-PPL6-sample02.CEL R E 2 18
> 36-PPL6-sample03.CEL C E 2 18
> 37-PPL1-sample04.CEL R E 3 19
> 38-PPL1-sample05.CEL C E 3 19
> 39-PPL6-sample04.CEL R E 4 20
> 40-PPL6-sample05.CEL C E 4 20
> 41-PPL1-sample06.CEL R E 5 21
> 42-PPL1-sample07.CEL C E 5 21
> 43-PPL6-sample06.CEL R E 6 22
> 44-PPL6-sample07.CEL C E 6 22
> 45-PPL1-sample08.CEL R E 7 23
> 46-PPL1-sample09.CEL C E 7 23
> 47-PPL6-sample08.CEL R E 8 24
> 48-PPL6-sample09.CEL C E 8 24
> 49-PPL1-sample10.CEL R E 9 25
> 50-PPL1-sample11.CEL C E 9 25
> 51-PPL6-sample10.CEL R E 10 26
> 52-PPL6-sample11.CEL C E 10 26
> 53-PPL1-sample12.CEL R E 11 27
> 54-PPL1-sample13.CEL C E 11 27
> 55-PPL6-sample12.CEL R E 12 28
> 56-PPL6-sample13.CEL C E 12 28
> 57-PPL1-sample14.CEL R E 13 29
> 58-PPL1-sample15.CEL C E 13 29
> 59-PPL6-sample14.CEL R E 14 30
> 60-PPL6-sample15.CEL C E 14 30
> 61-PPL1-sample16.CEL R E 15 31
> 62-PPL1-sample17.CEL C E 15 31
> 63-PPL6-sample16.CEL R E 16 32
> 64-PPL6-sample17.CEL C E 16 32
> 65-PPL2-sample02.CEL R L 1 33
> 66-PPL2-sample03.CEL C L 1 33
> 67-PPL4-sample02.CEL R L 2 34
> 68-PPL4-sample03.CEL C L 2 34
> 69-PPL2-sample04.CEL R L 3 35
> 70-PPL2-sample05.CEL C L 3 35
> 71-PPL4-sample04.CEL R L 4 36
> 72-PPL4-sample05.CEL C L 4 36
> 73-PPL2-sample06.CEL R L 5 37
> 74-PPL2-sample07.CEL C L 5 37
> 75-PPL4-sample06.CEL R L 6 38
> 76-PPL4-sample07.CEL C L 6 38
> 77-PPL2-sample08.CEL R L 7 39
> 78-PPL2-sample09.CEL C L 7 39
> 79-PPL4-sample08.CEL R L 8 40
> 80-PPL4-sample09.CEL C L 8 40
> 81-PPL2-sample10.CEL R L 9 41
> 82-PPL2-sample11.CEL C L 9 41
> 83-PPL4-sample10.CEL R L 10 42
> 84-PPL4-sample11.CEL C L 10 42
> 85-PPL2-sample12.CEL R L 11 43
> 86-PPL2-sample13.CEL C L 11 43
> 87-PPL4-sample12.CEL R L 12 44
> 88-PPL4-sample13.CEL C L 12 44
> 89-PPL2-sample14.CEL R L 13 45
> 90-PPL2-sample15.CEL C L 13 45
> 91-PPL4-sample14.CEL R L 14 46
> 92-PPL4-sample15.CEL C L 14 46
> 93-PPL2-sample16.CEL R L 15 47
> 94-PPL2-sample17.CEL C L 15 47
> 95-PPL4-sample16.CEL R L 16 48
> 96-PPL4-sample17.CEL C L 16 48
>
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues
Gordon,
Many thanks for your follow up to James' help*.
My limited understanding of duplicateCorrelation lead me to believe that it's purpose was to specify in a design, measurements/samples expected to be similar, if not highly similar to each other, as is the case with technical and biological replicates, and attribute the differences between such specified replicates being due to technical/biological 'noise' via a modeling system different to the linear ones used for DE gene identification. Do i understand correctly?
In our experiment, although Tissue "R" & "C" are expected to be more similar than different on the whole genome level;
our interest is in the *differences* as is probably obvious from the model.
So im just trying to get my biologists understanding c.clear that applying duplicateCorrelation to the Animal blocking as you suggested, is at least a *worthwhile thing to try*. I guess you wouldn't provide the
example if you didn't think so, but right now is a good time for emphatic clarity.
Of course i did try, and the results are logical to me and thus worth using. In short, the no. of DE genes for the contrasts between factor=Pperiod changed little (<5%), whilst the contrasts between factor=Tissue for "R" & "C" changed alot- DE genes increased by ~20% by inclusion of duplicateCorrelation.
Again, sincere cheers,
Karl
Dear Karl,
The block argument of duplicateCorrelation() gives you a way to allow for correlation between repeat arrays on the same animal. It is a direct extension of the usual linear model. You found the correlation to be about 0.24, which is typical for data in which each block is an independent organism.
Yes, I wouldn't have suggested it if I didn't think it was worth trying. Indeed, if you want to examine Tissue and Pperiod effects in the same analysis, you must do it like this.
Best wishes
Gordon