Entering edit mode
Mingkwan Nipitwattanaphon
▴
30
@mingkwan-nipitwattanaphon-3160
Last seen 10.2 years ago
Dear BioC users,
I am using 2 color-spotted microarrays. My samples are queen ants
with different genotypes (BB=D, Bb=H, bb=R), social forms (Monogyne,
Polygyne) and ages (young virgin queen=2d, mature virgin queen=11d,
and mated/reproductive/ mother queen=mom). They are hybridized to the
reference RNA.
I would like to analyze my data by making model matrix like
y~ Age + Genotype *nested within* Social form + Age : Genotype *nested
within* Social form + fixed factor (Batch)
I have tried to analyze my data as show below and I got many
questions.
1. Differences between different commands of model.matrix. From the
examples below, I know that ~0 or ~1 is just whether I want the
intercept or not but when there is an intercept, one of the groups is
disappeared from the model (in this case, batchI or M11dD is
disappeared depending on which factor comes first). Would this affect
the result? Is the order of the factors in the model important?
> design1 <-
model.matrix(~0+factor(targets$Batch)+factor(targets$Age)
%in%factor(targets$Genotype)%in%factor(targets$SocialForm))
> colnames(design1)
[1] "factor(targets$Batch)I"
[2] "factor(targets$Batch)J"
[3] "factor(targets$Age)11d:factor(targets$Genotype)D:factor(targets
$SocialForm)M"
[4] "factor(targets$Age)2d:factor(targets$Genotype)D:factor(targets
$SocialForm)M"
[5]
"factor(targets$Age)momQ:factor(targets$Genotype)D:factor(targets
$SocialForm)M"
[6] "factor(targets$Age)11d:factor(targets$Genotype)H:factor(targets
$SocialForm)M"
[7] "factor(targets$Age)2d:factor(targets$Genotype)H:factor(targets
$SocialForm)M"
[8]
"factor(targets$Age)momQ:factor(targets$Genotype)H:factor(targets
$SocialForm)M"
[9] "factor(targets$Age)11d:factor(targets$Genotype)R:factor(targets
$SocialForm)M"
[10] "factor(targets$Age)2d:factor(targets$Genotype)R:factor(targets
$SocialForm)M"
[11] "factor(targets$Age)momQ:factor(targets$Genotype)R:factor(targets
$SocialForm)M"
[12] "factor(targets$Age)11d:factor(targets$Genotype)D:factor(targets
$SocialForm)P"
[13] "factor(targets$Age)2d:factor(targets$Genotype)D:factor(targets
$SocialForm)P"
[14] "factor(targets$Age)momQ:factor(targets$Genotype)D:factor(targets
$SocialForm)P"
[15] "factor(targets$Age)11d:factor(targets$Genotype)H:factor(targets
$SocialForm)P"
[16] "factor(targets$Age)2d:factor(targets$Genotype)H:factor(targets
$SocialForm)P"
[17] "factor(targets$Age)momQ:factor(targets$Genotype)H:factor(targets
$SocialForm)P"
[18] "factor(targets$Age)11d:factor(targets$Genotype)R:factor(targets
$SocialForm)P"
[19] "factor(targets$Age)2d:factor(targets$Genotype)R:factor(targets
$SocialForm)P"
[20] "factor(targets$Age)momQ:factor(targets$Genotype)R:factor(targets
$SocialForm)P"
> design1.1 <- model.matrix(~1+factor(targets$Batch)+factor(targets
$Age)%in%factor(targets$Genotype)%in%factor(targets$SocialForm))
> colnames(design1.1)
[1] "(Intercept)"
[2] "factor(targets$Batch)J"
[3] "factor(targets$Age)11d:factor(targets$Genotype)D:factor(targets
$SocialForm)M"
[4] "factor(targets$Age)2d:factor(targets$Genotype)D:factor(targets
$SocialForm)M"
[5]
"factor(targets$Age)momQ:factor(targets$Genotype)D:factor(targets
$SocialForm)M"
[6] "factor(targets$Age)11d:factor(targets$Genotype)H:factor(targets
$SocialForm)M"
[7] "factor(targets$Age)2d:factor(targets$Genotype)H:factor(targets
$SocialForm)M"
[8]
"factor(targets$Age)momQ:factor(targets$Genotype)H:factor(targets
$SocialForm)M"
[9] "factor(targets$Age)11d:factor(targets$Genotype)R:factor(targets
$SocialForm)M"
[10] "factor(targets$Age)2d:factor(targets$Genotype)R:factor(targets
$SocialForm)M"
[11] "factor(targets$Age)momQ:factor(targets$Genotype)R:factor(targets
$SocialForm)M"
[12] "factor(targets$Age)11d:factor(targets$Genotype)D:factor(targets
$SocialForm)P"
[13] "factor(targets$Age)2d:factor(targets$Genotype)D:factor(targets
$SocialForm)P"
[14] "factor(targets$Age)momQ:factor(targets$Genotype)D:factor(targets
$SocialForm)P"
[15] "factor(targets$Age)11d:factor(targets$Genotype)H:factor(targets
$SocialForm)P"
[16] "factor(targets$Age)2d:factor(targets$Genotype)H:factor(targets
$SocialForm)P"
[17] "factor(targets$Age)momQ:factor(targets$Genotype)H:factor(targets
$SocialForm)P"
[18] "factor(targets$Age)11d:factor(targets$Genotype)R:factor(targets
$SocialForm)P"
[19] "factor(targets$Age)2d:factor(targets$Genotype)R:factor(targets
$SocialForm)P"
[20] "factor(targets$Age)momQ:factor(targets$Genotype)R:factor(targets
$SocialForm)P"
> designOld <- model.matrix(~0+factor(targets$Cy3)+factor(targets
$Batch))
> colnames(designOld)
[1] "factor(targets$Cy3)M11dD" "factor(targets$Cy3)M2dD"
"factor(targets$Cy3)MomD" "factor(targets$Cy3)MomH"
[5] "factor(targets$Cy3)P11dD" "factor(targets$Cy3)P11dH"
"factor(targets$Cy3)P11dR" "factor(targets$Cy3)P2dD"
[9] "factor(targets$Cy3)P2dH" "factor(targets$Batch)J"
>
> designOld1 <- model.matrix(~1+factor(targets$Cy3)+factor(targets
$Batch))
> colnames(designOld1)
[1] "(Intercept)" "factor(targets$Cy3)M2dD"
"factor(targets$Cy3)MomD" "factor(targets$Cy3)MomH"
[5] "factor(targets$Cy3)P11dD" "factor(targets$Cy3)P11dH"
"factor(targets$Cy3)P11dR" "factor(targets$Cy3)P2dD"
[9] "factor(targets$Cy3)P2dH" "factor(targets$Batch)J"
> designOldx <- model.matrix(~0+factor(targets$Batch)+factor(targets
$Cy3))
> colnames(designOldx)
[1] "factor(targets$Batch)I" "factor(targets$Batch)J"
"factor(targets$Cy3)M2dD" "factor(targets$Cy3)MomD"
[5] "factor(targets$Cy3)MomH" "factor(targets$Cy3)P11dD"
"factor(targets$Cy3)P11dH" "factor(targets$Cy3)P11dR"
[9] "factor(targets$Cy3)P2dD" "factor(targets$Cy3)P2dH"
> designOld1 <- model.matrix(~1+factor(targets$Batch)+factor(targets
$Cy3))
> colnames(designOld1)
[1] "(Intercept)" "factor(targets$Batch)J"
"factor(targets$Cy3)M2dD" "factor(targets$Cy3)MomD"
[5] "factor(targets$Cy3)MomH" "factor(targets$Cy3)P11dD"
"factor(targets$Cy3)P11dH" "factor(targets$Cy3)P11dR"
[9] "factor(targets$Cy3)P2dD" "factor(targets$Cy3)P2dH"
2. Although I have 3 different genotypes, 2 social forms and 3 time
points, some combinations are biologically impossible, e.g. M11dH,
M2dH, MmomH, M11dR, M2dR, MmomR, PmomD, PmomR. When I use the command
below, limma gives me all of the combinations but only some do exist.
So, the ones with do not exist, have only 0 value in all rows. These
groups will result in NA values for all the spots after the model fit.
Is it OK to have NA values or it is better to remove these groups
from the model matrix before fitting the model?
design1 <-
model.matrix(~0+factor(targets$Batch)+factor(targets$Age)%in
%factor(targets$Genotype)%in%factor(targets$SocialForm))
> design1
BatchI BatchJ M11dD M2dD MmomD M11dH M2dH MmomH M11dR M2dR MmomR
P11dD P2dD PmomD P11dH P2dH PmomH P11dR P2dR PmomR
1 1 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
2 1 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
3 1 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
4 1 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
.
.
.
99 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0
attr(,"assign")
[1] 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$`factor(targets$Batch)`
[1] "contr.treatment"
attr(,"contrasts")$`factor(targets$Age)`
[1] "contr.treatment"
attr(,"contrasts")$`factor(targets$Genotype)`
[1] "contr.treatment"
attr(,"contrasts")$`factor(targets$SocialForm)`
[1] "contr.treatment"
3. Problem with model fit
After fitting the model. I got NA values for all spots of all the
groups that do not exist. This could be normal, but the group "P11dR"
do exist. Why did I also get NAs from this group?
isGene <- MA$genes$Status == "cDNA"
> fitDesign1 <- lmFit(MA[isGene,], design1)
Coefficients not estimable: M11dH M2dH MmomH M11dR M2dR MmomR PmomD
P11dR P2dR PmomR
Warning message:
Partial NA coefficients for 23523 probe(s)
> fitDesign1
An object of class "MArrayLM"
$coefficients
BatchI BatchJ M11dD M2dD MmomD M11dH
M2dH MmomH M11dR M2dR MmomR P11dD P2dD
[1,] 0.1905989 0.0814195 -0.01294777 -0.26631768 1.85216328 NA
NA NA NA NA NA 0.25491128 0.201785033
[2,] 0.6172982 0.8268361 -0.04929359 0.01953275 -0.31916109 NA
NA NA NA NA NA -0.01204587 -0.008850553
[3,] 2.1798376 2.0604288 0.51830689 0.32399634 -0.03122528 NA
NA NA NA NA NA -0.32814253 0.042235511
[4,] 1.0773968 0.9888733 0.12482602 0.20639996 -0.91445161 NA
NA NA NA NA NA -0.41316719 -0.050187922
[5,] 1.0516807 0.4855826 -0.62717591 -0.21835907 -2.16438471 NA
NA NA NA NA NA -0.75978080 -0.231660818
PmomD P11dH P2dH PmomH P11dR P2dR PmomR
[1,] NA -0.12924261 -0.10818149 2.0982355 NA NA NA
[2,] NA -0.18572498 -0.04461951 -0.5751055 NA NA NA
[3,] NA 0.01181522 0.37474136 -0.2172751 NA NA NA
[4,] NA -0.21312225 -0.07978014 -0.3549516 NA NA NA
[5,] NA -0.38056336 -0.16362561 -1.7943584 NA NA NA
23526 more rows ...
$stdev.unscaled
BatchI BatchJ M11dD M2dD MmomD M11dH M2dH
MmomH M11dR M2dR MmomR P11dD P2dD PmomD
[1,] 0.4494704 0.4820557 0.5740672 0.5740672 0.6125642 NA NA
NA NA NA NA 0.5218120 0.5198694 NA
[2,] 0.5831241 0.6001124 0.6782392 0.6782392 0.7092194 NA NA
NA NA NA NA 0.6534186 0.6520187 NA
[3,] 0.4494973 0.4824567 0.5741145 0.5741145 0.6128798 NA NA
NA NA NA NA 0.5253914 0.5199107 NA
[4,] 0.5829695 0.5995112 0.6782060 0.6782060 0.7087107 NA NA
NA NA NA NA 0.6533627 0.6458128 NA
[5,] 0.7071068 0.7580744 0.8022900 0.8102921 0.8610130 NA NA
NA NA NA NA 0.7911144 0.7870251 NA
P11dH P2dH PmomH P11dR P2dR PmomR
[1,] 0.5253396 0.5167718 0.5978108 NA NA NA
[2,] 0.6412859 0.6424903 0.7484216 NA NA NA
[3,] 0.5253914 0.5168244 0.6320057 NA NA NA
[4,] 0.6412651 0.6374888 0.7479396 NA NA NA
[5,] 0.7786772 0.7786772 0.9528956 NA NA NA
23526 more rows ...
$sigma
[1] 0.4750674 0.2759988 0.7108862 0.3199147 0.2901134
23526 more elements ...
$df.residual
[1] 86 69 83 71 56
23526 more elements ...
$cov.coefficients
BatchI BatchJ M11dD M2dD MmomD
P11dD P2dD P11dH P2dH PmomH
BatchI 0.2019737 0.1921053 -0.1970395 -0.1970395 -0.1921053
-0.1967105 -0.1970395 -0.1967105 -0.1970395 -0.1921053
BatchJ 0.1921053 0.2315789 -0.2118421 -0.2118421 -0.2315789
-0.2131579 -0.2118421 -0.2131579 -0.2118421 -0.2315789
M11dD -0.1970395 -0.2118421 0.3294408 0.2044408 0.2118421
0.2049342 0.2044408 0.2049342 0.2044408 0.2118421
M2dD -0.1970395 -0.2118421 0.2044408 0.3294408 0.2118421
0.2049342 0.2044408 0.2049342 0.2044408 0.2118421
MmomD -0.1921053 -0.2315789 0.2118421 0.2118421 0.3565789
0.2131579 0.2118421 0.2131579 0.2118421 0.2315789
P11dD -0.1967105 -0.2131579 0.2049342 0.2049342 0.2131579
0.2721491 0.2049342 0.2054825 0.2049342 0.2131579
P2dD -0.1970395 -0.2118421 0.2044408 0.2044408 0.2118421
0.2049342 0.2669408 0.2049342 0.2044408 0.2118421
P11dH -0.1967105 -0.2131579 0.2049342 0.2049342 0.2131579
0.2054825 0.2049342 0.2721491 0.2049342 0.2131579
P2dH -0.1970395 -0.2118421 0.2044408 0.2044408 0.2118421
0.2049342 0.2044408 0.2049342 0.2669408 0.2118421
PmomH -0.1921053 -0.2315789 0.2118421 0.2118421 0.2315789
0.2131579 0.2118421 0.2131579 0.2118421 0.3565789
$pivot
[1] 1 2 3 4 5 12 13 15 16 17 6 7 8 9 10 11 14 18 19 20
$genes
Block Row Column ID Name Status
9 1 1 9 SiJWA06AAB_pcr_2 NA cDNA
10 1 1 10 SiJWA12AAB_pcr_2 NA cDNA
11 1 1 11 SiJWC06AAB_pcr_2 NA cDNA
12 1 1 12 SiJWC12AAB_pcr_2 NA cDNA
13 1 1 13 SiJWE06AAB_pcr_2 NA cDNA
23526 more rows ...
$Amean
[1] 9.465811 8.056152 8.586644 8.151375 7.580591
23526 more elements ...
$method
[1] "ls"
$design
BatchI BatchJ M11dD M2dD MmomD M11dH M2dH MmomH M11dR M2dR MmomR
P11dD P2dD PmomD P11dH P2dH PmomH P11dR P2dR PmomR
1 1 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
2 1 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
3 1 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
4 1 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
5 0 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
94 more rows ...
4. After fitting the model, do I need to make contrasts between
interesting comparisons? Although all slides are compared to
reference, with this complicated linear model, I am not sure that if I
do not make contrast, the limma will give me the coefficient of each
group compared to reference or something else.
I truly appreciate all the answers in advance!
best regards,
Mingkwan Nipitwattanaphon
[[alternative HTML version deleted]]