Chol-hee,
Notice the simple example:
> x=as.factor(c(1,0,1,0));y=as.factor(c(1,2,1,2));z=rnorm(4)
Notice that x and y are the same covariate. Now:
> design=model.matrix(z~x+y)
> design
(Intercept) x1 y2
1 1 1 0
2 1 0 1
3 1 1 0
4 1 0 1
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"
attr(,"contrasts")$y
[1] "contr.treatment"
> solve(t(design)%*%design)
Error in solve.default(t(design) %*% design) :
Lapack routine dgesv: system is exactly singular
You get the singularity error because your covariates are exactly the
same (or not linearly independent). Now if you concatenate the
variables like you did:
> xy=as.factor(paste(x,y,sep='.'))
> xy
[1] 1.1 0.2 1.1 0.2
Levels: 0.2 1.1
Which is clearly different than the original x+y. Now:
> design=model.matrix(z~xy)
> design
(Intercept) xy1.1
1 1 1
2 1 0
3 1 1
4 1 0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$xy
[1] "contr.treatment"
> solve(t(design)%*%design)
(Intercept) xy1.1
(Intercept) 0.5 -0.5
xy1.1 -0.5 1.0
Which now works. However, note that I wouldn't use the latter. You
need to find out which of your six covariates are linearly dependent
with each other and remove one or more so they are NOT linearly
dependent. This will be different from your second attempt but will be
equivalent to what you were trying to accomplish in your first attempt
Let me know if this doesn't work!
Thanks!
Evan
--
W. Evan Johnson
Assistant Professor
Division of Computational Biomedicine
Boston University School of Medicine
72 East Concord St., E-645
Boston, MA 02118
Phone: (617) 638-2541
On Oct 25, 2012, at 6:00 AM, bioconductor-request@r-project.org wrote:
> ------------------------------
>
> Message: 8
> Date: Wed, 24 Oct 2012 13:40:33 -0400
> From: Achilleas Pitsillides <anp4r@virginia.edu>
> To: Cholhee Jung <jung.cholhee@gmail.com>, Bioconductor mailing list
> <bioconductor@stat.math.ethz.ch>
> Subject: Re: [BioC] Fwd: covariate information
> Message-ID:
>
<cabdy-6=dp5-wu+zqo4-upmbpx51zecrpqcwdsv5ygyhg2z821a@mail.gmail.com>
> Content-Type: text/plain
>
> Dear Chol-hee,
>
> The short answer is that the two model matrices are different and
they have
> different dimensions; you can verify this by using the dim(mod_mat)
to see
> the dimension of the model matrix.
>
> Here is my understanding: If you have a factor f1 with 3 levels and
a
> factor f2 with 2 levels (where all the possible level combinations
exist),
> then f1:f2 is all the combinations of f1 and f2 (an equivalent
factor
> with 6 levels) and the model matrix ~f1:f2 would have six columns
(i.e.
> fit a model with 6 coefficients).
> However, the model matrix ~f1+f2 will have 4 columns ( i.e. fit 4
> coefficients: constant, two for f1 and one for f2).
> The model ~f1*f2 will fit 6 coefficients and have a model matrix
with the
> same column space as the model matrix for ~f1:f2.
>
>
> I hope this helps,
>
> cheers,
> Achilleas
>
>
> On Tue, Oct 23, 2012 at 10:10 PM, Cholhee Jung
<jung.cholhee@gmail.com>wrote:
>
>>
>>
>> Dear users,
>>
>> Below is the question I posted originally in the 'ComBat user
forum' of
>> Google group.
>> But, as I was suggested to forward my question to Bioconductor
mailing
>> list, I'm doing it now.
>>
>> Please find my question, below.
>>
>>
>> Regards,
>> Chol-hee
>>
>> On Tuesday, October 23, 2012 10:13:26 AM UTC+11, Cholhee Jung
wrote:
>>>
>>>
>>>
>>> Dear users,
>>>
>>> I was trying ComBat on ~1,000 samples.
>>> Samples are spread over 12 batches and each batch contains 4
technical
>>> replicates that are identical across all batches.
>>> The number of covariates is 5, and I was using the ComBat
implemented in
>>> the 'sva' package.
>>>
>>> I tried ComBat with two model matrix built from the same covariate
>>> information.
>>>
>>> First model matrix was constructed as below:
>>>> mod_mat = model.matrix(~as.factor(cov1) + as.factor(cov2) +
>>> as.factor(cov3) + as.factor(cov4) + as.factor(cov5),
data=pheno_data )
>>>
>>> Second one was built as below:
>>>> mod_mat = model.matrix(~as.factor(paste(pheno_data$cov1,
>>> pheno_data$cov2, pheno_data$cov3, pheno_data$cov4,
pheno_data$cov5,
>>> sep=":")))
>>>
>>> Basically, covariates were concatenated into one string for the
the
>>> second model matrix.
>>>
>>> ComBat with the first model matrix raised the 'singular' error
like
>> below:
>>>
>>> Error in solve.default(t(design) %*% design) :
>>> Lapack routine dgesv: system is exactly singular
>>>
>>> But, ComBat run without error with the second model matrix.
>>>
>>>
>>> Now I wonder if the two different model matrices are same?
>>>
>>> Regards,
>>> Chol-hee
>>>
>>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>>
https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> [[alternative HTML version deleted]]
[[alternative HTML version deleted]]