Dear Eugene,
On Sat, 21 Dec 2013, Eugene Bolotin wrote:
> Dear Gordon,
> I apologize if I was a bit unclear. I actually simplified my problem a
> little so that it would fit into this Bioconductor post. I actually have
> 10+ samples in batch 1157, but that batch does not contain any "tumor"
> samples. I have additional similar batches, some with "tumor" samples and
> some without. I want to remove batch-specific differences between all
> samples. edgeR, however, gives me the same error no matter how many
> samples I have in the batch, but does not give me this error if I remove
> all batches that do not contain any "tumor" samples.
If the problem that you posted was not your real problem, then please post
your real problem.
But let me say that it will never be possible to estimate a batch effect
for a group of samples in which every sample also has its own unique
treatment condition, regardless of how many samples there are. To do so
would be to estimate n+1 parameters from n observations. It is a universal
rule of statistics that you cannot estimate more unknown parameters than
you have observations.
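
The counting argument can be checked directly on a design matrix. The following is a minimal sketch in base R on a made-up toy design (the batch labels A, B, C and the sample names s1 to s7 are hypothetical, not the poster's data): batch C holds two samples, each with its own treatment level and no "tumor" sample, so the batch C column is the sum of those two treatment columns.

```r
# Hypothetical toy design: batches A and B each contain a "tumor" sample,
# batch C contains two samples that are each their own treatment level.
batch <- factor(c("A", "A", "A", "B", "B", "C", "C"))
treatment <- factor(
  c("s1", "tumor", "s3", "s4", "tumor", "s6", "s7"),
  levels = c("tumor", "s1", "s3", "s4", "s6", "s7")
)

design <- model.matrix(~ batch + treatment)

n_obs  <- nrow(design)      # number of samples
n_coef <- ncol(design)      # number of coefficients requested
qrd    <- qr(design)
rank_d <- qrd$rank          # number of coefficients that can be estimated

cat("samples:", n_obs, " coefficients:", n_coef, " rank:", rank_d, "\n")

# Columns that QR pivoting pushed to the end are linear combinations of
# earlier columns, so their coefficients are not estimable.
dropped <- colnames(design)[qrd$pivot[-seq_len(rank_d)]]
print(dropped)
```

In this sketch the design asks for more coefficients than there are samples, so the rank falls short of the number of columns. Adding more samples to batch C does not help as long as each of them brings its own treatment level.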
> Can I just take residuals of logged count data after performing the
> linear regression on the batch factor? Can I then feed the residuals
> into edgeR linear modeling? I want to compare how much each
> sample/patient/vector differs from the average "tumor" sample. The
> batches are quite large, with >10 samples each, and I have ~300 total
> samples.
No, you can't. edgeR only complains that the problem is non-estimable when
it is truly impossible to estimate all the parameters. Impossible means
impossible. If it were possible to work around this with an ad hoc method
such as the one you describe, then edgeR would already have done that.
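
A small sketch of what the residual approach does in the posted design, where batch 1157 holds a single sample. The log-expression values below are made up purely for illustration, and the five-sample layout is a hypothetical reduction of the real data.

```r
# Made-up log-expression values for one gene in five samples (illustrative only).
logexpr <- c(5.1, 4.8, 5.3, 6.0, 9.7)
batch   <- factor(c("1802", "1802", "2055", "2055", "1157"))

# Regress on the batch factor alone and keep the residuals.
fit <- lm(logexpr ~ batch)
res <- residuals(fit)

# The lone sample in batch 1157 is fitted exactly, so its residual is zero
# whatever its original value was.
print(round(res, 10))
```

The batch term absorbs everything about the single sample in batch 1157, leaving nothing to compare against the "tumor" average. With several samples in a tumor-free batch, the residuals are still centered within that batch, so any shift shared by all of its samples is removed together with the batch effect.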
Best wishes
Gordon
> Thanks a ton,
> Eugene
>
>
>
> On Sat, Dec 21, 2013 at 3:31 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Eugene,
>>
>> According to your design, Sample 31 is a unique treatment unto itself, and
>> also a unique batch unto itself. Obviously it is impossible to estimate
>> both the batch effect and the treatment effect from one sample. Hence the
>> error message.
>>
>> Best wishes
>> Gordon
>>
>> Date: Fri, 20 Dec 2013 16:49:43 -0800 (PST)
>>> From: "Eugene Bolotin [guest]" <guest at bioconductor.org>
>>> To: bioconductor at r-project.org, elbolotin at gmail.com
>>> Subject: [BioC] EdgeR Design matrix not of full rank. The following
>>> coefficients not estimable erroR
>>>
>>>
>>> Hi I have the following samples:
>>> batch
>>> [1] 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 2055 1802 1802 2055
>>> [16] 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055 1802 1802
>>> [31] 1157 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 2055 2055
>>> [46] 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055
>>> Levels: 1157 1802 2055
>>> treatment
>>> [1] TCGA-BR-6452 TCGA-BR-6453 tumor TCGA-BR-6454 tumor
>>> [6] TCGA-BR-6455 TCGA-BR-6456 TCGA-BR-6457 tumor TCGA-BR-6458
>>> [11] tumor TCGA-BR-6563 TCGA-BR-6565 TCGA-BR-6566 TCGA-BR-7196
>>> [16] TCGA-BR-7703 tumor TCGA-BR-7704 tumor TCGA-BR-7707
>>> [21] TCGA-BR-7715 tumor TCGA-BR-7716 tumor TCGA-BR-7717
>>> [26] tumor TCGA-BR-7723 TCGA-CD-5804 TCGA-CG-4437 TCGA-CG-4441
>>> [31] TCGA-CG-4476 TCGA-CG-5716 TCGA-D7-6518 TCGA-D7-6519 TCGA-D7-6520
>>> [36] TCGA-D7-6521 TCGA-D7-6522 TCGA-D7-6524 TCGA-D7-6525 TCGA-D7-6526
>>> [41] TCGA-D7-6527 TCGA-D7-6528 TCGA-F1-6177 TCGA-F1-6875 TCGA-FP-7735
>>> [46] tumor TCGA-FP-7829 tumor TCGA-HF-7131 TCGA-HF-7132
>>> [51] TCGA-HF-7133 TCGA-HF-7134 TCGA-HF-7136 TCGA-IN-7806 tumor
>>> 44 Levels: TCGA-BR-6452 TCGA-BR-6453 TCGA-BR-6454 TCGA-BR-6455 ... tumor
>>>
>>>
>>>
>>>
>>>
>>> I want to compare each TCGA_X sample to the average mutant background. I
>>> know it is possible, because I was able to do it using standard commands.
>>> However, when I try to adjust for batch effects as follows:
>>> design=model.matrix(~batch+treatment)
>>> names(data.frame(design))
>>> group=treatment
>>> y=readDGE(files, path=wd, columns=c(1,2), group=group)
>>> #names(data.frame(design))
>>> design=model.matrix(~0+batch+treatment)
>>>
>>> names(data.frame(design))
>>> #rownames(design)=colnames(y)
>>> design
>>>
>>> y = estimateGLMCommonDisp(y, design, verbose=TRUE)
>>> Error in glmFit.default(y, design = design, dispersion = dispersion,
>>>   offset = offset, :
>>>   Design matrix not of full rank. The following coefficients not estimable:
>>>   treatmentTCGA-CG-4476
>>> As far as I can tell, it is because batch 1157 contains a normal
>>> sample but does not contain any tumor samples.
>>> Is there a way around that?
>>> Thanks,
>>> Eugene
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>> sessionInfo()
>>>>
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] edgeR_3.4.2 limma_3.18.6
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_3.0.2
>>>
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>>
>