Guan
Johnson, William Evan <wej at ...> writes:
>
> ComBat should be done after normalization, and only if there are clear
> signs of batch effects after normalization (either through significance
> testing, clustering, or principal component analysis).
>
> On Aug 21, 2013, at 12:33 AM, amit kumar subudhi wrote:
>
> Hello Dr. Evan,
>
> One more doubt, which I hope you can answer. Should the required
> normalization be carried out on the data before running ComBat, or can
> the normalization step be done after ComBat? This question is confusing
> me. Please answer it if you can.
>
> With best regards
> Amit
>
> On Mon, Aug 19, 2013 at 7:12 PM, amit kumar subudhi
> <amit4help at ...> wrote:
> This reply solved my problem. Thanks again, Dr. Evan, for your kind and
> prompt reply and suggestions.
>
> Regards
> Amit
>
> On Mon, Aug 19, 2013 at 7:08 PM, Johnson, William Evan
> <wej at ...> wrote:
> Yes, it should be fine to remove batch effects on the larger dataset and
> then use a smaller subset to do your comparisons. In fact, this approach
> might be preferred even if it were possible to adjust for batch in the
> smaller subset.
>
> On Aug 19, 2013, at 9:34 AM, amit kumar subudhi wrote:
>
> Thanks again for the reply, Dr. Evan,
>
> This set of samples is a subset of a larger set that contains many more
> samples in each batch. When I ran ComBat on the larger dataset, I was
> able to remove the batch effects to some extent. For your information,
> the known batch effect here is the date of hybridization: a simple
> hierarchical clustering analysis showed that most of the samples cluster
> by date of hybridization, so I tried ComBat to remove the batch effects.
> The third batch contains most of the uncomplicated malaria samples. The
> subset of samples that I posted here has specific symptoms pertaining to
> severe malaria and was therefore selected for comparison with the
> uncomplicated malaria samples.
>
> Question: As mentioned above, I have applied ComBat to remove the batch
> effects from the larger data set. Can I take the smaller set of samples
> from the larger data set to find differentially regulated genes? An
> answer to this question would be really helpful.
>
> With best regards
> Amit
>
> On Mon, Aug 19, 2013 at 6:31 PM, Johnson, William Evan
> <wej at ...> wrote:
> Okay, yes this is clear now. Your batch and covariate status are
> completely confounded. In other words, if you see a difference between
> "severe" and "uncomplicated" you won't know if this is really due to a
> covariate effect or if this is due to a batch (batch 3) effect. In short,
> this is really an experimental design issue and ComBat cannot help you.
>
> If you were to remove the "malaria" covariate, then ComBat would work, but
> it would also take out all malaria covariate effects as well. How bad are
> the batch effects between batches 1 and 2? Do you expect batch 3 to have a
> similar level of batch differences? You could combine batches 1 and 2, and
> then look for differences with batch 3--but you wouldn't know whether the
> differential expression is due to the treatment or due to
> batch--hence the confounding...
>
> Sorry I couldn't be much more of a help, but like I said, the issue here
> is due to experimental design.
>
> Evan
>
> On Aug 19, 2013, at 8:55 AM, amit kumar subudhi wrote:
>
> Hello Dr. Evan,
>
> Thanks for the prompt reply. Below is the whole pheno table. Looking at
> the whole table might give you an idea about the probable cause of the
> error. Batches 1 and 2 contain only severe malaria samples, whereas
> batch 3 contains the uncomplicated malaria samples.
>
>       sample batch malaria
> AL         1     1 Severe
> AO         2     1 Severe
> AQ         3     1 Severe
> AP         4     1 Severe
> CF         5     2 Severe
> CL         6     2 Severe
> CU         7     2 Severe
> CV         8     2 Severe
> GA_UC      9     3 uncomplicated
> GB_UC     10     3 uncomplicated
> GC_UC     11     3 uncomplicated
> GE_UC     12     3 uncomplicated
> GR_UC     13     3 uncomplicated
>
> With best regards
>
> On Mon, Aug 19, 2013 at 5:50 PM, Johnson, William Evan
> <wej at ...> wrote:
> Amit,
>
> The "singularity" error you are getting occurs when your covariates are
> confounded with batch (or with each other). In the example you are trying,
> is there a batch that contains only one covariate level, and is that
> covariate level exclusive to the batch? If this does not make sense, post
> your 'pheno' variable in a reply and I will be happy to help you figure
> out the problem.
>
> Evan
>
> On Aug 19, 2013, at 6:00 AM, <bioconductor-request at ...> wrote:
>
> > Date: Sun, 18 Aug 2013 19:58:35 +0530
> > From: amit kumar subudhi <amit4help at ...>
> > To: bioconductor at ...
> > Subject: [BioC] ComBat_ Error in solve.default(t(design) %*% design) :
> >     Lapack routine dgesv: system is exactly singular: U[4, 4] = 0
> > Message-ID:
> >     <cadxjrxwkyc3provl3rnmyc03qpyvh_vdvxvzymu-wkvmw+nkiw at ...>
> > Content-Type: text/plain
> >
> > Hello to all ComBat users,
> >
> > I am trying to remove the batch effects from some of my microarray data,
> > but in the end I get an error message which reads:
> >
> > Found 3 batches
> > Found 1 categorical covariate(s)
> > Standardizing Data across genes
> > Error in solve.default(t(design) %*% design) :
> > Lapack routine dgesv: system is exactly singular: U[4,4] = 0
> >
> > The head(edata) looks like this:
> >                                   AL        AO        AP        AQ        CF
> > GT_pfalci_specific_0000001 16.053898 16.080540 16.101114 16.046898 16.087206
> > GT_pfalci_specific_0000002 10.051407 10.477143  8.369233 10.657850 13.312936
> > GT_pfalci_specific_0000003  8.910620  8.683393  7.812817  8.496099 10.920685
> > GT_pfalci_specific_0000004  6.603195  8.993232  6.476777  6.792369  3.319346
> > GT_pfalci_specific_0000005  9.813562 11.084574  9.055613 11.568550 12.977261
> > GT_pfalci_specific_0000006 15.989252 15.993513 15.963054 16.000675 15.983985
> >                                   CL        CU        CV     GA_UC     GB_UC
> > GT_pfalci_specific_0000001 16.082037 16.071299 16.090370 15.971335 15.994304
> > GT_pfalci_specific_0000002 12.653076  9.703247  8.827624  5.697412  8.060719
> > GT_pfalci_specific_0000003 11.470758 10.548943 10.718349  6.132614  8.007271
> > GT_pfalci_specific_0000004  5.328515  8.398546  6.351136  3.045112  3.891578
> > GT_pfalci_specific_0000005  8.520699 11.791610 11.535907  6.791468  9.930246
> > GT_pfalci_specific_0000006 15.980660 15.984256 15.970124 13.353012 13.740395
> >                                GC_UC     GE_UC     GR_UC
> > GT_pfalci_specific_0000001 15.855644 16.090246 16.086956
> > GT_pfalci_specific_0000002  9.026398  8.015609  7.814614
> > GT_pfalci_specific_0000003  5.341252  8.658231  5.788790
> > GT_pfalci_specific_0000004  4.191565  3.040515  3.517175
> > GT_pfalci_specific_0000005  5.446910 11.982848  5.477334
> > GT_pfalci_specific_0000006 11.872469 13.675290 13.117105
> >
> > and the head(pheno) looks like this:
> >    sample batch malaria
> > AL      1     1 severe
> > AO      2     1 severe
> > AP      3     1 severe
> > AQ      4     1 severe
> > CF      5     2 severe
> > CL      6     2 severe
> >
> >
> > The commands that I used for ComBat are:
> > mod = model.matrix(~as.factor(malaria), data=pheno)
> > combat_edata = ComBat(dat=edata, batch=batch, mod=mod, numCovs=NULL,
> >                       par.prior=TRUE, prior.plots=FALSE)
> >
> > head(mod) looks like this:
> >    (Intercept) as.factor(malaria)uncomplicated
> > AL           1                               0
> > AO           1                               0
> > AP           1                               0
> > AQ           1                               0
> > CF           1                               0
> > CL           1                               0
> >
> > Why am I getting this error message? Please help me out. With the
> > larger sample set (n=33) I was able to remove the batch effects, but a
> > subset of those samples gives me the above problem.
> >
> >
> > --
> > Amit Kumar Subudhi
> > Research Scholar,
> > CSIR-Senior Research Fellow,
> > Molecular Parasitology and Systems Biology Lab,
> > Department of Biological Sciences ,
> > FD III, BITS, Pilani,
> > Rajasthan- 333031
> > e mail-
> > amit4help at ...
> > amit.subudhi at ...
> > Mob No- 919983525845
>
>
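For reference, the confounding described in the quoted thread can be reproduced from the posted pheno table alone. The following is a minimal sketch in base R, assuming that a ComBat-style design matrix is essentially one indicator column per batch plus the non-intercept covariate columns. The object names (`pheno`, `design`, `design_rank`, `err`) are illustrative and not taken from the sva source.

```r
# Phenotype table as posted in the quoted thread (13 samples, 3 batches).
pheno <- data.frame(
  batch   = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3)),
  malaria = factor(c(rep("Severe", 8), rep("uncomplicated", 5))),
  row.names = c("AL", "AO", "AQ", "AP", "CF", "CL", "CU", "CV",
                "GA_UC", "GB_UC", "GC_UC", "GE_UC", "GR_UC")
)

# Cross-tabulation: "uncomplicated" occurs only in batch 3, and batch 3
# contains nothing else.
print(table(batch = pheno$batch, malaria = pheno$malaria))

# Assumed ComBat-style design: batch indicators plus covariate columns,
# with the intercept column of the covariate model dropped.
batchmod <- model.matrix(~ -1 + batch, data = pheno)
mod      <- model.matrix(~ malaria, data = pheno)
design   <- cbind(batchmod, mod[, -1, drop = FALSE])

# A rank below the number of columns means t(design) %*% design
# cannot be inverted.
design_rank <- qr(design)$rank
cat("columns:", ncol(design), " rank:", design_rank, "\n")

# Attempt the inversion and capture the error message instead of stopping.
err <- tryCatch(
  {
    solve(t(design) %*% design)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err)
```

Here the `malariauncomplicated` column is identical to the `batch3` column, so the design has 4 columns but only rank 3, which is the situation the "singularity" error points to.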
Hi Evan and Amit, or others who may help,
I got the same ComBat error when running surrogate variable analysis
(SVA). I understand from this thread that the error is caused by batch
being confounded with covariate status. I have several related questions
and hope you could have a look. Many thanks for any opinions or
suggestions.
Data set: 24 samples from 6 subjects (4 time points per subject: 2
baseline samples collected on different days, 1 during drug treatment,
1 after drug treatment). Experiments were done with Affymetrix GeneChip
3.0 for miRNA expression profiling.
Initial data analysis: "oligo" was used to read the Affy CEL files and
"rma()" was used for data normalization. After this, PC1 on the PCA plot
still seems to correlate with some batch effect (one I am not aware of,
i.e. it does not come from different scan dates). The "sva" package was
then used to estimate the surrogate variables, followed by "ComBat()".
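One hedged way to make this kind of PCA check concrete is to regress PC1 on a candidate batch label. The sketch below uses simulated data and base R only. The matrix `expr`, the labels in `batch`, and the size of the simulated shift are all made up for illustration and are not the data from this post.

```r
set.seed(1)

n_genes <- 200
# Hypothetical batch labels: 6 samples in each of two batches.
batch <- factor(rep(c("A", "B"), each = 6))

# Simulated log-expression matrix: genes in rows, samples in columns.
expr <- matrix(rnorm(n_genes * length(batch)), nrow = n_genes)

# Add an artificial batch shift to the first 50 genes in batch B.
expr[1:50, batch == "B"] <- expr[1:50, batch == "B"] + 3

# prcomp() expects observations (samples) in rows, so transpose.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# Fraction of PC1 variance explained by the batch label.
r2 <- summary(lm(pc1 ~ batch))$r.squared
cat("R^2 of PC1 on batch:", round(r2, 3), "\n")
```

A high R^2 suggests PC1 tracks the candidate label. When the source of the batch effect is unknown, as here, the same check can be repeated for each recorded variable (subject, processing group, scan date).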
Now to the ComBat error, which occurred when I specified the contrasts as
(Base2-Base1, During-Base1, Post-Base1). The pheno input is attached
below:
                           sample batch Status
GW2miRNA1_(miRNA-3_0).CEL       1     1 Base1
GW2miRNA2_(miRNA-3_0).CEL       1     1 Post7
GW2miRNA3_(miRNA-3_0).CEL       2     1 Base1
GW2miRNA4_(miRNA-3_0).CEL       2     1 Post7
GW2miRNA5_(miRNA-3_0).CEL       3     1 Base1
GW2miRNA6_(miRNA-3_0).CEL       3     1 Post7
GW2miRNA7_(miRNA-3_0).CEL       4     1 Base1
GW2miRNA8_(miRNA-3_0).CEL       4     1 Post7
GW2miRNA9_(miRNA-3_0).CEL       5     1 Base1
GW2miRNA10_(miRNA-3_0).CEL      5     1 Post7
GW2miRNA11_(miRNA-3_0).CEL      6     1 Base1
GW2miRNA12_(miRNA-3_0).CEL      6     1 Post7
GW1miRNA13_(miRNA-3_0).CEL      6     2 Base2
GW1miRNA14_(miRNA-3_0).CEL      6     2 During4
GW1miRNA15_(miRNA-3_0).CEL      4     2 Base2
GW1miRNA16_(miRNA-3_0).CEL      1     2 During4
GW1miRNA17_(miRNA-3_0).CEL      5     2 Base2
GW1miRNA18_(miRNA-3_0).CEL      5     2 During4
GW1miRNA19_(miRNA-3_0).CEL      4     2 During4
GW1miRNA20_(miRNA-3_0).CEL      3     2 Base2
GW1miRNA21_(miRNA-3_0).CEL      3     2 During4
GW1miRNA22_(miRNA-3_0).CEL      1     2 Base2
GW1miRNA23_(miRNA-3_0).CEL      2     3 During4
GW1miRNA24_(miRNA-3_0).CEL      2     3 Base2
I understand from the quoted post that the reason is that batch is
confounded with status, as you can see in the phenotype file. The two
baseline samples are from the same subjects but were collected on
different days before injecting the drug. I am wondering whether it makes
sense to classify "Base1 + Base2" as "Base" and make contrasts for
"During - Base" and "Post - Base", keeping the other columns in the above
pheno file the same, and then re-run "sva". Or is it more appropriate to
do two separate "sva" analyses, i.e. "Post7 - Base1" for the first 12
samples, which were hybridized and scanned at the same time, and
"During4 - Base2" for the last 12 samples, which were treated as one
batch (but scanned at two times, which is why they are labelled as batch
2 and 3 in the "batch" column)?
I hope I have described this clearly. Any suggestions or opinions are
much appreciated.
Regards
Guan
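
The first option asked about above (merging the two baselines) can be compared with the original four-level coding by checking the rank of the corresponding design matrices. This is a minimal sketch in base R, assuming a ComBat-style design of batch indicators plus non-intercept covariate columns. `design_rank` is a hypothetical helper, and the `batch` and `status` vectors are transcribed from the pheno table above.

```r
# Hypothetical helper: build a ComBat-style design (batch indicators plus
# non-intercept covariate columns) and report its size and rank.
design_rank <- function(batch, covariate) {
  batchmod <- model.matrix(~ -1 + batch)
  mod      <- model.matrix(~ covariate)
  design   <- cbind(batchmod, mod[, -1, drop = FALSE])
  c(columns = ncol(design), rank = qr(design)$rank)
}

# Batch and Status columns in the row order of the pheno table.
batch  <- factor(c(rep(1, 12), rep(2, 10), rep(3, 2)))
status <- c(rep(c("Base1", "Post7"), 6),
            "Base2", "During4", "Base2", "During4", "Base2", "During4",
            "During4", "Base2", "During4", "Base2", "During4", "Base2")

# Which Status levels occur in which batch.
print(table(batch, status))

# Original coding with four Status levels.
rank_four_levels <- design_rank(batch, factor(status))

# Alternative coding: Base1 and Base2 merged into a single "Base" level.
status_merged <- sub("^Base[12]$", "Base", status)
rank_merged   <- design_rank(batch, factor(status_merged))

print(rbind(four_levels = rank_four_levels, merged_base = rank_merged))
```

With the four Status levels the design has 6 columns but rank 5, because the Base2 and During4 indicators together equal the batch 2 and batch 3 indicators together. After merging Base1 and Base2 it has 5 columns and rank 5. Full rank only means the model can be fitted; whether pooling the two baselines is appropriate for the biological question is a separate matter.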