Hi list,
I've been looking at three 44k and two 244k Agilent CGH arrays. To date I've
used limma to read in the processed signals (no background correction
or normalization, as this has already been done), then the DNAcopy package for
segmentation, as well as the snapCGH package to apply other
segmentation methods rather than using each segmentation package
individually.
Firstly, using the DNAcopy segmentation, I can see a significant pattern
across my three 44k arrays which disappears when I perform the step to
remove unnecessary change points due to trends in the data. As these
are in the same locations across the three arrays, is it likely that
this is biologically significant rather than being a trend? Obviously
others do not have a definitive answer for this, but I wondered if
anyone had seen similar results in a different scenario.
Additionally, I'm wondering what segmentation methods people have
tended to employ. The heterogeneous nature of my data means that I need to
identify single-probe as well as larger-region aberrations, and I'd
read that the CBS algorithm is not particularly suited to doing this?
Apologies if this is a bit vague.
Thanks for any input,
John
jhs1jjm at leeds.ac.uk wrote:
> Hi list,
>
> I've been looking at three 44k and two 244k Agilent CGH arrays. To date I've
> used limma to read in the processed signals (no background correction
> or normalization, as this has already been done), then the DNAcopy package for
> segmentation, as well as the snapCGH package to apply other
> segmentation methods rather than using each segmentation package
> individually.
>
> Firstly, using the DNAcopy segmentation, I can see a significant pattern
> across my three 44k arrays which disappears when I perform the step to
> remove unnecessary change points due to trends in the data. As these
> are in the same locations across the three arrays, is it likely that
> this is biologically significant rather than being a trend? Obviously
> others do not have a definitive answer for this, but I wondered if
> anyone had seen similar results in a different scenario.
What you are describing could be technical in nature or could reflect
copy-number variants. You will probably need to review those regions
for known copy-number variants and also look at the quality control
metrics for those probes. Unfortunately, segmentation is not "the final
answer" to CGH analysis; there has to be some curation (either manual
or automated) to find the regions of greatest interest and remove the
regions that are likely not associated with the disease state.
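As a rough illustration of the automated side of that curation, the sketch below flags segments that are largely covered by a catalogued copy-number variant. It is a minimal Python example under stated assumptions: the `Interval` type, the coordinates, and the 50% coverage cutoff are all hypothetical, and in practice the CNV list would come from a CNV database and the segments from the segmentation output.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlap_fraction(seg: Interval, cnv: Interval) -> float:
    """Fraction of `seg` covered by `cnv` (0.0 on different chromosomes)."""
    if seg.chrom != cnv.chrom:
        return 0.0
    overlap = min(seg.end, cnv.end) - max(seg.start, cnv.start) + 1
    return max(overlap, 0) / (seg.end - seg.start + 1)


def flag_known_cnvs(segments: List[Interval], cnvs: List[Interval],
                    min_fraction: float = 0.5) -> List[bool]:
    """True for each segment that a single known CNV covers by at least min_fraction."""
    return [any(overlap_fraction(s, c) >= min_fraction for c in cnvs)
            for s in segments]


# Hypothetical example: one segment half inside a catalogued CNV, one not.
segments = [Interval("chr1", 1001, 2000), Interval("chr2", 501, 900)]
known_cnvs = [Interval("chr1", 1501, 5000)]
print(flag_known_cnvs(segments, known_cnvs))  # [True, False]
```

Flagged segments are candidates for down-weighting rather than automatic removal, since a known CNV can still be disease-associated.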
> Additionally, I'm wondering what segmentation methods people have
> tended to employ. The heterogeneous nature of my data means that I need to
> identify single-probe as well as larger-region aberrations, and I'd
> read that the CBS algorithm is not particularly suited to doing this?
> Apologies if this is a bit vague.
Single probes are problematic and require validation using another
technology or array platform, in my opinion.
Hi Sean,
Since it's two-colour and I'm looking at relative amounts, wouldn't that
mean I wouldn't see copy number variants? Would they not be in both my
samples? I was also pondering the advantages of using R and
Bioconductor versus, say, Agilent's z-score, for the purposes of my
discussion. Is the simple answer just that it allows a flexible approach to
these matters? Also, if possible, could you expand a bit on the
single-probe argument?
Thanks for the input
John
In reply to Sean Davis <sdavis2 at mail.nih.gov>, Wed 10 Oct 2007 15:36:21 BST.
jhs1jjm at leeds.ac.uk wrote:
> Hi Sean,
>
> Since it's two-colour and I'm looking at relative amounts, wouldn't that
> mean I wouldn't see copy number variants? Would they not be in both my
> samples? I was also pondering the advantages of using R and
> Bioconductor versus, say, Agilent's z-score, for the purposes of my
> discussion. Is the simple answer just that it allows a flexible approach to
> these matters? Also, if possible, could you expand a bit on the
> single-probe argument?
If using Agilent CGHAnalytics, you will probably want to use ADM-1, not
z-score. For the 44k arrays, a threshold of around 6 is probably
appropriate. For the 244k arrays, something closer to 10 or 11 is more
appropriate. ADM-1 is exquisitely sensitive to single probes that are
extreme values. These may represent real signal, or may be noise.
There is no way to tell without validation, in my opinion. However, if
there are two or more probes behaving similarly, then you can be more
assured of real biology. The real biology could be directly
disease-related or not. The ones that are not are copy number variants
(although there is now plenty of evidence that copy number variants can
be disease-associated, as well). When using high-resolution oligo
arrays, you will need to become familiar with copy number polymorphisms
and the databases for annotating them. CGHAnalytics contains a built-in
catalog of those.
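To make the single-probe versus multi-probe distinction concrete, here is a small sketch that groups consecutive probes beyond a threshold and reports how many probes support each call. It is not ADM-1 (Agilent's own method); the 0.5 log2-ratio threshold and the example ratios are hypothetical, and the input is assumed to be one chromosome in genomic order.

```python
from typing import List, Tuple


def aberrant_runs(log_ratios: List[float],
                  threshold: float = 0.5) -> List[Tuple[int, int, str]]:
    """Group consecutive probes (one chromosome, in genomic order) whose
    |log2 ratio| exceeds `threshold` in the same direction.

    Returns (first_probe_index, n_probes, "gain" | "loss") for each run.
    """
    runs = []
    i, n = 0, len(log_ratios)
    while i < n:
        x = log_ratios[i]
        if abs(x) <= threshold:
            i += 1
            continue
        up = x > 0
        j = i
        # Extend the run while the next probe is also beyond the threshold
        # on the same side.
        while (j + 1 < n and abs(log_ratios[j + 1]) > threshold
               and (log_ratios[j + 1] > 0) == up):
            j += 1
        runs.append((i, j - i + 1, "gain" if up else "loss"))
        i = j + 1
    return runs


ratios = [0.1, 1.2, -0.05, 0.8, 0.9, 0.7, 0.0, -1.1]
runs = aberrant_runs(ratios)
single_probe = [r for r in runs if r[1] == 1]  # candidates needing validation
print(runs)          # [(1, 1, 'gain'), (3, 3, 'gain'), (7, 1, 'loss')]
print(single_probe)  # [(1, 1, 'gain'), (7, 1, 'loss')]
```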
As for R/Bioc versus commercial packages, that will be dictated by the
questions you want to ask. We find that we routinely need and want to
ask questions that are not easily answered by commercial packages. That
said, a good visualization tool for CGH is HIGHLY useful, and there are
now several available.
Sean
jhs1jjm at leeds.ac.uk wrote:
> Hi Sean,
>
> Since it's two-colour and I'm looking at relative amounts, wouldn't that
> mean I wouldn't see copy number variants? Would they not be in both my
> samples?
I forgot to answer this question directly. If the reference genome and
the test genome contain the same number of copies of a CNV region, you
will not see it, as you suggest. However, if your reference and test
samples contain different numbers of copies, then this will potentially
be evident in your data.
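The arithmetic behind this answer can be sketched as follows. The function assumes an idealised setting (a pure test sample, a non-zero reference copy number, and a perfectly linear array response), so real measured ratios will be noisier and compressed towards zero; it only shows why a CNV carried at the same copy number by both samples gives a ratio of 1 (log2 ratio 0).

```python
import math


def expected_log2_ratio(test_copies: int, ref_copies: int) -> float:
    """Idealised two-colour log2(test/reference) for one region.

    Assumes a pure test sample, ref_copies > 0 and a perfectly linear
    array response; real measured ratios are noisier and compressed.
    """
    if test_copies == 0:
        return float("-inf")  # homozygous deletion: no test signal
    return math.log2(test_copies / ref_copies)


print(expected_log2_ratio(3, 3))  # 0.0  -> CNV shared by both: invisible
print(expected_log2_ratio(1, 2))  # -1.0 -> single-copy loss in the test
print(expected_log2_ratio(3, 2))  # about 0.585 -> single-copy gain
```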
Sean
Dear John,
On Wednesday 10 October 2007 15:52, jhs1jjm at leeds.ac.uk wrote:
> Hi list,
>
> I've been looking at three 44k and two 244k Agilent CGH arrays. To date I've
> used limma to read in the processed signals (no background correction
> or normalization, as this has already been done), then the DNAcopy package for
> segmentation, as well as the snapCGH package to apply other
> segmentation methods rather than using each segmentation package
> individually.
>
> Firstly, using the DNAcopy segmentation, I can see a significant pattern
> across my three 44k arrays which disappears when I perform the step to
> remove unnecessary change points due to trends in the data. As these
How exactly are you removing "unnecessary change points due to trends in
the data"?
> are in the same locations across the three arrays, is it likely that
> this is biologically significant rather than being a trend? Obviously
> others do not have a definitive answer for this, but I wondered if
> anyone had seen similar results in a different scenario.
>
> Additionally, I'm wondering what segmentation methods people have
> tended to employ. The heterogeneous nature of my data means that I need to
> identify single-probe as well as larger-region aberrations, and I'd
> read that the CBS algorithm is not particularly suited to doing this?
If you run the "smooth.CNA" function (in the DNAcopy package), as is
recommended in the documentation for DNAcopy (IIRC), then single-probe
aberrations are not detectable (you are smoothing them away).
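Why smoothing removes single-probe calls can be seen with a simplified outlier smoother. This is loosely modelled on the idea described for DNAcopy's smooth.CNA and is not its code. The defaults are meant to mirror its documented smooth.region, outlier.SD.scale and smooth.SD.scale arguments (check ?smooth.CNA), and the noise SD is assumed to be known rather than estimated.

```python
import statistics
from typing import List


def smooth_outliers(x: List[float], sd: float, region: int = 10,
                    outlier_sd: float = 4.0,
                    smooth_sd: float = 2.0) -> List[float]:
    """Simplified single-probe outlier smoothing, loosely modelled on the
    idea described for DNAcopy's smooth.CNA (this is NOT its code).

    A probe lying more than outlier_sd * sd above the highest (or below the
    lowest) of its neighbours within `region` probes on either side is
    pulled back to median(neighbours) +/- smooth_sd * sd.
    """
    out = list(x)
    for i, value in enumerate(x):
        neigh = x[max(0, i - region):i] + x[i + 1:i + 1 + region]
        if not neigh:
            continue
        med = statistics.median(neigh)
        if value > max(neigh) + outlier_sd * sd:
            out[i] = med + smooth_sd * sd
        elif value < min(neigh) - outlier_sd * sd:
            out[i] = med - smooth_sd * sd
    return out


single = [0.0] * 10 + [2.0] + [0.0] * 10     # one extreme probe
block = [0.0] * 10 + [2.0] * 3 + [0.0] * 10  # three adjacent extreme probes
print(smooth_outliers(single, sd=0.1)[10])      # 0.2: the spike is flattened
print(smooth_outliers(block, sd=0.1) == block)  # True: a multi-probe run survives
```

The isolated probe is pulled down to a value that segmentation will rarely call, while a run of adjacent probes is left alone because each one has an extreme neighbour.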
Single-probe aberrations might be detected with the HMM model in snapCGH
or our HMM model in RJaCGH, available from CRAN
(http://cran.r-project.org/src/contrib/Descriptions/RJaCGH.html).
(Details of the method are available in the paper:
http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pcbi.0030122)
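For intuition on why an HMM can keep a single-probe call, here is a toy three-state (loss/normal/gain) Gaussian HMM decoded with the Viterbi algorithm. It is not the snapCGH or RJaCGH model: the state means, the common SD and the transition probabilities are fixed, hypothetical values, whereas those packages estimate their parameters from the data.

```python
import math
from typing import List, Sequence

STATES = ("loss", "normal", "gain")


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(x: Sequence[float], means: Sequence[float] = (-1.0, 0.0, 0.6),
            sd: float = 0.2, p_stay: float = 0.99) -> List[str]:
    """Most likely loss/normal/gain path for a non-empty sequence of log2
    ratios (one chromosome, genomic order) under a toy 3-state Gaussian HMM
    with fixed, hypothetical parameters."""
    k = len(STATES)
    p_switch = (1 - p_stay) / (k - 1)
    log_trans = [[math.log(p_stay if i == j else p_switch) for j in range(k)]
                 for i in range(k)]
    # Uniform start probabilities.
    score = [math.log(1 / k) + gaussian_logpdf(x[0], means[s], sd)
             for s in range(k)]
    back = []  # back[t-1][j]: best predecessor of state j at probe t
    for t in range(1, len(x)):
        new_score, ptr = [], []
        for j in range(k):
            best = max(range(k), key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][j]
                             + gaussian_logpdf(x[t], means[j], sd))
        back.append(ptr)
        score = new_score
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


x = [0.0] * 10 + [1.2] + [0.0] * 10  # a single high probe
print(viterbi(x)[9:12])  # ['normal', 'gain', 'normal']
```

Whether a lone probe is called depends on how extreme it is relative to the cost of two state switches, which is exactly the trade-off these models tune.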
Best,
R.
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Ramón Díaz-Uriarte
Statistical Computing Team
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)
Hi Ramon,
Ah, of course, I'd forgotten I'd performed that step. I'm still getting
some segments with high means corresponding to single genes, but this'll
be because they are represented by more than one probe, I guess. The
DNAcopy documentation has a step in it to remove local trends in the data:
I'm undoing splits that are not at least 3 SDs apart, as set out in the
documentation.
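The "undo splits not at least 3 SDs apart" step is presumably DNAcopy's segment() option undo.splits = "sdundo" with undo.SD = 3. The sketch below shows the idea only and is not DNAcopy's implementation: DNAcopy estimates the SD itself, whereas here `sd` is simply passed in, and the merge order (closest pair first, recomputing means after each merge) is an assumption of this sketch.

```python
import statistics
from typing import List, Sequence


def undo_splits(x: Sequence[float], breakpoints: Sequence[int], sd: float,
                undo_sd: float = 3.0) -> List[int]:
    """Merge adjacent segments whose means differ by less than undo_sd * sd.

    `breakpoints` are the indices at which a new segment starts (0 and
    len(x) excluded). The closest pair of neighbouring segments is merged
    first and the means are recomputed after every merge.
    """
    bps = sorted(breakpoints)
    while bps:
        bounds = [0] + bps + [len(x)]
        means = [statistics.fmean(x[a:b]) for a, b in zip(bounds, bounds[1:])]
        gaps = [abs(m2 - m1) for m1, m2 in zip(means, means[1:])]
        k = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[k] >= undo_sd * sd:
            break  # every remaining split is at least undo_sd SDs
        del bps[k]  # gaps[k] lies at breakpoint bps[k]
    return bps


x = [0.0] * 20 + [0.1] * 20 + [1.0] * 20
print(undo_splits(x, [20, 40], sd=0.1))  # [40]: the 0.1 step is undone
```

Any consistent low-amplitude step shared across arrays is exactly what this rule will erase, which is one reason the pattern can disappear after this step.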
To summarize, then, I might use DNAcopy to identify regions, but in order
to look at single-probe aberrations I'd want to use one of the other
methods, i.e. an HMM.
Thanks
John
In reply to Ramon Diaz-Uriarte <rdiaz at cnio.es>, Wed 10 Oct 2007 15:22:22 BST.
On Wednesday 10 October 2007 17:04, jhs1jjm at leeds.ac.uk wrote:
> Hi Ramon,
>
> Ah, of course, I'd forgotten I'd performed that step. I'm still getting
> some segments with high means corresponding to single genes, but this'll
> be because they are represented by more than one probe, I guess. The
> DNAcopy documentation has a step in it to remove local trends in the data:
> I'm undoing splits that are not at least 3 SDs apart, as set out in the
> documentation.
>
Ah, OK. I thought you were referring to other trends (I've heard people
mention waves, and relations to GC content, etc.; the latter, I think, is
commonly corrected for in Affymetrix data).
> To summarize, then, I might use DNAcopy to identify regions, but in order
> to look at single-probe aberrations I'd want to use one of the other
> methods, i.e. an HMM.
>
We often analyze data with four or five different methods (our own HMM in
RJaCGH, Olshen's CBS, the HMM of Marioni et al., Picard et al.'s CGHseg,
and Hsu et al.'s wavelet-based smoothing) because different approaches are
sensitive to different features of the data (or can be misled by different
features of the data). (Of course, we do think our approach is the best
overall performer, but this way we can keep learning about the relative
strengths of different methods and/or detect bugs in the code.)
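One simple way to put several methods side by side is to reduce each to per-probe calls (-1 loss, 0 normal, +1 gain) and look at where they agree. The sketch below uses a majority vote, which is only one of many ways to combine methods; the method names and calls are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def consensus_calls(calls: Dict[str, List[int]],
                    min_agree: int) -> List[Optional[int]]:
    """Per-probe majority call across methods (-1 loss, 0 normal, +1 gain).

    Returns None where fewer than `min_agree` methods give the same call.
    Choose min_agree > len(calls) / 2 so that ties cannot pass.
    """
    lengths = {len(v) for v in calls.values()}
    if len(lengths) != 1:
        raise ValueError("every method must call the same probes")
    n_probes = lengths.pop()
    out: List[Optional[int]] = []
    for i in range(n_probes):
        call, votes = Counter(v[i] for v in calls.values()).most_common(1)[0]
        out.append(call if votes >= min_agree else None)
    return out


# Hypothetical per-probe calls from three methods on four probes.
calls = {
    "method_a": [0, 1, 1, -1],
    "method_b": [0, 1, 0, -1],
    "method_c": [0, 0, 0, 1],
}
print(consensus_calls(calls, min_agree=2))  # [0, 1, 0, -1]
print(consensus_calls(calls, min_agree=3))  # [0, None, None, None]
```

Probes where the methods disagree are often the most informative ones to inspect by eye.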
Detecting single-point aberrations might be trickier than, say, detecting
a long alteration that involves tens of probes. But then, the inability to
detect single-gene alterations can be very relevant in some studies (e.g.,
IIRC, Aguirre et al., in PNAS 2004, in their study of pancreatic
adenocarcinoma, have some discussion of not detecting the loss of the
tumor suppressor SMAD4).
As for the need for validation, etc., if you have a gene covered by a
bunch of probes and only a single probe is being called aberrant, then I'd
be more concerned; but you might be averaging over probes, or using
platforms where some genes only have one probe, etc. In general, many/most
of the current aCGH studies are really exploratory studies (i.e., they are
in the "copy number differences discovery" stage, not the "copy number
association studies" stage) with results that need to be validated further
(other aCGH platforms, other molecular techniques); there are several
papers in the July 2007 issue of Nature Genetics (volume 39) that go into
these issues.
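The concern about a gene covered by many probes with only one called aberrant can be checked mechanically by grouping probe-level calls by gene. A sketch with hypothetical gene labels and calls; it says nothing about which calls are real, only which are weakly supported by neighbouring probes on the same gene.

```python
from typing import Dict, List, Tuple


def gene_support(probe_genes: List[str],
                 probe_calls: List[int]) -> Dict[str, Tuple[int, int]]:
    """Map each gene to (probes called aberrant, probes on the gene).

    probe_calls uses -1 / 0 / +1 for loss / normal / gain.
    """
    support: Dict[str, Tuple[int, int]] = {}
    for gene, call in zip(probe_genes, probe_calls):
        n_aberrant, n_total = support.get(gene, (0, 0))
        support[gene] = (n_aberrant + (call != 0), n_total + 1)
    return support


def lone_probe_calls(support: Dict[str, Tuple[int, int]]) -> List[str]:
    """Genes with several probes of which exactly one is called aberrant."""
    return [g for g, (n_ab, n) in support.items() if n_ab == 1 and n > 1]


# Hypothetical genes and calls.
genes = ["g1", "g1", "g1", "g2", "g3", "g3"]
calls = [0, -1, 0, 1, -1, -1]
s = gene_support(genes, calls)
print(s)                    # {'g1': (1, 3), 'g2': (1, 1), 'g3': (2, 2)}
print(lone_probe_calls(s))  # ['g1']
```

A gene like g2, with only one probe on the platform, is not flagged: there is simply no second probe to corroborate it, which is the situation where validation on another platform matters most.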
Best,
R.
Sean,
Thanks, that helps a lot. I've purposely stayed away from using the
Agilent software, firstly because I'm not on campus (which is where it is)
and secondly because I wanted to do the analysis using R and Bioconductor
and any other open source software I can get my hands on. I was also
wondering whether it's the case that a lot of the packages, and the
algorithms they use, appear in Bioconductor/R first, and that it may take
time to implement them on a commercial platform, i.e. with a nice GUI?
I wonder also if you could help with another matter. At the moment I'm
exporting the DNAcopy segment output as a CSV file, then opening it in
OpenOffice Calc and correlating the map positions with the Agilent text
file to find the corresponding genes. This is fine for the 44k arrays,
but I'm unable to see all the rows of the 244k text file in Calc, so I
cannot correlate the map positions with genes.
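OpenOffice Calc at the time allowed at most 65,536 rows per sheet, which is probably why the 244k file is cut off. A short script avoids the spreadsheet entirely. The minimal Python sketch below assumes the segment table was saved with DNAcopy's usual segment columns (ID, chrom, loc.start, loc.end, num.mark, seg.mean). It also assumes the probe positions and gene names were first pulled from the Agilent text file into a CSV with hypothetical columns chrom, pos and gene. The real Agilent column names differ and would need mapping.

```python
import csv
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ProbeIndex = Dict[str, Tuple[List[int], List[str]]]


def load_probes(lines: Iterable[str]) -> ProbeIndex:
    """Index annotated probes by chromosome, sorted by position.

    Expects hypothetical CSV columns chrom, pos, gene (probes with an
    empty gene field are skipped).
    """
    by_chrom = defaultdict(list)
    for row in csv.DictReader(lines):
        if row["gene"]:
            by_chrom[row["chrom"]].append((int(row["pos"]), row["gene"]))
    index: ProbeIndex = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([p for p, _ in items], [g for _, g in items])
    return index


def genes_in_segments(lines: Iterable[str], probes: ProbeIndex):
    """Yield (ID, chrom, start, end, seg.mean, genes) for each segment row.

    Chromosome labels must match between the two files (e.g. "1" vs "chr1").
    int(float(...)) tolerates positions written in scientific notation.
    """
    for row in csv.DictReader(lines):
        chrom = row["chrom"]
        start = int(float(row["loc.start"]))
        end = int(float(row["loc.end"]))
        positions, names = probes.get(chrom, ([], []))
        lo, hi = bisect_left(positions, start), bisect_right(positions, end)
        yield (row["ID"], chrom, start, end, float(row["seg.mean"]),
               sorted(set(names[lo:hi])))


probes_csv = "chrom,pos,gene\n1,100,A\n1,200,B\n1,300,\n1,400,C\n2,50,D\n"
segs_csv = ("ID,chrom,loc.start,loc.end,num.mark,seg.mean\n"
            "S1,1,150,350,2,0.8\nS1,2,10,60,1,-0.9\n")
index = load_probes(probes_csv.splitlines())
for seg in genes_in_segments(segs_csv.splitlines(), index):
    print(seg)
# ('S1', '1', 150, 350, 0.8, ['B'])
# ('S1', '2', 10, 60, -0.9, ['D'])
```

With real files, pass open file handles (opened with newline="") instead of the example strings, and write the results out with csv.writer.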
Regards
John
In reply to Sean Davis <sdavis2 at mail.nih.gov>, Wed 10 Oct 2007 17:15:52 BST.