On Wednesday 06 December 2006 11:12, João Fadista wrote:
> Dear all,
>
> I was wondering if there are other methods for combining replicate spots
> other than the average or the median. I am asking this with regard to CGH
> data analysis because I do not know how, or whether, we can take advantage
> of the genomic structure of array CGH data for combining replicate spots.
>
> For the sake of argument, here are two hypothetical examples:
> - Combining replicate spots in a different way depending on which region of
>   the chromosome or genome they are in;
> - Giving more weight to spots that we know are more reliable.

I don't think these are included in Bioconductor. That said, you can aggregate
the data however you see fit, though it will mean writing some code to do so.
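As a minimal sketch of such code (an illustration, not a Bioconductor function), the Python below combines replicate spots with an inverse-variance weighted mean, one common way to give more weight to spots believed to be more reliable. It assumes each spot already comes with a variance estimate; the clone IDs and numbers in the example are hypothetical.

```python
import math
from collections import defaultdict


def combine_replicates(spots):
    """Combine replicate spots per clone with an inverse-variance weighted mean.

    spots: iterable of (clone_id, log_ratio, variance) tuples. The per-spot
    variance is assumed to be known or estimated elsewhere (for example from
    spot quality measures); it is not computed here.
    Returns {clone_id: (weighted_mean, standard_error)}.
    """
    acc = defaultdict(lambda: [0.0, 0.0])  # clone_id -> [sum(w * x), sum(w)]
    for clone_id, log_ratio, variance in spots:
        if variance <= 0:
            raise ValueError(f"non-positive variance for {clone_id}")
        weight = 1.0 / variance  # more reliable spots (smaller variance) weigh more
        acc[clone_id][0] += weight * log_ratio
        acc[clone_id][1] += weight
    # The standard error 1/sqrt(sum(w)) shrinks as replicates are added, so
    # clones with different numbers of replicates get estimates of different precision.
    return {c: (swx / sw, math.sqrt(1.0 / sw)) for c, (swx, sw) in acc.items()}


# Hypothetical clones: two replicates of clone_A, one of clone_B.
spots = [("clone_A", 0.30, 0.04), ("clone_A", 0.50, 0.01), ("clone_B", -0.20, 0.02)]
print(combine_replicates(spots))
```

Returning the standard error alongside each estimate makes explicit that clones with more (or better) replicates are estimated more precisely.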
Sean
On Wednesday 06 December 2006 17:12, João Fadista wrote:
> Dear all,
>
> I was wondering if there are other methods for combining replicate spots
> other than the average or the median. I am asking this with regard to CGH
> data analysis because I do not know how, or whether, we can take advantage
> of the genomic structure of array CGH data for combining replicate spots.
>
> For the sake of argument, here are two hypothetical examples:
> - Combining replicate spots in a different way depending on which region of
>   the chromosome or genome they are in;
> - Giving more weight to spots that we know are more reliable.
>
> Something like this, if you know what I mean.

Dear Joao,

This is nothing elaborate; just a couple of thoughts.

1. I assume you mean true replicate spots. In other words, these are the exact
same DNA piece, and they map to exactly the same location in the chromosome.

2. Ideally, I'd like a method that can deal with replicate spots without even
asking you to take the mean or the median. One problem I find with means or
medians is that, if you do not have the exact same number of replicates for
all locations, then you are estimating a value whose variance differs from one
location to another.
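To make the variance point concrete, here is a small Monte Carlo sketch in Python (illustrative only; the replicate SD of 0.2 and the simulation sizes are arbitrary assumptions). With i.i.d. replicates, the spread of the per-location mean shrinks like sigma/sqrt(n), and the spread of the median also shrinks with n, so locations with different replicate counts yield summaries of different precision.

```python
import random
import statistics


def sd_of_summary(n_reps, summary=statistics.fmean, sigma=0.2, n_sim=20000, seed=1):
    """Monte Carlo SD of a per-location summary of `n_reps` replicate log ratios.

    Replicates are simulated as i.i.d. N(0, sigma^2); sigma=0.2 is an
    arbitrary illustrative value. For the mean, theory gives sigma / sqrt(n_reps).
    """
    rng = random.Random(seed)
    values = [summary([rng.gauss(0.0, sigma) for _ in range(n_reps)])
              for _ in range(n_sim)]
    return statistics.stdev(values)


for n in (1, 2, 4, 8):
    print(f"n={n}: SD(mean)={sd_of_summary(n):.3f}  "
          f"SD(median)={sd_of_summary(n, statistics.median):.3f}  "
          f"theory(mean)={0.2 / n ** 0.5:.3f}")
```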

I think (non-homogeneous) HMMs and related techniques are well suited to
dealing with an arbitrary (and varying) number of replicate spots: at location
"t" you happen to have more than one observation, and you are fitting a model
where those observed log ratios come from an emission function, blablabla. By
not taking means/medians/whatever, you do not violate assumptions about the
variance of the emission functions. In other words, conditional on being in
state "k", your log ratios are, say, ~ N(mu, sigma).
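A minimal Python sketch of this idea follows. It assumes a plain (homogeneous) Gaussian HMM with known parameters, not RJaCGH's Bayesian, non-homogeneous model. Because replicates at a location are treated as conditionally independent given the hidden state, the emission log-density at location t is simply the sum over however many replicates are present there, and no mean or median is ever taken.

```python
import math


def log_norm_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(obs, init, trans, mus, sigmas):
    """Log-likelihood of a K-state Gaussian HMM via the forward algorithm.

    obs: list over genomic locations; obs[t] is a list with all replicate
         log ratios at location t (lists may differ in length).
    init[k], trans[j][k]: initial and transition probabilities (assumed > 0).
    mus[k], sigmas[k]: emission mean and SD for state k.
    """
    n_states = len(init)

    def emit(t, k):
        # Given state k, replicates are i.i.d. N(mus[k], sigmas[k]^2),
        # so their log-densities add up; an empty list contributes 0.
        return sum(log_norm_pdf(x, mus[k], sigmas[k]) for x in obs[t])

    alpha = [math.log(init[k]) + emit(0, k) for k in range(n_states)]
    for t in range(1, len(obs)):
        alpha = [logsumexp([alpha[j] + math.log(trans[j][k]) for j in range(n_states)])
                 + emit(t, k) for k in range(n_states)]
    return logsumexp(alpha)


# Hypothetical data: 2, 1 and 3 replicates at three consecutive locations.
obs = [[0.05, -0.02], [0.01], [0.48, 0.55, 0.41]]
print(forward_loglik(obs, init=[0.5, 0.5], trans=[[0.9, 0.1], [0.1, 0.9]],
                     mus=[0.0, 0.5], sigmas=[0.15, 0.15]))
```

Locations with more replicates simply contribute more terms to the likelihood, which is how the differing amount of information per location is accounted for.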

(I'll admit we have a "hidden agenda" with our RJaCGH package :-).

R.
>
> Best regards
>
> João Fadista
> Ph.d. student
>
> Danish Institute of Agricultural Sciences
> Research Centre Foulum
> Dept. of Genetics and Biotechnology
> Blichers Allé 20, P.O. BOX 50
> DK-8830 Tjele
>
> Phone: +45 8999 1900
> Direct: +45 8999 8999
> E-mail: Joao.Fadista at agrsci.dk
> Web: www.agrsci.org
>
> This email may contain information that is confidential. Any use or
> publication of this email without written permission from DIAS is not
> allowed. If you are not the intended recipient, please notify DIAS
> immediately and delete this email.
--
Ramón Díaz-Uriarte
Bioinformatics
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)

Dear Ramon,

Thanks for the insights about the replicate spots.

About the RJaCGH package, I would like to know what the main features of your
heterogeneous HMM algorithm are. I am asking because I would like to compare it
with the only other heterogeneous HMM algorithm I know of that was made for
CGH analysis.

This algorithm is implemented in the snapCGH package and is called BioHMM. It
incorporates the distance between clones into the model, assigning a higher
probability of a state change to consecutive clones that are farther apart on a
chromosome.
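The idea of a distance-dependent transition probability can be sketched in Python as below. The functional form (the probability of leaving the current state rising as 1 - exp(-d/scale) towards a cap, split evenly across the other states) and the default parameter values are hypothetical illustrations, not BioHMM's actual parametrization, which should be taken from the snapCGH documentation.

```python
import math


def transition_matrix(distance, n_states, scale=1e6, max_switch=0.5):
    """Hypothetical distance-dependent transition matrix for an n_states HMM.

    distance: gap in base pairs between two consecutive clones.
    The probability of leaving the current state grows from 0 (adjacent clones)
    towards max_switch (very distant clones) and is split evenly among the
    other states. scale and max_switch are illustrative, not fitted values.
    """
    if n_states == 1:
        return [[1.0]]
    p_switch = max_switch * (1.0 - math.exp(-distance / scale))
    p_other = p_switch / (n_states - 1)
    return [[1.0 - p_switch if i == j else p_other for j in range(n_states)]
            for i in range(n_states)]


for d in (1e4, 1e6, 1e7):  # 10 kb, 1 Mb, 10 Mb between clones
    print(d, [round(p, 3) for p in transition_matrix(d, 3)[0]])
```

Each row still sums to 1; only the balance between staying and switching changes with the gap between clones.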

Best regards

João Fadista
-----Original Message-----
From: Ramon Diaz-Uriarte [mailto:rdiaz@cnio.es]
Sent: Thursday, December 07, 2006 12:18 PM
To: bioconductor at stat.math.ethz.ch
Cc: João Fadista
Subject: Re: [BioC] Combining replicate spots in CGH data

On Thursday 07 December 2006 13:55, João Fadista wrote:
> Dear Ramon,
>
> Thanks for the insights about the replicate spots.
>
> About the RJaCGH package, I would like to know what the main features of
> your heterogeneous HMM algorithm are. I am asking because I would like to
> compare it with the only other heterogeneous HMM algorithm I know of that
> was made for CGH analysis.
>
> This algorithm is implemented in the snapCGH package and is called BioHMM.
> It incorporates the distance between clones into the model, assigning a
> higher probability of a state change to consecutive clones that are farther
> apart on a chromosome.
>

We use a Bayesian model fitted with MCMC and reversible jump, and incorporate
model uncertainty via Bayesian Model Averaging.
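The model-averaging step can be illustrated with a short Python sketch (hypothetical numbers; this is the generic Bayesian model averaging identity, not RJaCGH code). The posterior probability that a probe is altered is the average of the per-model probabilities, weighted by each model's posterior probability; in a reversible jump sampler, those model probabilities are typically estimated by the fraction of iterations spent in each model.

```python
def bma_alteration_prob(model_post, alt_prob_given_model):
    """Bayesian model averaging of per-probe alteration probabilities.

    model_post: {K: P(K | data)}, posterior probability of each number of
                hidden states K (must sum to 1).
    alt_prob_given_model: {K: [P(probe i altered | K, data) for each probe]}.
    Returns [P(probe i altered | data)] = sum_K P(K | data) * P(altered_i | K, data).
    """
    if abs(sum(model_post.values()) - 1.0) > 1e-8:
        raise ValueError("posterior model probabilities must sum to 1")
    n_probes = len(next(iter(alt_prob_given_model.values())))
    return [sum(p_k * alt_prob_given_model[k][i] for k, p_k in model_post.items())
            for i in range(n_probes)]


# Hypothetical posteriors: 70% of the sampler's iterations in K=2, 30% in K=3.
model_post = {2: 0.7, 3: 0.3}
alt_prob_given_model = {2: [0.10, 0.90], 3: [0.30, 0.95]}
print(bma_alteration_prob(model_post, alt_prob_given_model))
```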

There are several differences from BioHMM. First, because we use MCMC, BioHMM
is a lot faster than RJaCGH. However, RJaCGH provides posterior probabilities
of alteration. Also, we use reversible jump (instead of an AIC-based approach,
as in BioHMM) to deal with the problem of the unknown number of hidden states.
I'd say these are the main differences. There are also some other differences
in how the non-homogeneous part is implemented, but I'd say these are minor
compared to the previous ones.
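For contrast, an AIC-based choice of the number of hidden states picks a single K and discards the rest. The Python sketch below assumes a standard parameter count for a K-state Gaussian HMM (K means, K variances, K(K-1) free transition probabilities, K-1 free initial probabilities); BioHMM's distance-dependent transitions imply a different count, and the log-likelihoods in the example are made up.

```python
def gaussian_hmm_n_params(k):
    """Free parameters of a k-state Gaussian HMM (assumed convention)."""
    return 2 * k + k * (k - 1) + (k - 1)  # means+variances, transitions, initial probs


def aic(log_lik, k):
    """Akaike information criterion: 2 * (number of parameters) - 2 * log-likelihood."""
    return 2 * gaussian_hmm_n_params(k) - 2 * log_lik


def select_k_by_aic(log_liks):
    """log_liks: {k: maximized log-likelihood of the k-state model}; returns best k."""
    return min(log_liks, key=lambda k: aic(log_liks[k], k))


# Made-up maximized log-likelihoods for 1-, 2- and 3-state fits.
log_liks = {1: -120.0, 2: -95.0, 3: -93.5}
print({k: aic(ll, k) for k, ll in log_liks.items()}, "->", select_k_by_aic(log_liks))
```

Downstream results then condition on the selected K alone, whereas Bayesian model averaging carries the uncertainty about K forward.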

Further details, comparisons with BioHMM (and other methods), etc., are
provided in the technical report available from COBRA
(http://biostats.bepress.com/cobra/ps/art9/) or from my web page
(http://www.ligarto.org/rdiaz/Papers/rjhmm-report-plus-sup-mat.pdf).

Best,
R.