Hi,
I have a set of data from an experiment where there appears to be an
effect of the treatment on a large number of genes. I put scatterplots
for 6 of the slides here:
http://mcnach.com/MISC/scatterplots.gif
these are Cy3 vs Cy5, in log scale.
These show that many genes are differentially expressed, and they are
mostly on one side only (upregulated; some of those slides are dye
swaps).
Would this appear to violate (too much) any of the assumptions made by
loess normalisation? Should I investigate other normalisation
procedures?
Jose
--
Dr. Jose I. de las Heras                 Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology   Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology       Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
On 8/7/06 6:59 AM, "J.delasHeras at ed.ac.uk" <j.delasheras at ed.ac.uk> wrote:
> [...]
First, I would start by doing a VERY thorough evaluation of the slide
quality for these slides, as these are very distorted scatterplots. If
the slide quality looks OK, then I would probably stay away from a
non-linear normalization method, as these will tend to make your
differentially-expressed genes look less differentially-expressed.
Sean
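[Sean's concern can be reproduced numerically. When a large fraction of genes are upregulated on one side only, a loess-type fit of M on A follows the shifted bulk of the data, so subtracting it shrinks the real fold changes. Below is a minimal Python sketch with invented data; the hand-rolled tricube local-linear fit merely stands in for a real loess implementation, it is not the code any of the R packages in this thread use.]

```python
import numpy as np

def local_linear_fit(a, m, frac=0.4):
    """Tricube-weighted local linear fit of M on A (a crude loess stand-in)."""
    n = len(a)
    k = max(2, int(frac * n))
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(a - a[i])
        idx = np.argsort(d)[:k]                      # k nearest neighbours in A
        w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3  # tricube weights
        # np.polyfit weights multiply residuals, so pass sqrt of the weights
        coef = np.polyfit(a[idx], m[idx], 1, w=np.sqrt(w))
        fitted[i] = np.polyval(coef, a[i])
    return fitted

rng = np.random.default_rng(0)
n = 1000
a = rng.uniform(6, 14, n)            # average log2 intensity
m = rng.normal(0.0, 0.2, n)         # non-DE genes scatter around M = 0
up = rng.random(n) < 0.3            # 30% of genes upregulated, one side only
m[up] += 1.5

m_norm = m - local_linear_fit(a, m)  # "loess normalization": subtract the fit
# The fit is pulled upward by the one-sided DE genes, so their normalized
# fold changes shrink and the non-DE genes get pushed below zero:
print(float(np.median(m[up])), float(np.median(m_norm[up])))
```

The size of the shrinkage depends directly on the fraction of one-sided DE genes, which is exactly why the "most genes are not DE" assumption matters.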
Quoting Sean Davis <sdavis2 at mail.nih.gov>:
> [...]
Hi Sean,
thanks for your reply. The slides are good, I checked them well. The
strong effect is not so unexpected, as it involves transfection of
cells with a DNA-binding protein fused to a strong transactivator, so
in theory the fusion protein could be responsible for the expression
of a very large number of genes. There is some specificity to the
binding, but there should be many target sites, often at promoters...
So the effects are more or less what we expected, I suppose, and the
quality of the slides is good. The second spike going either almost
vertical or almost horizontal should correspond to those genes that
are not expressed in the particular cell line, but expressed after
transfection.
Do you have any suggestions of what sort of methods to use for the
normalisation of such experiments? Until now I used loess for
everything, but I wasn't sure it would be okay for this experiment
when I saw these plots.
Jose
On 8/7/06 7:29 AM, "J.delasHeras at ed.ac.uk" <j.delasheras at ed.ac.uk> wrote:
> [...]
You can certainly try loess and see how the result looks, as
scatterplots are notorious for "hiding" where the data are most dense.
Alternatively, you could try "rotating" the scatterplot until the body
of the data is where you think it should be--I don't know if there is
a method in Bioconductor that does this, though.
Sean
Quoting Sean Davis <sdavis2 at mail.nih.gov>:
> [...]
Thanks Sean.
I already tried loess; this is what the MA plot for the first set of
data looks like:
http://mcnach.com/MISC/MAplots2.png
which looks okay to me. You see the ascending diagonal is denser,
which contains all those newly activated spots. I knew a few genes
that were expected to be there (from RT data) and they line up nicely
on that diagonal.
This was without subtracting background.
When I attempted to correct for background I ran into problems, mainly
because some slides have a higher bkg than usual, and the signal is
lower than the local bkg for a good number of spots. When I use
"subtract" as a bkg correction method, it results in many negative
intensities, and those spots are removed. I then tried "half" to
overcome this, so that negative values are turned into an arbitrary
0.5... and this totally flattened the MA plot, and nothing was
statistically DE. I showed this on a previous thread:
http://mcnach.com/MISC/MAplots1.png
It's very striking. It leaves me no other choice but not removing
background (which is increasingly looking like the best option in
general, in my still short experience...)
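[The two limma background-correction options described above can be mimicked numerically to see where the trouble comes from. In this Python sketch (all numbers invented, and the simulation is only an illustration of the clipping arithmetic, not of limma's actual code), a high uniform background pushes many foreground values below the local background; "subtract" then loses those spots entirely, while "half" pins empty spots at the arbitrary 0.5 floor, forcing their M values to exactly 0.]

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000
bg = 400.0                                     # high, fairly uniform background
empty = rng.random(n) < 0.4                    # spots with no detectable signal
signal = np.where(empty, 0.0, rng.lognormal(6, 1, n))

red_fg = signal + rng.normal(bg, 60, n)        # measured foreground, red
green_fg = 2 * signal + rng.normal(bg, 60, n)  # green genuinely 2x (M ~ 1)

# "subtract": any spot with a negative corrected intensity is lost
sub_r, sub_g = red_fg - bg, green_fg - bg
lost = (sub_r <= 0) | (sub_g <= 0)

# "half": clip corrected intensities at an arbitrary 0.5 instead
half_r, half_g = np.maximum(sub_r, 0.5), np.maximum(sub_g, 0.5)
m_half = np.log2(half_g) - np.log2(half_r)

print("fraction lost with 'subtract':", float(np.mean(lost)))
print("fraction pinned at M = 0 by 'half':", float(np.mean(m_half == 0.0)))
print("median M of expressed spots:", float(np.median(m_half[~empty])))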
Jose
On 8/7/06, J.delasHeras at ed.ac.uk <j.delasheras at ed.ac.uk> wrote:
> [...]
>
> Thanks Sean.
>
> I already tried loess; this is what the MA plot for the first set of
> data looks like:
>
> http://mcnach.com/MISC/MAplots2.png
>
> which looks okay to me. You see the ascending diagonal is denser,
> which contains all those newly activated spots. I knew a few genes
> that were expected to be there (from RT data) and they line up nicely
> on that diagonal.
This MA plot indicates that the noise levels have become asymmetric
after curve-fit normalization. I say so because your data is "bending"
upwards instead of being a nice flat line, cf. Frame 33 of 48 in
http://www.maths.lth.se/bioinformatics/calendar/20051108/. If this is
true, your tests downstream might not work that well.
>
> This was without subtracting background.
> When I attempted to correct for background I ran into problems,
> mainly because some slides have a higher bkg than usual, and the
> signal is lower than the local bkg for a good number of spots. When I use
You haven't told us your platform. What type of scanner do you use?
> "subtract" as a bkg correction method, it results in many negative
> intensities, and those spots are removed. I then tried "half" to
I would say that this is expected for signals around zero (on the
intensity scale); if you have no biological signal it is a 50-50
chance whether the background is stronger than the foreground. The
problem is how to deal with those. Also, do NOT be afraid of the large
noise levels at lower intensities; you do expect to see these when
your signals get closer to noise levels (closer to zero). If you want
to stabilize the variance structure there are methods for this, but
then you pay the price of losing accuracy (you get biased log-ratio
estimates).
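[The 50-50 claim above is easy to check with a toy simulation: for a spot carrying no biological signal, foreground and background estimates sample the same noise distribution, so the subtracted value is negative about half the time. A Python sketch with invented numbers:]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
# For an empty spot, foreground and background both measure the same noise:
fg = rng.normal(400, 30, n)
bg = rng.normal(400, 30, n)
frac_negative = float(np.mean(fg - bg < 0))
print(frac_negative)   # close to 0.5
```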
> overcome this, so that negative values are turned into an arbitrary
> 0.5... and this totally flattened the MA plot, and nothing was
Yes, 0.5 is very arbitrary. Why not 5, 0.05, or 0.0000000000005?
You might want to look into Kooperberg's background correction
methods, or the ones in limma.
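[The arbitrariness is easy to quantify: for a spot detectable in only one channel, the resulting M value is driven almost entirely by the floor you chose, not by the data. A small Python sketch with hypothetical intensities:]

```python
import math

def m_value(corrected_red, corrected_green, floor):
    """log2 ratio after clipping background-corrected intensities at `floor`."""
    r = max(corrected_red, floor)
    g = max(corrected_green, floor)
    return math.log2(g) - math.log2(r)

# Red foreground fell below background; green has a clear signal:
red_c, green_c = -20.0, 2000.0
for floor in (0.5, 5.0, 0.0000000000005):
    print(floor, round(m_value(red_c, green_c, floor), 2))
    # -> 11.97, 8.64 and 51.83 respectively: the "fold change" is
    #    essentially a readout of the floor, not of the spot.
```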
> statistically DE. I showed this on a previous thread:
>
> http://mcnach.com/MISC/MAplots1.png
>
> It's very striking. It leaves me no other choice but not removing
> background (which is increasingly looking like the best option in
> general, in my still short experience...)
You haven't told us your platform. What scanner do you have? You
might have an offset in your scanner (quite commonly added to avoid
analogue negative signals being truncated to zero), e.g. Axon and
Agilent introduce about 20-25 units (which is significant). With a
simple scan protocol it is easy to check whether your scanner
introduces an offset. The method is described in
H. Bengtsson, G. Jönsson and J. Vallon-Christersson, Calibration and
assessment of channel-specific biases in microarray data with extended
dynamical range, BMC Bioinformatics, 2004, 5:177.
and the estimation and calibration methods are in aroma.light. The
scanner offset is a global constant, which means that you only fit a
single parameter per channel. That is, subtracting this "background"
from the foreground signals does not introduce as much noise as
subtracting the image-analysis estimated backgrounds unique to each
spot would. This will leave you with fewer (probably no) non-positive
signals. It might also be enough to remove the curvature seen in your
raw MA plots. If so, your remaining problem will be how to estimate
the overall relative scale factor between the two channels, which is
only one parameter; it should be easier than using non-parametric
curve-fit methods.
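[The effect described here is easy to reproduce with toy data: a constant per-channel offset bends the low-intensity end of an MA plot, and because it is a single global constant, subtracting it straightens the plot again. A Python sketch (the 25-unit offset mirrors the Axon/Agilent figure quoted above; everything else is invented):]

```python
import numpy as np

rng = np.random.default_rng(3)
true = rng.lognormal(6, 1.5, 5000)   # equal true expression in both channels
red = true + 25.0                    # scanner adds a constant offset to red
green = true.copy()                  # green channel left untouched

def ma(red, green):
    m = np.log2(red) - np.log2(green)
    a = 0.5 * (np.log2(red) + np.log2(green))
    return m, a

m_raw, a_raw = ma(red, green)
low = a_raw < np.median(a_raw)
# The offset inflates M mostly at low intensities -> curvature in the raw
# MA plot, even though no gene is differentially expressed:
print(float(m_raw[low].mean()), float(m_raw[~low].mean()))

# Subtracting the single global constant removes the curvature entirely:
m_cal, _ = ma(red - 25.0, green)
print(float(np.abs(m_cal).max()))   # essentially zero: channels agree again
```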
I would also like to encourage you to read up on what affine
transformations (offset plus rescaling) can do to your data and
especially your MA plots:
H. Bengtsson and O. Hössjer, Methodological study of affine
transformations of gene expression data with proposed robust
non-parametric multi-dimensional normalization method, BMC
Bioinformatics, 2006, 7:100.
When you understand the bits and pieces of what's going on there you
will also be much more careful when you pick your normalization
method. I would say that curve-fit (loess, lowess, spline, ...)
normalization is often overkill and corrects for a symptom rather
than fixing the underlying problem. Quantile normalization can be
interpreted as a non-parametric method that corrects for affine
transformations, but it has a problem at the lower and higher
intensities. Variance stabilization methods (Rocke & Durbin, W. Huber)
have an explicit affine component in their models, so they are much
better suited to this type of transformation. Plain affine
normalization (aroma.light) corrects for affine transformations
without controlling for variance (on purpose). The estimation methods
also differ between the latter two approaches.
I hope this is a good start.
Cheers
Henrik
> Jose
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
Quoting Henrik Bengtsson <hb at maths.lth.se>:
> You haven't told us your platform. What type of scanner do you use?
GenePix 4200AL.
>> overcome this, so that negative values are turned into an arbitrary
>> 0.5... and this totally flattened the MA plot, and nothing was
>
> Yes, 0.5 is very arbitrary. Why not 5, 0.05, or 0.0000000000005?
> You might want to look into Kooperberg's background correction
> methods, or the ones in limma.
actually, I tried other numbers too, just to check that they did not
have a drastic effect on the final results. I just wanted a positive
number (actually >1 is better, so that I can take logs directly) that
is low enough so that I get a high M value when I divide the signal of
the other channel by it. M values of genes that have no detectable
signal on one channel are meaningless, in that they don't represent
any kind of fold enrichment... but they're useful to help me pick
those genes.
> [...]
I would like to try your package aroma; I've been meaning to for a
while. I like your reasoning. But unfortunately my "exploring" time is
limited. You probably think it would be a good investment to dedicate
some time now to exploring these issues in more depth... and I would
agree... but unfortunately I am not able. It's not entirely my
call...
The problem I had with negative signals is enhanced in this particular
experiment because I happened to have a few slides with abnormally
high background, mainly on the Cy3 channel. The high background was
due to a problem in the preparation of the samples. Usually I get
pretty clean slides. I'm working on repeating the "bad" slides to help
solve this.
> [...]
> I hope this is a good start.
As ever, your replies are very useful. I just wish I had a little help
so that I could spend more time looking at these details in a lot more
depth. But I will do what I can, and the replies received so far are
all very useful for me.
Thanks!
Jose
On 8/8/06, J.delasHeras at ed.ac.uk <j.delasheras at ed.ac.uk> wrote:
> Quoting Henrik Bengtsson <hb at maths.lth.se>:
>
> > You haven't told us your platform. What type of scanner do you use?
>
> GenePix 4200AL.
I have no feedback on this specific model, but I'm keen to hear
about your findings.
/Henrik
> [...]
Hi Jose,
I think you should correct for background since, as you have
commented, you have slides with high background intensity and you want
to remove background bias. I don't know if you have already tried
"normexp".
In any case, talking about the normalization process, I don't think
you should be so worried about the number of DE genes violating the
assumptions of your normalization. I have been working with
experiments similar to the one you mentioned, using print-tip loess,
and the results were pretty good.
It is true that the normalization process is based on some
assumptions. But not a single microarray experiment fulfils these
assumptions...
HTH
Manuel
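[For reference, the idea behind "normexp" can be sketched in a few lines: model the observed intensity as normal background plus exponential signal, and replace each value by the expected signal given the observation, which is always positive. The Python function below implements the standard conditional-mean formula this family of methods is built on; it is an illustration only, not limma's code, and the parameter values are invented (limma estimates them from the data).]

```python
import math

def normexp_signal(x, mu, sigma, alpha):
    """E[signal | observed x] when background ~ N(mu, sigma^2) and
    signal ~ Exponential with mean alpha (normal+exponential convolution)."""
    mu_sf = x - mu - sigma ** 2 / alpha
    z = mu_sf / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # standard normal pdf
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))         # standard normal cdf
    return mu_sf + sigma * pdf / cdf

# Even observations at or below the background mean map to small positive
# signals instead of negative ones (hypothetical parameter values):
for x in (350.0, 400.0, 600.0):
    print(x, round(normexp_signal(x, mu=400.0, sigma=50.0, alpha=500.0), 1))
```

Because the correction is smooth and strictly positive, it avoids both the discarded spots of "subtract" and the hard floor of "half".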
--- J.delasHeras at ed.ac.uk escribió:
> [...]
Quoting M Perez <perezperezmm at yahoo.es>:
> Hi Jose,
>
> I think you should correct for background since, as you have
> commented, you have slides with high background intensity and you
> want to remove background bias. I don't know if you have already
> tried "normexp".
Hi Manuel,
I haven't really. I did a long time ago and what put me off was having
to search for the right offset, when I was hoping for something a bit
more "automatic" (and at the time I used LimmaGUI, which is a bit more
tedious if you want to experiment a little). I should try that.
However, I notice that the background usually appears to have little
or nothing to do with the signals measured. The background tends to be
very uniform across the slide, and the fact that I get "negative
spots", where you see less signal on the actual spot than around it,
makes me think that the cDNA spotted acts as a pretty good block
against that general background. In other words, I am not convinced
that the background measured on the glass has much to do with the
signal I measured on a spot of DNA, and subtracting background may
actually be a bad thing to do.
Another reason I think background subtraction doesn't matter much is
that on the occasions when I do see some pattern in the background
(using 'imageplot' for instance, you can tune the ranges displayed to
enhance and view those patterns), it often doesn't translate into a
pattern when you display the red/green ratios, or the signals on their
own. Not always, but quite often, from what I've seen. And when you do
get some scratches that clearly affect the signal measured, it might
make more sense to flag those spots... or to simply rely on the fact
that there should be enough replicates, so an odd measurement should
not affect the outcome too much (hopefully if on another slide I have
another scratch it will not affect the very same spots again :-)
I think I like Henrik Bengtsson's idea about measuring the background
inherent to a particular scanner, and subtracting that instead... but
I haven't yet explored that properly (hangs head in shame)... the
problem with being a one-man operation is that you're pressed to get
results that are "good enough" to continue the biology, rather than
spending too much time working out what's the best way to get the most
out of the data available. If only I could clone myself... but then I
wouldn't like to work with myself... ;-)
Right now I am exploring another avenue: repeating those experiments
that gave me high background, with a view to removing the offending
slides and using something of better quality. In this case it's
relatively simple, but many times I will not have that luxury,
therefore I still want to understand the problem with background
better.
> In any case, talking about the normalization process, I don't think
> you should be so worried about the number of DE genes violating the
> assumptions of your normalization. I have been working with
> experiments similar to the one you mentioned, using print-tip loess,
> and the results were pretty good.
I'm glad to hear that. I had similar comments from other sources, and
I must admit that the (very) few controls I had in my experiment seem
to behave properly if I apply print-tip loess (and no bkg correction,
because when I do I run into problems, as I mentioned in another
thread).
> It is true that the normalization process is based on some
> assumptions. But not a single microarray experiment fulfils these
> assumptions...
> HTH
> Manuel
I am aware that loess is pretty robust... I just wasn't sure that it
was robust enough in an experiment such as this, where I expect the
average median of ratios to be above 1 (although not by much,
admittedly).
Thanks for all the comments. I will definitely explore the normexp bkg
correction method.
Jose
On 8/8/06, J.delasHeras at ed.ac.uk <j.delasheras at ed.ac.uk> wrote:
> Quoting M Perez <perezperezmm at yahoo.es>:
>
> > Hi Jose,
> >
> > I think you should correct for background since, as you have
> > commented, you have slides with high background intensity and you
> > want to remove background bias. I don't know if you have already
> > tried "normexp".
>
> Hi Manuel,
>
> I haven't really. I did a long time ago and what put me off was
having
> to search for the right offset, when I was hoping for something a
bit
> more "automatic" (and at the time I used LimmaGUI, which is a bit
more
> tedious if you want to experiment a little). I should try that.
> However, I notice that the background usually appears to have little
or
> nothing to do with the signals measured. The background tends to be
> very uniform across the slide, and the fact that I get "negative
spots"
> where you see less signal on the actual spot than around it, makes
me
> think that the cDNA spotted acts as a pretty good block against that
> general background. In other words, I am not convinced that the
> background measured on the glass has much to do with the signal I
> measured on a spot of DNA, and substracting background may be
actually
> a bad thing to do.
That is a very good point. We have to ask ourselves what kind of
"background" there is, not just define background by whatever methods we
have available! For instance, it is possible to show empirically
that the scanner introduces an offset. It might simply be that the
image-analysis-based background estimators happen to get close to the
scanner background; that does not mean that the detected signal in the
proximity of a spot is added to the spot, it just happens to be a good
proxy for the scanner offset. That is just a hypothesis, and in
general I think that image-background signals are poor and noisy
estimators of the scanner offset.
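[Editor's note: the hypothesis above can be illustrated with a toy simulation, sketched here in Python with entirely made-up numbers (offset, noise level), not real scanner data. The local background estimates average out near the scanner offset, yet subtracting them spot by spot injects the estimator's noise into every spot.]

```python
import random

random.seed(1)

SCANNER_OFFSET = 50.0  # hypothetical constant offset added by the scanner

# Simulate 1000 spots: the observed foreground is the true signal plus the
# scanner offset; the "local background" estimate sits near the offset but
# is noisy.
true_signal = [random.uniform(100, 5000) for _ in range(1000)]
foreground = [s + SCANNER_OFFSET for s in true_signal]
local_bg = [SCANNER_OFFSET + random.gauss(0, 20) for _ in range(1000)]

# On average the local background is close to the scanner offset...
mean_bg = sum(local_bg) / len(local_bg)
print(round(mean_bg))  # near 50

# ...but per-spot subtraction adds the estimator's noise to every spot,
# which is the sense in which it is a poor, noisy proxy for the offset.
residuals = [fg - bg - s
             for fg, bg, s in zip(foreground, local_bg, true_signal)]
```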
> Another reason I think background subtraction doesn't matter much is
> that on the occasions when I do see some pattern in the background
> (using 'imageplot', for instance, you can tune the displayed ranges to
> enhance and view those patterns), it often doesn't translate into a
> pattern when you display the red/green ratios, or the signals on their
> own. Not always, but quite often, from what I've seen. And when you do
> get some scratches that clearly affect the measured signal, it might
> make more sense to flag those spots... or simply to rely on the fact
> that there should be enough replicates, so an odd measurement should
> not affect the outcome too much (hopefully if on another slide I have
> another scratch it will not affect the very same spots again :-)
Agreed.
> I think I like Henrik Bengtsson's idea about measuring the background
> inherent to a particular scanner and subtracting that instead... but I
> haven't yet explored that properly (hangs head in shame)... the problem
> with being a one-man operation is that you're pressed to get results
> that are "good enough" to continue the biology, rather than spending
> too much time working out what's the best way to get the most out of
> the data available. If only I could clone myself... but then I wouldn't
> like to work with myself... ;-)
>
> Right now I am exploring another avenue: repeating those experiments
> that gave me high background, with a view to removing the offending
> slides and using something of better quality. In this case it's
> relatively simple, but many times I will not have that luxury,
> therefore I still want to understand the background problem better.
Seriously, it is very easy to do scanner calibration. Much easier
than repeating experiments. Also, if the scanner offset is stable
over time, which I suspect it is, you might only have to do this once
every now and then, and simply reuse the same estimate across
arrays.
Scan the same array at, say, four different PMT voltages, e.g. 800V,
700V, 600V and 500V. Keep the array in the scanner between scans to
keep everything but the PMT as similar as possible. That way you can
reuse the spot mask identified by Axon GenePix Pro on the 800V scan
for the other images too. You'll get four GPR files. Pull out the
foreground signals for one channel at a time from each of them as a
vector, e.g. X800, X700, X600, X500, and put them in a matrix:
X <- matrix(c(X800, X700, X600, X500), ncol=4)
Then estimate and calibrate the signals:
library(aroma.light)
Xc <- calibrateMultiscan(X)
'Xc' will be a single vector of length nrow(X). The attribute
'modelFit' will contain the parameter estimates for that channel, i.e.
the scanner offset etc. The scanner offset is in 'adiag', that is:
scannerOffset <- attr(Xc, "modelFit")$adiag
Do the same for the other channel(s). Single-channel users are done
here.
FYI: The 'aroma.light' package provides a matrix-only interface to
calibration/normalization methods. I have higher-order interfaces in
'aroma', off Bioconductor, but the above should be enough. When there
is time (?!?) I'll also provide wrappers for the 'exprSet' class.
/Henrik
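[Editor's note: for readers without aroma.light to hand, here is a rough Python sketch of the affine model that this multiscan calibration rests on: each scan observes offset + gain*signal, with only the gain changing with PMT voltage, so a straight-line fit between two scans of the same array recovers the common offset. The gains, offset, and two-scan ordinary-least-squares fit are all made up for illustration; calibrateMultiscan performs a robust fit across all scans.]

```python
import random

random.seed(0)

# Hypothetical affine scanner model: observed = OFFSET + gain * true_signal,
# where only the gain depends on the PMT voltage.
OFFSET = 30.0
GAIN_HI, GAIN_LO = 1.0, 0.6   # e.g. an 800V and a 600V scan of the same array

true = [random.uniform(0, 60000) for _ in range(2000)]
x_hi = [OFFSET + GAIN_HI * s for s in true]
x_lo = [OFFSET + GAIN_LO * s for s in true]

# Ordinary least-squares line through the (x_hi, x_lo) pairs.
n = len(true)
mx = sum(x_hi) / n
my = sum(x_lo) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(x_hi, x_lo))
         / sum((x - mx) ** 2 for x in x_hi))
intercept = my - slope * mx

# Under the model: slope = GAIN_LO/GAIN_HI and intercept = OFFSET*(1 - slope),
# so the common scanner offset can be recovered as:
offset_est = intercept / (1 - slope)
print(round(offset_est, 1))  # recovers the offset, here 30.0
```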
>
> > Anyway, talking about the normalization process, I think you
> > shouldn't be so worried about the violation caused by the number of
> > DE genes in your normalization process. I have been working with an
> > experiment similar to the one you mentioned, using print-tip loess,
> > and the results were pretty good.
>
> I'm glad to hear that. I had similar comments from other sources, and I
> must admit that the (very) few controls I had in my experiment seem to
> behave properly if I apply print-tip loess (and no bkg correction,
> because when I do I run into problems, as I mentioned in another
> thread)
>
>
> > It is true that the normalization process is based on some
> > assumptions. But no single microarray experiment fulfils these
> > assumptions...
> > HTH
> > Manuel
>
> I am aware that loess is pretty robust... I just wasn't sure that it
> was robust enough in an experiment such as this, where I expect the
> average median of ratios to be above 1 (although not by much,
> admittedly).
>
> Thanks for all the comments. I will definitely explore the normexp bkg
> correction method.
>
> Jose
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
I forgot to add one thing:
On 8/8/06, Henrik Bengtsson <hb at="" stat.berkeley.edu=""> wrote:
> [snip]
> X <- matrix(c(X800, X700, X600, X500), ncol=4)
Already here you can see whether you've got a scanner offset or not.
Plot your data pairwise and zoom in at (0,0) to see whether the
datapoints from the different pairs converge at (0,0) or not:
par(pch=19)
plot(NA, xlim=c(0,700), ylim=c(0,700), col=(col <- 1))
abline(a=0, b=1)
for (ii in 1:3) for (jj in (ii+1):4) points(X[,c(ii,jj)], col=(col <- col + 1))
See attached image for example.
/Henrik
> [snip]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scannerOffset.png
Type: image/png
Size: 22436 bytes
Desc: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060808/efae6d9c/attachment.png
Hi.
On 8/7/06, J.delasHeras at ed.ac.uk <j.delasheras at="" ed.ac.uk=""> wrote:
> Quoting Sean Davis <sdavis2 at="" mail.nih.gov="">:
>
> > [earlier message trimmed]
> >
> > First, I would start by doing a VERY thorough evaluation of the
> > slide quality for these slides, as these are very distorted
> > scatterplots. IF the slide quality looks OK, then I would probably
> > stay away from a non-linear normalization method, as these will
> > tend to make your differentially-expressed genes look less
> > differentially-expressed.
> >
> > Sean
> Hi Sean,
>
> thanks for your reply. The slides are good, I checked them well. The
> strong effect is not so unexpected, as it involves transfection of
> cells with a DNA-binding protein fused to a strong transactivator, so
> in theory the fusion protein could be responsible for the expression
> of a very large number of genes. There is some specificity to the
> binding, but there should be many target sites, often at promoters...
> So the effects are more or less what we expected, I suppose, and the
> quality of the slides is good. The second spike going either almost
> vertical or almost horizontal should correspond to those genes that
> are not expressed in the particular cell line, but expressed after
> transfection.
>
> Do you have any suggestions for what sort of methods to use for the
> normalisation of such experiments? Until now I used loess for
> everything, but I wasn't sure it would be okay for this experiment
> when I saw these plots.
Roughly what fraction of DEs do you expect/see by visual inspection?
BTW, it is not clear whether your plots in scatterplots.gif are on the
intensity or the log scale, but looking at the noise structure I guess
the log scale.
loess(), not lowess(), can be tuned to be very robust against outliers,
including non-symmetric ones. I know Gordon Smyth has done some
examples/slides on this, but I'm not sure if they're in limma or not.
In addition, in the aroma.light package you can assign weights to the
datapoints for some of the normalization methods. Assigning a smaller
weight to a datapoint will make that datapoint have less of a say in
the estimation of the normalization function, but when it comes to
normalizing/transforming the datapoints, all are transformed equally.
So with weights you may be able to tune your robustness against
outliers further.
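[Editor's note: the weighting idea can be sketched as follows, in a Python toy where a scalar shift stands in for the full loess curve and the data, dye bias, and DE cut-off are all made up: down-weight suspected DEs when estimating the normalization, but then transform all datapoints equally.]

```python
import random

random.seed(2)

# Toy M-values: most genes centred on a dye bias of +0.3, plus a block of
# upregulated genes (one-sided outliers) that would drag a plain fit upward.
m = ([random.gauss(0.3, 0.1) for _ in range(800)]
     + [random.gauss(2.0, 0.3) for _ in range(200)])

# Down-weight the suspected DEs when *estimating* the normalization shift...
w = [0.0 if x > 1.0 else 1.0 for x in m]
shift = sum(wi * xi for wi, xi in zip(w, m)) / sum(w)

# ...but apply the same shift to *all* datapoints, DE or not.
m_norm = [x - shift for x in m]
print(round(shift, 2))  # close to the 0.3 dye bias, unmoved by the DEs
```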
/Henrik
> Jose
>
Hi Henrik,
> Roughly what fraction of DEs do you expect/see by visual inspection?
> BTW, it is not clear if your plots in scatterplots.gif are on the
> intensity or log scale, but looking at the noise structure I guess on
> the log scale.
Yes, it's log scale. I did mention it in the other thread but forgot to
say it here.
What fraction? That's hard to say. Visually I'd say easily 20 or 30%.
But that's a rough estimate. I thought this was probably a lot higher
than in most experiments.
> loess(), not lowess(), can be tuned to be very robust against
> outliers, including non-symmetric ones. I know Gordon Smyth has done
> some examples/slides on this, but I'm not sure if they're in limma or
> not.
> In addition, in the aroma.light package you can assign weights to the
> datapoints for some of the normalization methods. Assigning a smaller
> weight to a datapoint will make that datapoint have less of a say in
> the estimation of the normalization function, but when it comes to
> normalize/transform the datapoints, all are transformed equally much.
> So with weights you may be able to tune your robustness against
> outliers further.
that's on my "to do" list... I can use weights in limma.
Jose
--
Dr. Jose I. de las Heras Email: J.delasHeras at
ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131
6513374
Institute for Cell & Molecular Biology Fax: +44 (0)131
6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
Quoting Henrik Bengtsson <hb at="" maths.lth.se="">:
>
> In the bigger picture, given that you can identify those 20-30% DEs,
> how are you going to interpret such a large list of genes?
>
> /H
The number of "useful" genes is quite a bit smaller. This is because my
experiment consists of 4 separate sub-experiments, all using a common
reference (untransfected cells, in this case). Three of the
sub-experiments consist of the hybridisation of transfected cells vs.
untransfected. The transfection is of a construct expressing a fusion
protein: the first part contains a DNA-binding domain with a certain
sequence specificity (that we expect to occur in many promoters), the
second is a strong transactivator. I'm hoping to detect the binding of
these protein domains by looking at what genes are upregulated,
especially those that are only expressed after transfection. There are
three sub-experiments because they involve slightly different proteins.
The fourth experiment is a control: one of the previous fusion proteins
with a couple of point mutations that we know abolish strong specific
DNA binding. Transfection of this construct still results in
upregulation of many genes. What I do is analyse all data together
(same common reference), and remove the DE genes (using an FDR of 0.05%
or 0.01% as cut-off) of the control experiment from the other three.
This reduces the number of genes substantially. From the remainder,
I then focus on those that have negligible expression in the
untransfected cells, and decent expression afterwards. I then contrast
this with what happened in the control experiment (despite not being
picked as DE in it). At the end I have tens of candidates, fewer than
100. It's not a crazy number, and we then proceed to verification by RT
etc., and the biology starts.
When we started the experiment we were not sure what we would get. In
theory we could get thousands of genes. It all depends on how good our
control is. That's why I used a simple common-reference design, as it
allows us to add another control easily if we find a better one.
I already analysed a set of data on a cell line, with RNA prepared by
somebody else. It worked pretty well, but the effect wasn't as great as
I am seeing here. The transfection efficiency may have something to do
with it. I checked all my transfections by Western blot and only used
the ones that gave me strong expression of the fusion protein; I
suspect the other person wasn't so picky.
Jose