Hi Mayer,
Thanks for your reply. I fully agree: the bell-shaped curve is observed
for a single gene across samples, not across all probes in one array.
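
To convince myself, I put together a minimal sketch in base R with
simulated values rather than my actual arrays. The gene count, sample
count, background level, and noise settings are all made-up assumptions,
and `skewness` is a small helper defined here, not a limma function. It
draws symmetric noise around each gene's own mean, then compares the
skewness within genes to the skewness across all probes of one array.

```r
set.seed(1)

# Assumed sizes and settings, for illustration only
n_genes   <- 2000
n_samples <- 50

# Hypothetical gene-level means on the log2 scale: most genes sit near a
# background level of 6, expressed genes add an exponential right tail
is_expressed <- runif(n_genes) < 0.4
gene_mean    <- 6 + ifelse(is_expressed, rexp(n_genes, rate = 0.5), 0)

# Symmetric (normal) noise around each gene's own mean;
# rows are genes, columns are samples
log_expr <- matrix(rnorm(n_genes * n_samples, mean = gene_mean, sd = 0.3),
                   nrow = n_genes)

# Simple moment-based skewness (helper defined here)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Across all probes of one array: expected to be right-skewed
across_probes <- skewness(log_expr[, 1])

# Within each gene across samples: expected to be roughly symmetric
within_gene <- apply(log_expr, 1, skewness)

cat("skewness across probes (array 1):", round(across_probes, 2), "\n")
cat("median skewness within genes:    ", round(median(within_gene), 2), "\n")
```

With these settings the across-probe skewness comes out clearly positive
while the per-gene skewness centres near zero. Calling
hist(log_expr[, 1], nclass = 100) on the simulated array should show the
peak near background with a long right tail.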
Best,
> From: c.mayer@abdn.ac.uk
> To: hibergo@outlook.com; jmacdon@uw.edu
> CC: bioconductor@r-project.org
> Subject: RE: [BioC] distribution of agilent array data.
> Date: Mon, 11 Nov 2013 18:04:31 +0000
>
> Hello Chunxuan,
>
> I have worked with Agilent arrays a lot and can confirm Jim's comment:
> the type of distribution you show (with a heavy right tail) is fairly
> typical. If you follow Jim's advice of smaller bin sizes / more bins
> (nclass = 100 or so), you will probably see that there is some mass of
> the distribution to the left of the peak/mode (as you would expect from
> normexp).
>
> I guess what might be confusing you is that the normalisation +
> logging is supposed to give you normally distributed data (or at least
> something not so very far away from it) which are symmetrically
> distributed. But this is a statement about the distribution for the
> replicates WITHIN genes, not across genes.
>
> Best Wishes
>
> Claus
>
> Dr. Claus-D. Mayer
> Biomathematics & Statistics Scotland (BioSS)
> Rowett Institute of Nutrition and Health
> University of Aberdeen
> Aberdeen AB21 9SB, Scotland, UK.
> email: claus@bioss.ac.uk or c.mayer@abdn.ac.uk
> Telephone: +44 (0) 1224 438652
>
> Biomathematics and Statistics Scotland (BioSS) is formally part of
> The James Hutton Institute, a registered Scottish charity No. SC041796
> and a company limited by guarantee No. SC374831
>
>
> > -----Original Message-----
> > From: bioconductor-bounces@r-project.org
> > [mailto:bioconductor-bounces@r-project.org] On Behalf Of shao chunxuan
> > Sent: 11 November 2013 15:52
> > To: James W. MacDonald
> > Cc: bioconductor
> > Subject: Re: [BioC] distribution of agilent array data.
> >
> > Hi Jim,
> > Thanks for the helpful comments.
> > I ask partly because the data have also been normalized by other
> > people, and in their version the distribution is more or less
> > symmetric.
> > The alternative code is:
> >
> > library(limma)
> > targets <- readTargets("targets.txt")
> > x <- read.maimages(targets, source="agilent",
> >                    columns=list(R="gMedianSignal", Rb="gBGMedianSignal",
> >                                 G="gMedianSignal", Gb="gBGMedianSignal"))
> > y.bg <- backgroundCorrect(x, method="normexp")
> > eset <- y.bg$G
> > ## log2 transform before normalization!!!
> > eset.l <- round(log2(eset), 4)
> > y.bgn.l.2 <- normalizeBetweenArrays(eset.l, method="quantile")
> >
> > I get a bell-shaped distribution for all probes in a single array if
> > the data are log2-transformed before normalization. Whether to
> > log2-transform first is an old question, but in my data it doesn't
> > matter: the boxplots for a single gene across patients are identical,
> > and the signature genes separate the patients very well in both cases.
> >
> > Best,
> > Chunxuan
> >
> >
> > > Date: Sun, 10 Nov 2013 19:45:43 -0500
> > > From: jmacdon@uw.edu
> > > To: hibergo@outlook.com
> > > CC: bioconductor@r-project.org
> > > Subject: Re: [BioC] distribution of agilent array data.
> > >
> > > Hi Chunxuan,
> > >
> > > On Sunday, November 10, 2013 3:19:44 PM, shao chunxuan wrote:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Hi everyone,
> > > >
> > > > I am confused by the histogram of normalized Agilent microarray data.
> > > > It is a human single-color array dataset, containing around 700
> > > > microarrays and 43K probes.
> > > >
> > > > After normalization, I plotted the expression values of all probes in
> > > > a single microarray; one example is attached.
> > > >
> > > > I expected to see a more or less symmetric distribution; however, the
> > > > values seem truncated. At first I thought it might relate to the
> > > > offset value, but I have tried different values (16, 1, 0) and still
> > > > got a similar distribution.
> > >
> > > Why would you expect a symmetric distribution? Also, plotting a
> > > histogram with such large bin sizes isn't very helpful - I wouldn't
> > > be willing to say much about the distribution based on that plot
> > > anyway.
> > >
> > > A more reasonable expectation is something like a convolution of a
> > > lognormal and an exponential distribution. In other words, there are
> > > likely a large number of genes that aren't expressed, and the
> > > distribution of those probes will be symmetrical around some small
> > > number. And the distribution of expressed genes is likely to be
> > > something like an exponential, with a long right tail. And since you
> > > used the normexp background correction, you made the same assumption
> > > as well.
> > >
> > > Best,
> > >
> > > Jim
> > >
> > >
> > > >
> > > > Any explanation or suggestions?
> > > >
> > > > Here is the code for normalization:
> > > > library(limma)
> > > > targets <- readTargets("targets.txt")
> > > > x <- read.maimages(targets, source="agilent", green.only=TRUE)
> > > > y.bg <- backgroundCorrect(x, method="normexp")
> > > > y.bgn <- normalizeBetweenArrays(y.bg, method="quantile")
> > > > g.ex <- avereps(y.bgn, ID=y.bgn$genes$ProbeName)
> > > > da.norm <- g.ex$E
> > > >
> > > > Here is the R session info:
> > > > R version 3.0.2 (2013-09-25)
> > > > Platform: x86_64-apple-darwin10.8.0 (64-bit)
> > > >
> > > > locale:
> > > > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> > > >
> > > > attached base packages:
> > > > [1] graphics grDevices utils datasets stats methods base
> > > >
> > > > other attached packages:
> > > > [1] ggplot2_0.9.3.1 reshape2_1.2.2 plyr_1.8
> > > >
> > > > loaded via a namespace (and not attached):
> > > > [1] colorspace_1.2-4 dichromat_2.0-0 digest_0.6.3
> > > >     grid_3.0.2 gtable_0.1.2 labeling_0.2
> > > > [7] MASS_7.3-29 munsell_0.4.2 proto_0.3-10
> > > >     RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2
> > > >
> > > > Best,
> > > >
> > > > chunxuan
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor@r-project.org
> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > Search the archives:
> > > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
> > > --
> > > James W. MacDonald, M.S.
> > > Biostatistician
> > > University of Washington
> > > Environmental and Occupational Health Sciences
> > > 4225 Roosevelt Way NE, # 100
> > > Seattle WA 98105-6099
> >