Hi
As I have varying numbers of replicates, and they are not regularly
spaced on the array, and given that I would like a list of
differentially expressed genes which is averaged over replicates, I
assume the best thing to do is normalise my data, and then average over
replicates in the MAList object, and then pass the averaged data to
lmFit() etc?
Is that right?
Cheers
Mick
> Hi
>
> As I have varying numbers of replicates, and they are not regularly
> spaced on the array, and given that I would like a list of
> differentially expressed genes which is averaged over replicates,
I assume that these are within-array replicates.
> I assume the best thing to do is normalise my data, and then average
> over replicates in the MAList object, and then pass the averaged data
> to lmFit() etc?
Yes, you could do that. It does raise subtle issues though concerning
how the variance of the averages depends on the number of replicates.
You might like to compute weights based on the number of replicates for
each probe and pass that to lmFit also.
Gordon
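The weighting idea can be illustrated with a small sketch. It is written in plain Python rather than R purely to show the bookkeeping, and it is not limma code. The function name average_replicates and the example values are hypothetical. The sketch assumes the replicate spots are independent with equal variance, so that the mean of n spots has variance sigma^2/n and a weight proportional to n is reasonable. Within-array replicates are usually correlated, so this is only an approximation.

```python
from collections import defaultdict

def average_replicates(gene_ids, m_values):
    """Average log-ratios over spots that share a gene ID.

    Returns (ids, means, counts), one entry per gene, ordered by the first
    non-missing spot seen for each gene. counts[i] is the number of
    non-missing spots that went into means[i] and can serve as a weight.
    """
    groups = defaultdict(list)
    for gid, m in zip(gene_ids, m_values):
        if m is not None:          # skip missing spots
            groups[gid].append(m)
    ids = list(groups)
    means = [sum(groups[g]) / len(groups[g]) for g in ids]
    counts = [len(groups[g]) for g in ids]
    return ids, means, counts

# Hypothetical single-array example: geneA has 3 spots, geneB has 2
# (one missing), geneC has 1.
gene_ids = ["geneA", "geneB", "geneA", "geneC", "geneB", "geneA"]
m_values = [1.0, 0.5, 2.0, -1.0, None, 3.0]
ids, means, weights = average_replicates(gene_ids, m_values)
```

Repeating this for each array gives a genes-by-arrays table of averages and a matching table of replicate counts, which is the shape of weights suggested above for lmFit.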
> Is that right?
>
> Cheers
> Mick
Thanks Gordon
Actually when I did this, I got some odd results.
If I ran lmFit(), eBayes() and topTable() on my data set on a per-spot
basis, I found ~800 SPOTS with a p-value <= 0.05. Now most of my genes
are replicated in duplicate on the arrays (within-array replicates) and
when I averaged over those replicates, and used that data to feed into
lmFit(), eBayes() and topTable() I got ~1100 GENES with a p-value
<= 0.05.
Does this suggest that after averaging over replicate spots, the
measurements for my genes are more tightly distributed than the
individual spots were..?
Cheers
Mick
-----Original Message-----
From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU]
Sent: 01 September 2004 23:12
To: michael watson (IAH-C)
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] Unequally spaced replicates in limma
> Hi
>
> As I have varying numbers of replicates, and they are not regularly
> spaced on the array, and given that I would like a list of
> differentially expressed genes which is averaged over replicates,
I assume that these are within-array replicates.
> I assume the best thing to do is normalise my data, and then average
> over replicates in the MAList object, and then pass the averaged data
> to lmFit() etc?
Yes, you could do that. It does raise subtle issues though concerning
how the variance of the averages depends on the number of replicates.
You might like to compute weights based on the number of replicates for
each probe and pass that to lmFit also.
Gordon
> Is that right?
>
> Cheers
> Mick
At 07:23 PM 2/09/2004, michael watson (IAH-C) wrote:
>Thanks Gordon
>
>Actually when I did this, I got some odd results.
The results look to me as you would hope for and expect.
>If I ran lmFit(), eBayes() and topTable() on my data set on a per-spot
>basis, I found ~800 SPOTS with a p-value <= 0.05. Now most of my genes
>are replicated in duplicate on the arrays (within-array replicates) and
>when I averaged over those replicates, and used that data to feed into
>lmFit(), eBayes() and topTable() I got ~1100 GENES with a p-value
><= 0.05.
>
>Does this suggest that after averaging over replicate spots, the
>measurements for my genes are more tightly distributed than the
>individual spots were..?
1. You've reduced the number of tests by about half (one per gene rather
than one per spot), hence you do only about half the adjustment for
multiple testing, hence you end up with lower adjusted p-values.
2. You'd certainly hope that averages are more tightly distributed than
the individual spots, that's why averaging is a good thing.
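Point 1 can be made concrete with a toy calculation. The sketch below uses a Bonferroni adjustment only because it is the simplest to follow; the adjustment method actually applied by topTable() may differ, and the raw p-value and the test counts are made-up numbers.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number of
    tests and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# One hypothetical gene with the same raw p-value in both analyses,
# padded with unremarkable p-values to reach the desired number of tests.
raw_p = 8e-6
per_spot = bonferroni([raw_p] + [0.5] * 9999)   # 10000 tests (spots)
per_gene = bonferroni([raw_p] + [0.5] * 4999)   # 5000 tests (genes)

# per_spot[0] is about 0.08 and per_gene[0] is about 0.04, so the same raw
# evidence passes a 0.05 cutoff only in the per-gene analysis.
```

On point 2, if two replicate spots each have variance sigma^2 and correlation rho, their average has variance sigma^2 * (1 + rho) / 2, which is smaller than sigma^2 whenever rho < 1.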
If your genes are virtually all in duplicate, and the others have an even
number of reps, you could sort your MA object by gene ID and then use
duplicateCorrelation() with ndups=2 and spacing=1.
Gordon
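The layout that ndups=2 and spacing=1 expects is that the two copies of each gene sit in consecutive rows. The sketch below shows that reordering in plain Python as an illustration only; it is not limma code, and the function names order_by_gene and duplicates_adjacent and the gene IDs are hypothetical. In practice the same row permutation would have to be applied to every per-spot component (M, A, weights and gene annotation) before calling duplicateCorrelation().

```python
def order_by_gene(gene_ids):
    """Return row indices that sort the spots by gene ID (stable sort)."""
    return sorted(range(len(gene_ids)), key=lambda i: gene_ids[i])

def duplicates_adjacent(sorted_ids, ndups=2):
    """Check that every consecutive block of ndups rows holds one gene ID."""
    if len(sorted_ids) % ndups != 0:
        return False
    return all(
        len(set(sorted_ids[i:i + ndups])) == 1
        for i in range(0, len(sorted_ids), ndups)
    )

# Hypothetical layout: g1 and g3 are spotted twice, g2 four times,
# all irregularly spaced.
gene_ids = ["g3", "g1", "g2", "g1", "g3", "g2", "g2", "g2"]
order = order_by_gene(gene_ids)
sorted_ids = [gene_ids[i] for i in order]
```

A gene with four spots is simply treated as two adjacent pairs, which is why an even number of replicates is needed. With an odd count the check fails, for example duplicates_adjacent(["a", "a", "a", "b"]) is False.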
>Cheers
>Mick
Hi Gordon,
Is the solution of sorting the table available in limmaGUI? Should I
re-sort the input files so that the replicates are taken into account
using ndups=2 and spacing=1? What happens to the replicates if you have
no spot weighting, are they just averaged?
Thank you for your help,
Liz
------------------------------
Date: Thu, 02 Sep 2004 19:44:34 +1000
From: Gordon Smyth <smyth@wehi.edu.au>
Subject: RE: [BioC] Unequally spaced replicates in limma
To: "michael watson (IAH-C)" <michael.watson@bbsrc.ac.uk>
Cc: bioconductor@stat.math.ethz.ch
Message-ID: <6.0.1.1.1.20040902193610.02984088@imaphost.wehi.edu.au>
Content-Type: text/plain; charset="us-ascii"; format=flowed