Dear Paul,
I have noticed cases where the results are 'better' (i.e. you get more
extreme moderated t-statistics or log-odds) if you remove suspect
arrays.

In one recent example I recall, the experimenter eventually discovered that
the genotype of a sample hybridised to one of their arrays was not what they
originally thought. This meant that the linear model they were fitting was
not right. Although the weight assigned to this array was small, removing
it from the analysis altogether still produced better results. The array
weights method cannot correct for these kinds of gross errors.

I usually take a try-it-and-see approach in my own analyses, similar to what
you have done (i.e. run the analysis with equal weights, with array weights,
or after removing any suspect arrays altogether, then look at the results of
each to see which gives the most extreme statistics).
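
A minimal sketch of that three-way comparison follows. It assumes simulated
data in place of a real expression matrix; the helper count_hits(), the
object names, and the choice to make array 10 the noisy one are illustrative
only and are not taken from any real analysis.

```r
# Sketch only: simulated data stand in for a real expression matrix.
library(limma)

set.seed(1)
n_genes <- 1000
groups <- factor(rep(c("t0hr", "t6hr", "t24hr", "t24p6hr"), each = 3),
                 levels = c("t0hr", "t6hr", "t24hr", "t24p6hr"))
design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

# Simulated log-expression values (assumption: 12 arrays, unit noise).
y <- matrix(rnorm(n_genes * 12), n_genes, 12)
rownames(y) <- paste0("gene", seq_len(n_genes))
colnames(y) <- 1:12
y[1:100, 4:6] <- y[1:100, 4:6] + 2            # genes 1-100 change at 6 hr
y[, 10] <- y[, 10] + rnorm(n_genes, sd = 3)   # array 10 is deliberately noisy

# Hypothetical helper: fit the model and count genes with B > 1
# for the t6hr - t0hr contrast.
count_hits <- function(y, design, weights = NULL) {
  cont <- makeContrasts(tchange6hr = t6hr - t0hr, levels = design)
  fit  <- lmFit(y, design, weights = weights)
  fit2 <- eBayes(contrasts.fit(fit, cont))
  sum(topTable(fit2, coef = 1, number = Inf)$B > 1)
}

aw <- arrayWeights(y, design)

results <- c(
  equal_weights = count_hits(y, design),
  array_weights = count_hits(y, design, weights = aw),
  array_removed = count_hits(y[, -10], design[-10, ])
)

print(round(aw, 2))
print(results)
```

Using number = Inf in topTable avoids capping the count at an arbitrary
number of genes. The counts only show which option gives the most extreme
statistics; they do not say which set of calls is more correct.
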
Best wishes,
Matt
> I use limma quite a bit but have not really been using arrayWeights
> much, until recently.
> I like it a lot but have found, in some cases, that it appears better
> to just ditch the very poorly performing arrays and then proceed without
> weighting.
>
> What are people's real-world experiences with arrayWeights? Are you using
> it routinely?
>
> For example, my typical usage: a time series with biological triplicates
>
>> design
>    t0hr t6hr t24hr t24p6hr
> 1     1    0     0       0
> 2     1    0     0       0
> 3     1    0     0       0
> 4     0    1     0       0
> 5     0    1     0       0
> 6     0    1     0       0
> 7     0    0     1       0
> 8     0    0     1       0
> 9     0    0     1       0
> 10    0    0     0       1
> 11    0    0     0       1
> 12    0    0     0       1
>
> arrayw <- arrayWeights(selDataMatrix, design=design)
>> arrayw
>         1         2         3         4         5         6         7         8         9
> 1.6473168 1.2716081 1.5170375 1.0310794 1.1010048 1.2787543 0.8198722 0.7162097 2.3992850
>        10        11        12
> 0.1744961 1.3821469 0.6379648
>
> ## Note array 10, which was an outlier in hierarchical clustering (though it
> ## was still more similar to its biological replicates than to any other
> ## arrays, based on genes where sd/mean > 0.1).
>
> fit <- lmFit(selDataMatrix, design, weights=arrayw)  # fit with array weights
>
> cont.matrix <- makeContrasts(
> tchange6hr="t6hr-t0hr" ,
> tchange24hr="t24hr-t0hr" ,
> tchange24p6hr="t24p6hr-t0hr" ,
> diff0to6="t6hr-t0hr" ,
> diff6to24="t24hr-t6hr" ,
> diff24to24p6="t24p6hr-t24hr" ,
> levels=design)
>
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
>
> ** Get
>> sum(topTable(fit2,coef=1,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2927
>> sum(topTable(fit2,coef=2,adjust="fdr",number=6000)[,"B"]>1)
> [1] 5263
>> sum(topTable(fit2,coef=3,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2083
>> sum(topTable(fit2,coef=4,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2927
>> sum(topTable(fit2,coef=5,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2931
>> sum(topTable(fit2,coef=6,adjust="fdr",number=5000)[,"B"]>1)
> [1] 3810
>
> ####################### AS OPPOSED TO THE TYPICAL:
>
> fit <- lmFit(selDataMatrix, design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
>
> ** Get
>> sum(topTable(fit2,coef=1,adjust="fdr",number=5000)[,"B"]>1)
> [1] 1725
>> sum(topTable(fit2,coef=2,adjust="fdr",number=6000)[,"B"]>1)
> [1] 3438
>> sum(topTable(fit2,coef=3,adjust="fdr",number=5000)[,"B"]>1)
> [1] 1512
>> sum(topTable(fit2,coef=4,adjust="fdr",number=5000)[,"B"]>1)
> [1] 1725
>> sum(topTable(fit2,coef=5,adjust="fdr",number=5000)[,"B"]>1)
> [1] 1605
>> sum(topTable(fit2,coef=6,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2610
>
> Is more differential expression always better? I guess so, unless
> there are more false positives, right? I am slightly worried that
> using a linear model to assess array quality and produce weights
> will naturally bias a method such as limma, which then uses a
> linear model again to determine differential expression. After trying
> a few different permutations (use weights; remove the "worst" arrays and
> redo without weights), I think that this is not a big concern, but I would
> welcome some feedback from others and insight into how they are using this
> function.
>
> Thanks
> Paul