Question

arrayQualityMetrics R for outlier detection

lily ▴ 20

Hi, I have used the arrayQualityMetrics R package for outlier detection, and it determined different thresholds for different microarray datasets. For example, for box plot outlier detection, a threshold of 0.163 was determined for one microarray dataset and a threshold of 0.263 for another. What could be the reason, and can I use a single threshold for all datasets?

microarray outlier detection • 845 views

updated 5.2 years ago by Gordon Smyth 52k • written 5.2 years ago by lily ▴ 20

score 0 · Answer 1 · 2020-02-14

I can't speak directly to what arrayQualityMetrics is doing, as you are just making general statements about something you have done.

However, if you think about what "outlier" means, you can see that choosing a fixed value is not a good idea. To say whether a particular sample is an outlier, you first need to estimate the within-group variability of the group the sample is assumed to belong to. Only then can you call a sample an outlier, because it differs from the rest of the group by more than you would expect given the within-group variability that you have estimated.

So if you have a group of samples that are all very similar, it won't take much for you to consider a sample to be an outlier. But if you have a group that is really quite variable, it will take a much larger difference before you will think it's not just expected variation, but excessive variation.
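To make this concrete, here is a minimal Python sketch of a generic boxplot-style rule (upper quartile plus 1.5 times the interquartile range). It is only an illustration of a data-driven threshold, not a claim about what arrayQualityMetrics computes internally; the function names and the per-array scores are made up for the example.

```python
from statistics import quantiles

def upper_fence(scores, k=1.5):
    """Boxplot-style threshold: upper quartile + k * interquartile range."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 + k * (q3 - q1)

def flag_outliers(scores, k=1.5):
    """Return the threshold and the indices of scores above it."""
    threshold = upper_fence(scores, k)
    flagged = [i for i, s in enumerate(scores) if s > threshold]
    return threshold, flagged

# Hypothetical per-array scores (e.g. some distance of each array from the rest).
# Both datasets contain an array with a score of 0.20 at index 6.
tight_dataset = [0.10, 0.11, 0.12, 0.12, 0.13, 0.14, 0.20]
variable_dataset = [0.05, 0.10, 0.15, 0.18, 0.22, 0.25, 0.20]

for name, scores in [("tight", tight_dataset), ("variable", variable_dataset)]:
    threshold, flagged = flag_outliers(scores)
    print(f"{name}: threshold={threshold:.3f}, flagged indices={flagged}")
```

With these made-up numbers, the array scoring 0.20 is flagged in the tight dataset but not in the variable one, because the threshold is derived from each dataset's own spread. A single fixed cutoff would be too strict for one dataset or too lenient for the other.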

score 0 · Answer 2 · 2020-02-15

You have asked questions previously about the limma package, so it may be helpful to point out that limma's arrayWeights function allows you to downweight noisy samples in a quantitative way. In a limma analysis you don't need to make a hard decision about whether to keep or discard a sample; instead, you can keep borderline samples in the analysis but downweight them accordingly.
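The general idea of downweighting rather than discarding can be sketched in a few lines of Python. This is a deliberately simplified stand-in (weights inversely proportional to each sample's mean squared deviation from the per-gene median), not limma's arrayWeights algorithm, which estimates the weights within the linear model. The function names and the expression values are hypothetical.

```python
from statistics import mean, median

def sample_weights(expr):
    """expr: list of samples from one group, each a list of per-gene values.
    Returns one weight per sample, inversely proportional to that sample's
    mean squared deviation from the per-gene median, scaled to average 1."""
    n_genes = len(expr[0])
    gene_medians = [median(s[g] for s in expr) for g in range(n_genes)]
    noise = [mean((s[g] - gene_medians[g]) ** 2 for g in range(n_genes))
             for s in expr]
    raw = [1.0 / max(v, 1e-12) for v in noise]  # floor avoids division by zero
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def weighted_gene_means(expr, weights):
    """Per-gene mean across samples, with each sample weighted."""
    total = sum(weights)
    return [sum(w * s[g] for s, w in zip(expr, weights)) / total
            for g in range(len(expr[0]))]

# Hypothetical log-expression values: 4 samples x 5 genes; the last sample is noisy.
expr = [
    [5.0, 7.1, 6.0, 8.2, 4.9],
    [5.1, 7.0, 6.1, 8.0, 5.0],
    [4.9, 6.9, 5.9, 8.1, 5.1],
    [6.0, 6.0, 7.2, 7.0, 6.1],
]

weights = sample_weights(expr)
print("weights:", [round(w, 3) for w in weights])
print("weighted gene means:", [round(m, 3) for m in weighted_gene_means(expr, weights)])
```

The noisy sample still contributes to every gene-level estimate, but with the smallest weight, so no keep-or-discard cutoff is needed.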