Yuan Hao
Dear list,
I have a question about the non-specific filtering used to define a
gene universe for a hypergeometric/GSEA test.
I have fifteen Affymetrix samples. To remove probe sets that show
little variation across samples, I computed the IQR of each probe set
across samples with each of the following two pieces of code:
# code one
> cutoff <- 0.5
> Iqr <- apply (exprs(eset), 1, IQR)
> selected <- (Iqr > cutoff)
> filtered <- eset[selected, ]
> dim(filtered)
Features Samples
11490 15
# code two
> library(genefilter)
> filtered<-varFilter(eset, var.func=IQR, var.cutoff=0.5,
filterByQuantile=TRUE)
> dim(filtered)
Features Samples
27337 15
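One thing worth checking alongside the IQR definition is how the cutoff itself is read. As far as the genefilter documentation describes it, filterByQuantile=TRUE makes varFilter treat var.cutoff as a quantile of all var.func values rather than as an absolute value, so 0.5 would keep roughly the more variable half of the probe sets, whereas code one keeps probe sets whose IQR exceeds 0.5 on the expression scale. The base-R sketch below illustrates only that distinction on simulated data; the matrix, its dimensions, and the variable names are made up and it does not call genefilter itself.

```r
# Simulated expression matrix: 1000 hypothetical probe sets x 15 samples,
# each probe set with its own spread (made-up data, not the real eset).
set.seed(1)
n_probes <- 1000
n_samples <- 15
probe_sd <- runif(n_probes, min = 0.1, max = 1)
expr <- matrix(rnorm(n_probes * n_samples, sd = rep(probe_sd, n_samples)),
               nrow = n_probes)

# Per-probe-set IQR across samples, as in code one.
iqr <- apply(expr, 1, IQR)

# Absolute cutoff: keep probe sets whose IQR is above 0.5.
keep_absolute <- iqr > 0.5

# Quantile cutoff: keep probe sets whose IQR is above the 0.5 quantile
# (the median) of all IQR values, i.e. the more variable half.
keep_quantile <- iqr > quantile(iqr, probs = 0.5)

sum(keep_absolute)
sum(keep_quantile)  # half of the probe sets
```

If an absolute threshold is what is wanted from varFilter, passing filterByQuantile=FALSE should make the two approaches more directly comparable.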
I realize that the difference between the two "filtered" results may
come from different definitions of IQR. In the first case, IQR was
computed with the 'quantile' function rather than in Tukey's form,
"IQR(x) = quantile(x, 3/4) - quantile(x, 1/4)", which was used in the
second case. I am aware that the number of genes in the gene universe
can have a significant effect on the test result. However, I am not
sure which IQR evaluation method is the better choice for the
hypergeometric/GSEA test. I would appreciate it very much if you could
shed some light on this!
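To make the universe-size concern concrete, here is a small sketch using base R's phyper. Only the two universe sizes come from the output above; the other counts (500 selected genes, a category of 100 genes, an overlap of 10) are invented for illustration, and holding the category size fixed across universes is a simplification.

```r
# Hypothetical counts, chosen only for illustration.
n_selected <- 500   # genes called "interesting"
n_category <- 100   # genes in the universe annotated to one category
n_overlap  <- 10    # interesting genes that fall in that category

# Upper-tail hypergeometric p-value, P(X >= n_overlap), for a given
# universe size.
hyper_p <- function(universe) {
  phyper(n_overlap - 1, n_category, universe - n_category, n_selected,
         lower.tail = FALSE)
}

p_small <- hyper_p(11490)  # universe size from code one
p_large <- hyper_p(27337)  # universe size from code two

# With the same overlap, the larger universe gives the smaller p-value.
c(small_universe = p_small, large_universe = p_large)
```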
Regards,
Yuan