Hi,
I have a question regarding the Illumina Human Methylation 450k array
and the genefilter package.
I used the 'nsFilter' function in genefilter to remove probes that
have low variance across samples. When I checked the documentation for
nsFilter, I found that the function removes 50% of the probes by
default.
I computed the variance of each probe separately for the retained
probes and for the removed probes. When I plot the density of each set
of variances, the two curves overlap completely: both sets have most
of their probes with variance close to zero and a few with high
variance. This leaves me wondering how nsFilter actually filters
probes, as it doesn't appear from the plot that the probes with the
lowest variances are removed.
What would be the best way to filter out low-variance probes in 450k
data? If the default cutoff in nsFilter is set to 50% on the
assumption that only about 40% of genes are expressed in a given
tissue, what percentage cutoff can be used for methylation data?
It would be great if anyone could explain this.
Thanks,
Khadeeja
Genomic DNA -- what you're assaying on these arrays, or at least what
they're designed for -- need not be expressed.
It's just... there, chopped up after extraction, bisulfite conversion,
and whole-genome amplification, waiting to hybridize.
Thus nsFilter's fundamental assumption -- that some large fraction of
the probes on the array are in fact pure noise -- is violated. It may
be that there is (almost always) local correlation between probes
within +/- 1 kb of each other, but if the protocols for these arrays
are followed carefully, you can expect better than 99% of the probes
to hybridize (which is NOT the case with expression arrays, and you
would not expect 99% of the genome to align in an RNA-seq experiment
either). So the decision of how many probes to retain then comes down
to your judgment.
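To make that judgment call concrete, here is a minimal sketch of a
quantile-based spread filter. It is plain Python rather than R, purely
so that it runs without any packages. The probe IDs, the beta values,
and the names filter_by_spread, iqr, and keep_fraction are all made up
for illustration and are not part of genefilter. Note that nsFilter
ranks features by IQR by default (its var.func argument) and applies
annotation-based filters as well, which may be why densities of plain
variance look alike for retained and removed probes.

```python
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) of one probe's values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_by_spread(betas, keep_fraction=0.5, spread=statistics.variance):
    """Return the IDs of the most variable probes.

    betas: dict mapping probe ID -> list of per-sample beta values.
    keep_fraction: fraction of probes to retain (a judgment call).
    spread: per-probe statistic used for ranking (variance or iqr).
    """
    scores = {probe: spread(values) for probe, values in betas.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, int(keep_fraction * len(ranked)))
    return set(ranked[:n_keep])


# Hypothetical beta values for four probes across four samples.
betas = {
    "cg_a": [0.10, 0.11, 0.10, 0.12],
    "cg_b": [0.20, 0.80, 0.25, 0.75],
    "cg_c": [0.50, 0.50, 0.51, 0.49],
    "cg_d": [0.05, 0.95, 0.50, 0.50],
}

print(sorted(filter_by_spread(betas, 0.5)))               # ['cg_b', 'cg_d']
print(sorted(filter_by_spread(betas, 0.25)))              # ['cg_d']
print(sorted(filter_by_spread(betas, 0.25, spread=iqr)))  # ['cg_b']
```

On this toy data variance and IQR already disagree about the single
most variable probe, so it is worth checking which statistic a filter
ranks by before comparing densities of a different one.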
Biological annotation (e.g. from ChIP-seq peak calls for histone
marks, transcription factors, or physical interactions) can become
very useful in making sense of these data. If you lack normal samples
(or don't know which ones are "normal") it is possible to see low
variability in regions which are consistently aberrant, so filtering
on variance may not always be the best approach. I find the
GenomicRanges, GenomicFeatures, and rtracklayer packages useful for
this type of annotation, FWIW.
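As a rough illustration of annotation-based selection, the sketch
below keeps the probes whose position falls inside a set of annotated
intervals (peak calls, say). In R this is the kind of query that
findOverlaps in GenomicRanges answers; the plain-Python version here
is only a stand-in. The function name probes_in_peaks, the probe IDs,
and the coordinates are hypothetical, intervals are treated as closed,
and peaks on the same chromosome are assumed not to overlap each
other.

```python
from bisect import bisect_right


def probes_in_peaks(probes, peaks):
    """Return IDs of probes located inside any peak.

    probes: dict mapping probe ID -> (chrom, position).
    peaks: list of (chrom, start, end), closed intervals, assumed
           non-overlapping within each chromosome.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}

    hits = set()
    for probe_id, (chrom, pos) in probes.items():
        if chrom not in by_chrom:
            continue
        # Last peak on this chromosome that starts at or before pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos <= by_chrom[chrom][i][1]:
            hits.add(probe_id)
    return hits


# Hypothetical probe positions and peak calls.
probes = {
    "cg_a": ("chr1", 150),
    "cg_b": ("chr1", 500),
    "cg_c": ("chr2", 150),
}
peaks = [("chr1", 100, 200), ("chr1", 600, 700)]

print(sorted(probes_in_peaks(probes, peaks)))  # ['cg_a']
```

The resulting set can be intersected with, or used instead of, a
variance-based selection, depending on the question being asked.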
Hope this helps,
--t
On Thu, Feb 9, 2012 at 2:17 PM, khadeeja ismail <hajjja@yahoo.com> wrote:
> [...]
--
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>