Nathan Harmston
Hi everyone,
The current aim of a project I'm working on is to discover pathway
signatures, and I am thinking about using an approach like GSEA with
KEGG, GO, or something more modular. Some vignettes/tutorials recommend
reducing the number of probes per gene to one by retaining the probe
with the most variation, since it will be the most informative. However,
would it not be better to take the probe closest to the polyA tail of
the gene, which according to some sources (in the lab I'm working at) is
the most reliable probe in the gene? Is there a good reason for choosing
variability over reliability? I have had a quick look through some
papers and been unable to find any information that would point me
towards one or the other (apart from the bioC vignettes).
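To make the two selection rules concrete, here is a minimal sketch of "one probe per gene" under each criterion. It is written in plain Python for illustration only and is not the Bioconductor implementation; the probe IDs, gene IDs, expression values, and the `distance_to_3prime` table are all made up, and `pick_one_probe_per_gene` is a hypothetical helper.

```python
from statistics import variance

def pick_one_probe_per_gene(probe_to_gene, score):
    """For each gene, keep the probe with the highest score."""
    best = {}  # gene -> (score, probe)
    for probe, gene in probe_to_gene.items():
        s = score(probe)
        if gene not in best or s > best[gene][0]:
            best[gene] = (s, probe)
    return {gene: probe for gene, (_, probe) in best.items()}

# Hypothetical toy data: two probes for GENE1, one for GENE2.
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}
expression = {
    "probe_a": [5.0, 5.1, 4.9, 5.0],   # flat across samples
    "probe_b": [3.0, 6.0, 2.5, 7.0],   # highly variable
    "probe_c": [8.0, 8.2, 7.9, 8.1],
}
# Assumed distance (in bases) from each probe to the 3' end of its transcript.
distance_to_3prime = {"probe_a": 120, "probe_b": 900, "probe_c": 300}

# Rule 1: keep the most variable probe per gene.
by_variance = pick_one_probe_per_gene(
    probe_to_gene, lambda p: variance(expression[p])
)

# Rule 2: keep the probe closest to the 3' end (smaller distance = higher score).
by_position = pick_one_probe_per_gene(
    probe_to_gene, lambda p: -distance_to_3prime[p]
)

print(by_variance)
print(by_position)
```

In this toy case the two rules disagree for GENE1 (the variance rule keeps `probe_b`, the position rule keeps `probe_a`), which is exactly the situation the question is about.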
Another problem I was wondering about is how to deal with probes that
map to multiple locations. Is a BioConductor package available for this,
since it seems like a frequent issue in microarray analysis? How would
you actually deal with this problem? My current approach is to remove
probes which hit multiple locations on the genome (I have a list from
http://microarray.csc.mrc.ac.uk/scampa/section.html?id=5 and was going
to use nsFilter, if I get it working correctly). But this seems to throw
away a lot of information. Is there a good way of dealing with these
probes that doesn't discard information?
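The removal step described above amounts to splitting probes by how many genomic locations they hit. A minimal sketch in plain Python, assuming a hypothetical `probe_locations` table (the probe IDs and coordinates are invented, and `partition_probes` is an illustrative helper, not a Bioconductor function):

```python
def partition_probes(probe_locations):
    """Split probes into uniquely mapping ones and everything else."""
    unique, ambiguous = [], []
    for probe, locations in probe_locations.items():
        # Exactly one distinct location counts as unique; probes with
        # no hit or with several hits end up in the ambiguous list.
        if len(set(locations)) == 1:
            unique.append(probe)
        else:
            ambiguous.append(probe)
    return unique, ambiguous

# Hypothetical mapping of probes to genomic hits.
probe_locations = {
    "probe_a": ["chr1:1000"],
    "probe_b": ["chr2:500", "chr7:4200"],  # hits two locations
    "probe_c": ["chrX:90"],
}

unique, ambiguous = partition_probes(probe_locations)
fraction_removed = len(ambiguous) / len(probe_locations)
print(unique, ambiguous, fraction_removed)
```

Keeping the ambiguous list rather than discarding it outright makes it easy to see how much of the array is lost and to revisit those probes later.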
Out of interest, how reliable is the annotation provided? Is it
completely derived from the affy annotations? The number of probes where
the affy Entrez ID and the Ensembl ID match is approx 30000, which isn't
that great a statistic. How do people tend to deal with problems like
this?
Sorry for the multiple questions in one post, but I think they are all
related to each other.
Many thanks in advance,
Nathan
simultaneously loving and hating R at the same time