Ruppert Valentino
Hi James,
Thanks for the reply. Regarding non-specific filtering of genes, I wonder
whether you or anyone else can comment on filters that identify "noisy"
genes, i.e. those that don't vary greatly across samples. My data sets are
very small, and SD is not a good filter because it will eliminate useful
genes as well. Also, I don't have a normal (reference) group to use for
identifying non-variant genes.
There are papers discussing the application of Independent Component
Analysis (ICA) to gene expression data that claim it filters genes
non-specifically.
I also wonder whether using eBayes to filter out invariant genes is a good
idea here. And what about taking the common set of genes in the top 25%
from each of three normalisations: MAS5, RMA and dChip?
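
A minimal Python sketch of the consensus idea in the question above. It assumes each normalisation has already produced a log2-scale matrix (probesets x arrays) with rows in the same probeset order, and it assumes "top 25%" means the top quarter by SD across arrays; the ranking criterion and the function names are illustrative assumptions, not an established method.

```python
import numpy as np

def top_fraction_ids(ids, expr, fraction=0.25):
    """Return the set of probeset IDs whose SD across arrays is in the top `fraction`.

    expr: probesets x arrays, log2 scale, rows in the same order as `ids`.
    Ties at the cut-off are kept, so slightly more than `fraction` may survive.
    """
    sds = np.asarray(expr, dtype=float).std(axis=1, ddof=1)
    cutoff = np.quantile(sds, 1.0 - fraction)
    return {pid for pid, sd in zip(ids, sds) if sd >= cutoff}

def consensus_top_ids(ids, matrices, fraction=0.25):
    """Intersect the top-`fraction` sets obtained from several normalisations."""
    sets = [top_fraction_ids(ids, m, fraction) for m in matrices]
    return set.intersection(*sets)
```

Requiring agreement across all three methods makes the filter stricter, so the intersection can end up much smaller than 25% of the array.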
Finally, because the groups are small, does anyone have a good way to test
the stability of the clustering?
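
One common way to probe stability is resampling: repeatedly cluster a random subset of the arrays and record how often each pair of arrays lands in the same cluster (a consensus matrix). The Python sketch below assumes samples are the rows of `points` (i.e. the transposed, already filtered expression matrix), uses plain k-means with deterministic farthest-point initialisation as a stand-in for whatever clustering method is actually used, and picks the subsampling fraction arbitrarily; all names are illustrative.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    points: (n_samples, n_features) array; returns one label per sample.
    """
    points = np.asarray(points, dtype=float)
    # Start from the first sample, then repeatedly add the sample farthest
    # from all centres chosen so far.
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each sample to its nearest centre (squared Euclidean distance).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its samples; keep empty centres fixed.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def consensus_matrix(points, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of subsamples in which each pair of samples co-clusters.

    Entry (i, j) is NaN if samples i and j were never drawn together.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(points)
    m = max(k, int(round(frac * n)))
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        labels = kmeans(points[idx], k)
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
    with np.errstate(invalid="ignore"):
        return np.where(sampled > 0, together / sampled, np.nan)
```

Values near 1 within a cluster and near 0 between clusters suggest a stable partition; intermediate values flag arrays that move between clusters. With a group as small as D (two arrays), a subsample can omit it entirely, so results for small groups deserve extra caution.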
Any thoughts on these would be greatly appreciated, as unsupervised
clustering seems to be a difficult problem because of the inherent
biological noise that exists among samples from the same tissue.
Many thanks
Ruppert
>From: "James W. MacDonald" <jmacdon at med.umich.edu>
>To: Ruppert Valentino <ruppert7 at hotmail.com>
>CC: bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] What is the best way to eliminate non-variants from set of arrays?
>Date: Wed, 27 Jun 2007 08:54:08 -0400
>
>Hi Ruppert,
>
>Ruppert Valentino wrote:
>>Hello,
>>
>>I am analysing an Affymetrix microarray experiment that involves the
>>following groups:
>>
>>Group   No. of samples
>>-----   --------------
>>A        4
>>B        9
>>C       10
>>D        2
>>
>>I would like to get rid of the non-variant genes to do unsupervised
>>clustering. I tried simple filters like SD and fold change, as in the
>>Cluster software, but I always end up with some of the technical
>>probes, such as the Affy GAPDH controls, coming up and spoiling the
>>clustering. So the question is: what is the best algorithm for
>>eliminating genes that do not vary across the arrays non-specifically,
>>i.e. without grouping them?
>
>It seems to me that there are two questions here. First, how best to
>filter probesets agnostically, and second, why do these technical
>probes not get filtered out?
>
>For filtering the probes, I usually prefer to filter based on variance
>(or SD if you like). This is as agnostic as you can get, and has the
>desired effect of eliminating probesets that don't change expression.
>Others seem to like using the P/M/A calls, which is another agnostic
>measure of likely signal in the data. I think both should do a
>reasonable job.
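
The two agnostic filters described above can be sketched in Python as follows, assuming a log2-scale expression matrix (probesets x arrays) and a matching matrix of P/M/A detection calls. The cut-offs `keep_fraction` and `min_present` are hypothetical defaults, not recommendations; in R, Bioconductor's genefilter package offers ready-made filtering functions.

```python
import numpy as np

def variance_filter(expr, keep_fraction=0.25):
    """Keep the most variable probesets (rows) across all arrays.

    expr: probesets x arrays, log2-scale expression values.
    Returns row indices of the retained probesets, most variable first.
    """
    sds = np.asarray(expr, dtype=float).std(axis=1, ddof=1)
    n_keep = max(1, int(round(keep_fraction * len(sds))))
    order = np.argsort(sds)[::-1]  # descending SD
    return order[:n_keep]

def presence_filter(calls, min_present=2):
    """Keep probesets called 'P' (present) on at least `min_present` arrays.

    calls: probesets x arrays array of 'P', 'M' or 'A' strings.
    """
    n_present = (np.asarray(calls) == "P").sum(axis=1)
    return np.flatnonzero(n_present >= min_present)
```

Neither filter looks at group labels, which is what keeps them agnostic; the two can also be combined by intersecting the returned indices.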
>
>The second question is the more interesting IMO. It is always sort of
>embarrassing to give someone a list of genes where one of the top genes
>is one of the Affy control probesets. In some sense it looks like you
>weren't competent enough to 'get rid of' something that obviously
>shouldn't be there. Or should it?
>
>Having a control probeset show up as significant doesn't necessarily
>mean that something went wrong in the filtering step. For instance,
>GAPDH is widely considered to be a housekeeping gene, but if you were
>comparing samples that had widely different levels of glycolysis, you
>might actually expect a difference in the expression of this gene.
>
>Of course, this only applies to the control probesets that interrogate
>a gene that actually exists in the species you are working with. If,
>say, BioB were differentially expressed, then you might have a
>technical problem with the way the chips were run that you might want
>to explore.
>
>Anyway, rather than simply trying to get rid of something you think
>shouldn't be there, you might think about why it isn't going away when
>you filter, and think about what that might mean for this particular
>experiment.
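
Rather than deleting control probesets blindly, one can set aside any that survive the filter and inspect them. This Python sketch assumes Affymetrix control probesets carry the "AFFX-" prefix and that hybridisation spike-ins can be recognised by the substrings bioB, bioC, bioD or cre in their IDs; these naming patterns are assumptions and should be checked against the annotation for the array type actually used.

```python
# Substrings assumed to identify hybridisation spike-in controls
# (bioB, bioC, bioD from E. coli; cre from bacteriophage P1).
SPIKE_IN_TAGS = ("biob", "bioc", "biod", "cre")

def triage_controls(probe_ids, control_prefix="AFFX-"):
    """Split filtered probeset IDs into spike-ins, other controls and regular probesets."""
    report = {"spike_in": [], "other_control": [], "regular": []}
    for pid in probe_ids:
        if not pid.startswith(control_prefix):
            report["regular"].append(pid)
        elif any(tag in pid.lower() for tag in SPIKE_IN_TAGS):
            # Spike-ins should not vary biologically: suspect the chip runs.
            report["spike_in"].append(pid)
        else:
            # e.g. GAPDH or beta-actin: variation may reflect real biology.
            report["other_control"].append(pid)
    return report
```

Surviving spike-ins point to a possible technical problem with how the chips were run; surviving endogenous controls such as GAPDH may reflect genuine biological differences between the samples.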
>
>Best,
>
>Jim
>
>
>>
>>I was thinking of using dChip or eBayes, but any suggestion or advice
>>would be greatly appreciated, as the sample size is small here and the
>>idea is simply to eliminate non-variant genes to see whether the
>>unsupervised clustering reveals anything.
>>
>>Regards
>>
>>Ruppert
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor