Hello all,
I'm new to R/BioC, but I've been trying to use them for the following analysis. I apologize if this email is a bit long, but I bet someone on this list could point me in the right direction.
I have the GNF dataset with Affy expression data from 61 mouse tissues (each tissue has one biological replicate, so there are two arrays per tissue and 122 CEL files in total).
In the end I would like to obtain, for each tissue, a gene list sorted by how specifically each gene is expressed in that tissue. That is, genes whose expression is highest in that tissue relative to the other tissues (even if their absolute expression levels are low) should be at the top, and genes whose expression is lowest in that tissue (even if their absolute expression levels are high) should be at the bottom. Ideally, I would also like a confidence value (a p-value?) associated with each gene.
Initially, I downloaded the pre-normalized files (processed with MAS or gcRMA) and did all the manipulation with Perl scripts. For each probe X, I took its expression values Xi (i = 1..61), one per tissue, and replaced each value with (Xi - mean(X)) / std_dev(X), essentially a Z-score. In this way, the Z-score represents how specifically a particular gene is expressed in a particular tissue, relative to the spread (standard deviation) of that gene's expression across all tissues.
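The Z-score step described above can be sketched in Python. This is a minimal illustration, not the original Perl scripts: it assumes a hypothetical `expr` dictionary mapping each probe ID to its list of per-tissue values (one value per tissue), and it uses the sample standard deviation (n - 1 denominator), since the post does not say which definition `std_dev` used. The names `tissue_zscores` and `rank_by_tissue` are illustrative only.

```python
from statistics import mean, stdev

def tissue_zscores(expr):
    """Replace each probe's per-tissue values Xi with (Xi - mean(X)) / sd(X).

    expr: probe id -> list of per-tissue values (one value per tissue).
    """
    z = {}
    for probe, values in expr.items():
        mu = mean(values)
        sd = stdev(values)  # sample SD (n - 1); an assumption of this sketch
        if sd == 0:
            # Constant across tissues: no specificity signal, avoid dividing by zero.
            z[probe] = [0.0] * len(values)
        else:
            z[probe] = [(x - mu) / sd for x in values]
    return z

def rank_by_tissue(z, tissue_index):
    """Probe ids for one tissue, from most specific (highest Z) to least specific."""
    return sorted(z, key=lambda probe: z[probe][tissue_index], reverse=True)

# Toy example with 3 tissues (made-up numbers, not GNF data).
expr = {
    "probeA": [1.0, 2.0, 3.0],
    "probeB": [5.0, 5.0, 5.0],
    "probeC": [9.0, 1.0, 2.0],
}
z = tissue_zscores(expr)
print(rank_by_tissue(z, 0))  # ['probeC', 'probeB', 'probeA']
```

Sorting by one tissue's column in descending order gives the "most specific at the top, least specific at the bottom" ordering the post asks for. Giving zero-variance probes a score of 0 is a choice made in this sketch, not something the post specifies.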
One of the first problems with this is that I only process a subset of the probes, since I only use those with a RefSeq transcript. So I thought it would be better to re-normalize everything using only the subset of transcripts that I will be analyzing. Is this correct?
I think that for my particular case I'm better off with an RMA/gcRMA summary. I can see that I can use the "subset" parameter to select only the probesets I want. Also, I can't make much use of the A/M/P (absent/marginal/present) calls from the MAS analysis, since I don't want the low-expression values to be cut off. I read a couple of papers that compared these and other methods, and decided to try gcRMA first.
Besides any general suggestions, my main questions are:
1) At what point should I use the biological replicates?
2) Is there a package that I can use to obtain "relative" expression levels across all the tissues? I can find many examples of how to get relative expression levels when comparing two conditions (or a few more, but always comparing them in pairs). How can I best compare each tissue to "all the rest"?
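One way to read "each tissue versus all the rest" is as a two-group comparison per probe: the replicate arrays of one tissue against the arrays of the other 60 tissues. The sketch below uses Welch's t statistic for that comparison. That choice is an assumption of this sketch, not a method from the post. It keeps both replicates instead of averaging them, and it returns only the statistic, because Python's standard library has no t-distribution CDF for computing a p-value. The `data` layout and all function names are hypothetical. With only two replicates per tissue, the within-tissue variance is very poorly estimated. Moderated statistics, such as those in the Bioconductor package limma, are a common alternative for exactly that reason.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance

def welch_t(group, rest):
    """Welch's t statistic for mean(group) - mean(rest), allowing unequal variances."""
    diff = mean(group) - mean(rest)
    se2 = variance(group) / len(group) + variance(rest) / len(rest)
    if se2 == 0:
        # No spread in either group: only the sign of the difference is meaningful.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(se2)

def one_vs_rest(samples, tissue):
    """samples: tissue name -> list of replicate values for ONE probe (hypothetical layout)."""
    group = samples[tissue]
    rest = [v for name, vals in samples.items() if name != tissue for v in vals]
    return welch_t(group, rest)

def rank_probes_for_tissue(data, tissue):
    """data: probe id -> {tissue -> replicates}. Most tissue-specific probes come first."""
    scores = {probe: one_vs_rest(samples, tissue) for probe, samples in data.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy example with 3 tissues x 2 replicates (made-up numbers, not GNF data).
data = {
    "probeA": {"liver": [8.0, 8.2], "brain": [2.0, 2.1], "heart": [2.2, 1.9]},
    "probeB": {"liver": [3.0, 3.1], "brain": [6.0, 6.2], "heart": [5.9, 6.1]},
}
print(rank_probes_for_tissue(data, "liver"))  # ['probeA', 'probeB']
```

A strongly positive score puts a probe near the top of that tissue's list and a strongly negative one near the bottom, matching the ordering described at the start of the post.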
Thanks for your time,
Cei