marco zucchelli wrote:
> Hi,
>
> I am using hyperGTest to test GO. Fot the universe of genes I use
the
> follwing:
>
> ENTREZ <- as.list(hgu133plus2ENTREZID)
> univ <- unlist(ENTREZ)
> univ <- univ[!is.na(univ)]
>
> now in univ there are 47,430 genes but only 19,871 are unique, since
the
> same gene can be hybrydized several times on the same array.
>
>
>>length(univ)
>
> [1] 47430
>
>>length(unique(univ))
>
> [1] 19871
>
>
> Is it correct to have repetitions or should a list of unique genes
be used?
> i.e. should I use:
>
> universeGeneIds=univ or
> universeGeneIds=unique(univ)
You want the unique genes. In addition, if you have done any
pre-filtering of the data to remove e.g., those genes that don't
change
expression, you want to remove those from your universe as well. The
universe should only consist of unique genes that could have been
selected by whatever statistical test you used. BTW, the geneIds
should
also be unique.
Seth has made some changes to the GOstats vignette that should make
all
of this quite clear:
http://www.bioconductor.org/packages/2.0/bioc/vignettes/GOstats/inst/d
oc/GOstatsHyperG.pdf
Best,
Jim
>
> in the GOHyperGParams ??
>
> Regards
>
> Marco
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues.
"James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes:
> You can test for under-represented GO terms by setting the
testDirection
> argument of your GOHyperGParams object to "under". So if your
> GOHyperGParams object were set up like this:
>
> params <- new("GOHyperGParams", geneIds = A, geneUniverseIds = ALL)
>
> then you would test for over representation as normal:
>
> hyperGTest(p)
>
> and under-representation like this:
>
> testDirection(params) <- "under"
make that: testDirection(p) <- "under"
and then:
> hyperGTest(p)
I knew what you meant, but in case someone thinks "params" is a magic
word...
:-)
+ seth
Seth Falcon wrote:
> "James W. MacDonald" <jmacdon at="" med.umich.edu=""> writes:
>
>>You can test for under-represented GO terms by setting the
testDirection
>>argument of your GOHyperGParams object to "under". So if your
>>GOHyperGParams object were set up like this:
>>
>>params <- new("GOHyperGParams", geneIds = A, geneUniverseIds = ALL)
>>
>>then you would test for over representation as normal:
>>
>>hyperGTest(p)
>>
>>and under-representation like this:
>>
>>testDirection(params) <- "under"
>
>
> make that: testDirection(p) <- "under"
>
> and then:
>
>
>>hyperGTest(p)
>
>
>
> I knew what you meant, but in case someone thinks "params" is a
magic
> word...
This just goes to show the power of the vignette. I usually call my
GOHyperGParams objects 'params', but reverted to what you call it in
your vignette, right in the middle of my example. Now if I could just
get my computer to be able to figure out what I mean when I mix things
up like that. ;-D
Best,
Jim
>
> :-)
>
> + seth
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues.
marco zucchelli wrote:
> Sorry, I missed to reply to bioconductor ...
Thanks for re-sending to the list.
>
> Dear James,
>
> actually I tried both the options and got the same result, which
probably
> menas that the hyperGTest is taking care of the repeats.
True. It does that now. In addition, in the new version it does some
validity checking to ensure that e.g., the geneId is a subset of the
universeGeneIds.
>
> I use hgu133plus2.
>
> The pre-filtering is pretty interesting argument. In my case I have
2
> biological replicates for each tissue and I split the probes based
on the
> following rule:
Here I am assuming that you mean probesets when you say probes.
>
> 1.probes that are called absent on all the arrays (group A)
> 2.probes that have consistent calls on the biological replicates
(i.e. they
> are either present or moderate or absent on both the replicates) but
at
> least once they are present on a pair of replicates (group B)
> 3. others. these are probes that may be absent on one replicate and
present
> on the other etc. (group C)
>
> now I find differentially expressed gened with limma in group B and
I
> cluster them according to the pattern of up-down regulation they
have on
> the
> different tissues.
>
> I am intrested also in genes that are always absent (group A) since
this
> may
> be biologically of interest in my experiment.
>
> What is now a good universe to use?
>
> I filtered out probes that have no entrezID both in my gene lists
and the
> hgu133plus2 array, I got the entrezId's and eliminated the doubles
by using
> unique().
>
> Then I used:
>
> 1. geneIds = A , universeGenes = ALL but seems
like I
> should use universeGenes = A+B ??
For this group, the statistic you used to select 'significant'
probesets
was the P/A calls, and you selected from all the genes on the chip, so
I
think your universe is correct.
> 2. geneIds = cluster of B , universeGenes = ALL but seems like I
should
> use universeGenes = A+B ?? or only B?
Here you pre-filtered the probesets and selected only those that
fulfilled your 'B' criterion. You then used a linear model to select
differentially expressed genes from this subset of the chip, so IMO,
your universe is the 'B' genes.
>
> Does it make sense to look at unexpressed genes ? seems like this
might be
> not very wise on the affy arrays.
I think you might get ten different answers from ten different
statisticians if you asked them about what it means when a probeset
has
an 'absent' call.
I am personally not convinced that the P/M/A calls mean much, mainly
because a large percentage (30 - 40%) of the MM probes are brighter
than
their matching PM probe. It seems to me that the MM probes capture an
unknown mixture of noise, transcript abundance, background binding,
and
binding of unexpected transcript (e.g., an unrelated transcript for
which the MM probe is a PM that Affy didn't know about).
The assumption for the P/M/A calls is that the MM probe only captures
noise and background binding, so if the PM probes are significantly
brighter, then the transcript is expressed. In many cases this may
well
be true. However, I think there are likely enough cases where this
isn't
true that I am not comfortable assuming that an absent call means the
gene really isn't being expressed.
That said, by looking at the 'unexpressed' genes are you really trying
to figure out which pathways (for lack of a better term) are being
shut
down, or not used? If that is the case, then maybe you could use the B
genes and look for _underrepresented_ GO terms as well. This might
give
you what you want without having to make any assumptions about which
genes are not being expressed.
You can test for under-represented GO terms by setting the
testDirection
argument of your GOHyperGParams object to "under". So if your
GOHyperGParams object were set up like this:
params <- new("GOHyperGParams", geneIds = A, geneUniverseIds = ALL)
then you would test for over representation as normal:
hyperGTest(p)
and under-representation like this:
testDirection(params) <- "under"
hyperGTest(p)
HTH,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues.