Hi. I am using Roast to perform gene set enrichment analysis after
doing differential expression analysis in edgeR. In this particular
study, I had 2 variables which I joined to create a single factor in
the linear model: y ~ 0 + combo_variable. Looking at the variance in
the data, most of the variance was due to 1 variable and there was
very little variance due to the other variable. Thus, large numbers
of genes (~10,000) were found to be differentially expressed when
testing the contrast for one variable and very few genes (~200) were
found to be differentially expressed when testing the contrast for the
other variable. When I ran roast for each of these 2 contrasts, the
one that had lots of differentially expressed genes found almost all
of the gene sets to be enriched. This is understandable since there
were lots of genes differentially expressed, but my problem is that
most of the gene sets had the same FDRs. Thus I can't even narrow
down the list of enriched gene sets by using a more stringent FDR
cutoff. A subset of the output is shown below. Why would all of
these gene sets have the same p-values and thus the same FDRs??
NGenes  PropDown  PropUp    Direction  PValue    FDR     PValue.Mixed  FDR.Mixed
   112  0.455357  0.205357  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
    38  0.578947  0.210526  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
    10  0.2       0.4       Up         2.00E-04  0.0002  1.00E-04      5.50E-05
   311  0.299035  0.469453  Up         2.00E-04  0.0002  1.00E-04      5.50E-05
   540  0.233333  0.344444  Up         2.00E-04  0.0002  1.00E-04      5.50E-05
  1294  0.328439  0.257342  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
   317  0.29653   0.533123  Up         2.00E-04  0.0002  1.00E-04      5.50E-05
   538  0.421933  0.256506  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
   133  0.511278  0.293233  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
    39  0.589744  0.205128  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
    14  0.214286  0.571429  Up         2.00E-04  0.0002  1.00E-04      5.50E-05
    13  0.307692  0.538462  Up         2.00E-04  0.0002  1.00E-04      5.50E-05
    36  0.472222  0.222222  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
   616  0.160714  0.688312  Up         2.00E-04  0.0002  1.00E-04      5.50E-05
     6  1         0         Down       2.00E-04  0.0002  1.00E-04      5.50E-05
    21  0.428571  0.285714  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
    65  0.415385  0.246154  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
    99  0.383838  0.323232  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
    19  0         0.578947  Up         2.00E-04  0.0002  1.00E-04      5.50E-05
   118  0.5       0.313559  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
   470  0.461702  0.323404  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
  1401  0.404711  0.250535  Down       2.00E-04  0.0002  1.00E-04      5.50E-05
   631  0.272583  0.369255  Up         2.00E-04  0.0002  1.00E-04      5.50E-05
    55  0.236364  0.472727  Up         2.00E-04  0.0002  1.00E-04      5.50E-05
     5  0.2       0.6       Up         2.00E-04  0.0002  1.00E-04      5.50E-05
On the other hand, when I ran roast for the contrast with few genes
differentially expressed, I got few gene sets enriched. But what's
odd is it did find some gene sets enriched with FDR.Mixed < 0.05, but
none of the genes in the gene set were differentially expressed. Are
these enriched gene sets false positives? I'm not sure what's going
on here.
NGenes  PropDown  PropUp    Direction  PValue    FDR       PValue.Mixed  FDR.Mixed  # DE genes
    18  0.333333  0         Down       2.00E-04  0.01636   4.00E-04      0.024994   0
     5  0.4       0         Down       2.00E-04  0.01636   5.00E-04      0.024994   0
     7  0.714286  0         Down       6.00E-04  0.045444  6.00E-04      0.024994   0
    50  0.08      0.32      Up         0.013     0.159882  0.001         0.032379   0
    49  0.346939  0.142857  Down       0.049     0.272132  0.001         0.032379   0
     4  0.25      0.25      Down       0.1722    0.457071  0.0012        0.037628   0
Please advise.
Thanks,
Julie
Julie Leonard
Computational Biologist
Global Bioinformatics
Syngenta Biotechnology, Inc.
Same questions minus the unformatted table data.
Hi. I am using Roast to perform gene set enrichment analysis after
doing differential expression analysis in edgeR. In this particular
study, I had 2 variables which I joined to create a single factor in
the linear model: y ~ 0 + combo_variable. Looking at the variance in
the data, most of the variance was due to 1 variable and there was
very little variance due to the other variable. Thus, large numbers
of genes (~10,000) were found to be differentially expressed when
testing the contrast for one variable and very few genes (~200) were
found to be differentially expressed when testing the contrast for the
other variable. When I ran roast for each of these 2 contrasts, the
one that had lots of differentially expressed genes found almost all
of the gene sets to be enriched. This is understandable since there
were lots of genes differentially expressed, but my problem is that
most of the gene sets had the same FDRs. Thus I can't even narrow
down the list of enriched gene sets by using a more stringent FDR
cutoff. Why would all of these gene sets have the same p-values and
thus the same FDRs??
On the other hand, when I ran roast for the contrast with few genes
differentially expressed, I got few gene sets enriched. But what's
odd is it did find some gene sets enriched with FDR.Mixed < 0.05, but
none of the genes in the gene set were differentially expressed. Are
these enriched gene sets false positives? I'm not sure what's going
on here.
Please advise.
Thanks,
Julie
Julie Leonard
Computational Biologist
Global Bioinformatics
Syngenta Biotechnology, Inc.
-----Original Message-----
From: bioconductor-bounces@r-project.org [mailto:bioconductor-
bounces@r-project.org] On Behalf Of julie.leonard@syngenta.com
Sent: Thursday, January 23, 2014 5:28 PM
To: bioconductor at r-project.org
Subject: [BioC] 2 issues about enriched gene sets via Roast
...
Dear Julie,
When you do a rotation or permutation test, the smallest possible p-value
that can be achieved depends on the number of distinct rotations or
permutations that have been performed. You appear to be using nrot=9999
rotations, so the smallest one-sided p-value that is possible is
  p = 1 / (nrot+1) = 1e-4
The smallest possible two-sided p-value is therefore twice this, which is
2e-4.
Any gene set that contains lots of differential expression, so that the
observed statistic is greater than any of the rotated statistics, will be
assigned this minimum p-value.
One can resolve these small p-values further by doing more rotations. The
more rotations that are done, the fewer gene sets will sit on the minimum.
Best wishes
Gordon
PS. If you're not sure where the above p-value formula comes from, see
Section 4 of:
http://www.statsci.org/smyth/pubs/PermPValuesPreprint.pdf
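
To make the p-value floor described above concrete, here is a minimal Python sketch. It is not roast's implementation: the function names and the null statistics are hypothetical, chosen only to illustrate the formula p = 1 / (nrot+1) and the doubling for the two-sided case. It shows that two gene sets with very different observed statistics receive the same p-value once both exceed every rotated statistic.

```python
def rotation_p_value(observed, null_stats):
    """One-sided Monte Carlo p-value, (b + 1) / (nrot + 1), where b is the
    number of rotated (null) statistics at least as large as the observed one."""
    b = sum(1 for s in null_stats if s >= observed)
    return (b + 1) / (len(null_stats) + 1)


def smallest_p_value(nrot, two_sided=True):
    """Smallest p-value attainable with nrot rotations (reached when b = 0)."""
    one_sided = 1 / (nrot + 1)
    return 2 * one_sided if two_sided else one_sided


# Hypothetical null statistics for illustration only: all of them lie below 1.0.
nrot = 9999
null_stats = [i / nrot for i in range(nrot)]

# Two gene sets of very different strength both beat every rotated statistic,
# so both sit on the same one-sided floor of 1e-4.
p_moderate = rotation_p_value(5.0, null_stats)
p_extreme = rotation_p_value(50.0, null_stats)

# The two-sided floor shrinks as the number of rotations grows.
floors = {n: smallest_p_value(n) for n in (999, 9999, 99999)}

print(p_moderate, p_extreme)
print(floors)
```

With ten times as many rotations the floor drops by a factor of ten, which is why more rotations let strongly enriched gene sets separate from one another, at the cost of extra computing time.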
> Date: Thu, 23 Jan 2014 17:27:46 -0500
> From: <julie.leonard at="" syngenta.com="">
> To: <bioconductor at="" r-project.org="">
> Subject: [BioC] 2 issues about enriched gene sets via Roast
...
Thanks!!!
Julie Leonard
Computational Biologist
Global Bioinformatics
Syngenta Biotechnology, Inc.
3054 E. Cornwallis Rd.
Research Triangle Park, NC 27709
USA
phone 1-919-281-7449
julie.leonard at syngenta.com
www.syngenta.com
-----Original Message-----
From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU]
Sent: Saturday, January 25, 2014 2:12 AM
To: Leonard Julie USRE
Cc: Bioconductor mailing list
Subject: 2 issues about enriched gene sets via Roast
...
> Date: Thu, 23 Jan 2014 17:27:46 -0500
> From: <julie.leonard at="" syngenta.com="">
> To: <bioconductor at="" r-project.org="">
> Subject: [BioC] 2 issues about enriched gene sets via Roast
...
> On the other hand, when I ran roast for the contrast with few genes
> differentially expressed, I got few gene sets enriched. But what's odd
> is it did find some gene sets enriched with FDR.Mixed < 0.05, but none
> of the genes in the gene set were differentially expressed. Are these
> enriched gene sets false positives? I'm not sure what's going on here.
This is in a way the whole point of gene set testing: to be able to pool
information between genes to find an overall trend that is not apparent
merely by looking at the individual genes.
The original GSEA publication (Mootha et al, Nature Genetics 2003) began
by saying that no genes were individually significant in their data:
"When assessed with ... analytical techniques that take into account the
multiple comparisons implicit in microarray analysis, no single gene had a
significant difference in expression between the diagnostic categories
(data not shown)."
They then went on to show that GSEA nevertheless found significant gene
set changes. So the phenomenon that you have observed was the original
justification for the whole gene set testing approach.
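
A small numerical illustration of this pooling effect, as a sketch under strong assumptions: it uses Stouffer's combined z-score, which is not what roast or GSEA compute, and it treats the genes as independent, which real expression data generally are not (roast uses rotation precisely to avoid that assumption). The gene count and z-scores are made up for illustration.

```python
import math


def upper_tail_p(z):
    """One-sided p-value for a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def stouffer_z(z_scores):
    """Combined z-score for a set of genes, ASSUMING the genes are independent."""
    return sum(z_scores) / math.sqrt(len(z_scores))


# Hypothetical gene set: 20 genes, each with the same modest shift (z = 1.5).
n_genes = 20
z_scores = [1.5] * n_genes

# Gene by gene, nothing survives a Bonferroni correction ...
gene_p = [upper_tail_p(z) for z in z_scores]
bonferroni_cutoff = 0.05 / n_genes
n_significant_genes = sum(p < bonferroni_cutoff for p in gene_p)

# ... but the pooled evidence for the set as a whole is very strong.
set_p = upper_tail_p(stouffer_z(z_scores))

print(n_significant_genes, set_p)
```

Each gene alone has a p-value of about 0.07, so none would be called differentially expressed, yet the consistent direction across the set yields a very small set-level p-value.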
Best wishes
Gordon
...
> Please advise.
>
> Thanks,
> Julie
>
>
> Julie Leonard
> Computational Biologist
> Global Bioinformatics
> Syngenta Biotechnology, Inc.