Hi Gondon,Bioconductor experts,
I have RNA-seq data for 3 genotypes(each genotype having 3 biological
replicates) and 30,000 genes.
I was trying to find out differentially expressed genes across all
genotypes. Here is the code I am using.
library(edgeR)
y<-read.table('test_12339.txt'
)
d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes)
d <- calcNormFactors(d)
times <- rep(c("p1","p2","f1"),c(3,3,3))
times <- factor(times,levels=c("p1","p2","f1"))
design <- model.matrix(~factor(times))
disp <- d2$tagwise.dispersion
fit <- glmFit(d,design,dispersion=disp)
lrt <- glmLRT(d,fit)
topTags(lrt)
Does this code provide me with tags differentially expressed across
all
conditions? (I am not looking for pairwise differential expression
but across all conditions)
Thanks,
Shreyartha
[[alternative HTML version deleted]]
Hi,
On Tue, Apr 5, 2011 at 5:04 PM, Shreyartha Mukherjee
<shreyartha at="" gmail.com=""> wrote:
> Hi Gondon,Bioconductor experts,
>
> I have RNA-seq data for 3 genotypes(each genotype having 3
biological
> replicates) and 30,000 genes.
> I was trying to find out differentially expressed genes across all
> genotypes. Here is the code I am using.
>
> ?library(edgeR)
> y<-read.table('test_12339.txt'
> )
> ?d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes)
> ?d <- calcNormFactors(d)
> ?times <- rep(c("p1","p2","f1"),c(3,3,3))
> ?times <- factor(times,levels=c("p1","p2","f1"))
> ?design <- model.matrix(~factor(times))
> disp <- d2$tagwise.dispersion
> ?fit <- glmFit(d,design,dispersion=disp)
> ?lrt <- glmLRT(d,fit)
> topTags(lrt)
>
> Does this code provide me with tags differentially expressed across
all
> conditions? (I am not looking for pairwise differential expression
> but across all conditions)
Out of curiosity, what does it mean for a gene to be differentially
expressed in all (three) conditions?
I mean -- what would its expression pattern look like across your
three genotypes?
If G1, G2, and G3 are your three different genotypes, are you looking
for a gene A that's differentially expressed when you compare G1 vs.
G2 AND G2 vs G3 AND G1 vs G3 (or something(?))
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
?| Memorial Sloan-Kettering Cancer Center
?| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Hi,
I want to test the effect of having three different genotypes as
opposed to
having no different genotypes.
I hope this clarifies my question.
Thanks,
Shreyartha
On Tue, Apr 5, 2011 at 4:52 PM, Steve Lianoglou <
mailinglist.honeypot@gmail.com> wrote:
> Hi,
>
> On Tue, Apr 5, 2011 at 5:04 PM, Shreyartha Mukherjee
> <shreyartha@gmail.com> wrote:
> > Hi Gondon,Bioconductor experts,
> >
> > I have RNA-seq data for 3 genotypes(each genotype having 3
biological
> > replicates) and 30,000 genes.
> > I was trying to find out differentially expressed genes across all
> > genotypes. Here is the code I am using.
> >
> > library(edgeR)
> > y<-read.table('test_12339.txt'
> > )
> > d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes)
> > d <- calcNormFactors(d)
> > times <- rep(c("p1","p2","f1"),c(3,3,3))
> > times <- factor(times,levels=c("p1","p2","f1"))
> > design <- model.matrix(~factor(times))
> > disp <- d2$tagwise.dispersion
> > fit <- glmFit(d,design,dispersion=disp)
> > lrt <- glmLRT(d,fit)
> > topTags(lrt)
> >
> > Does this code provide me with tags differentially expressed
across all
> > conditions? (I am not looking for pairwise differential expression
> > but across all conditions)
>
> Out of curiosity, what does it mean for a gene to be differentially
> expressed in all (three) conditions?
>
> I mean -- what would its expression pattern look like across your
> three genotypes?
>
> If G1, G2, and G3 are your three different genotypes, are you
looking
> for a gene A that's differentially expressed when you compare G1 vs.
> G2 AND G2 vs G3 AND G1 vs G3 (or something(?))
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
[[alternative HTML version deleted]]
Dear Shreyartha,
You code is actually testing for genes that are DE between f1 and p1.
By
default, glmLRT tests for the last coefficient in the linear model and
in
your model this happens to represent f1-p1. See help("glmLRT").
What you probably want is
lrt <- glmLRT(d,fit,coef=c(2,3))
This will test whether f1=p1 and p2=p1 simultaneously, i.e., whether
all
three genotypes have the same expression. This will find genes that
are
differentially expressed between any of the genotypes. This is
analogous
to doing genewise oneway ANOVA tests for differences between the three
genotypes.
Note that the above test does not guarantee that all three genotypes
are
different. There is actually no statistical test that can do that.
Rather it detects genes for which the genotypes are not all equal.
Unfortunately, the current release version of topTags() won't work
properly when you test for several coefficients as above. We'll fix
this for the next Bioconductor release in a couple of weeks time. In
the
meantime, you could use the following code:
o <- order(lrt$table$p.value)
ntags <- 10
lrt$table[o[1:ntags],]
to see the most significant tags.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
NHMRC Senior Research Fellow,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
smyth at wehi.edu.au
http://www.wehi.edu.auhttp://www.statsci.org/smyth
On Tue, 5 Apr 2011, Shreyartha Mukherjee wrote:
> Hi Gondon,Bioconductor experts,
>
> I have RNA-seq data for 3 genotypes(each genotype having 3
biological
> replicates) and 30,000 genes. I was trying to find out
differentially
> expressed genes across all genotypes. Here is the code I am using.
>
> library(edgeR)
> y<-read.table('test_12339.txt'
> )
> d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes)
> d <- calcNormFactors(d)
> times <- rep(c("p1","p2","f1"),c(3,3,3))
> times <- factor(times,levels=c("p1","p2","f1"))
> design <- model.matrix(~factor(times))
> disp <- d2$tagwise.dispersion
> fit <- glmFit(d,design,dispersion=disp)
> lrt <- glmLRT(d,fit)
> topTags(lrt)
>
> Does this code provide me with tags differentially expressed across
all
> conditions? (I am not looking for pairwise differential expression
> but across all conditions)
>
> Thanks,
> Shreyartha
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}
Hi Gordon,
Thanks for your help. I had another question about the lrt table
(obtained
from glmlrt).
How do we interpret the logConc values for 3 genotypes, I understand
for a
pairwise comparison it should be (logA + logB)/2.
Is it (logA + logB+ logC)/3 ?
Thanks,
Shreyartha
On Tue, Apr 5, 2011 at 9:11 PM, Gordon K Smyth <smyth@wehi.edu.au>
wrote:
> Dear Shreyartha,
>
> You code is actually testing for genes that are DE between f1 and
p1. By
> default, glmLRT tests for the last coefficient in the linear model
and in
> your model this happens to represent f1-p1. See help("glmLRT").
>
> What you probably want is
>
> lrt <- glmLRT(d,fit,coef=c(2,3))
>
> This will test whether f1=p1 and p2=p1 simultaneously, i.e., whether
all
> three genotypes have the same expression. This will find genes that
are
> differentially expressed between any of the genotypes. This is
analogous to
> doing genewise oneway ANOVA tests for differences between the three
> genotypes.
>
> Note that the above test does not guarantee that all three genotypes
are
> different. There is actually no statistical test that can do that.
Rather
> it detects genes for which the genotypes are not all equal.
>
> Unfortunately, the current release version of topTags() won't work
properly
> when you test for several coefficients as above. We'll fix this for
the
> next Bioconductor release in a couple of weeks time. In the
meantime, you
> could use the following code:
>
> o <- order(lrt$table$p.value)
> ntags <- 10
> lrt$table[o[1:ntags],]
>
> to see the most significant tags.
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> NHMRC Senior Research Fellow,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Tel: (03) 9345 2326, Fax (03) 9347 0852,
> smyth@wehi.edu.au
> http://www.wehi.edu.au
> http://www.statsci.org/smyth
>
>
> On Tue, 5 Apr 2011, Shreyartha Mukherjee wrote:
>
> Hi Gondon,Bioconductor experts,
>>
>> I have RNA-seq data for 3 genotypes(each genotype having 3
biological
>> replicates) and 30,000 genes. I was trying to find out
differentially
>> expressed genes across all genotypes. Here is the code I am using.
>>
>> library(edgeR)
>> y<-read.table('test_12339.txt'
>> )
>> d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes)
>> d <- calcNormFactors(d)
>> times <- rep(c("p1","p2","f1"),c(3,3,3))
>> times <- factor(times,levels=c("p1","p2","f1"))
>> design <- model.matrix(~factor(times))
>> disp <- d2$tagwise.dispersion
>> fit <- glmFit(d,design,dispersion=disp)
>> lrt <- glmLRT(d,fit)
>> topTags(lrt)
>>
>> Does this code provide me with tags differentially expressed across
all
>> conditions? (I am not looking for pairwise differential expression
>> but across all conditions)
>>
>> Thanks,
>> Shreyartha
>>
>
>
______________________________________________________________________
> The information in this email is confidential and
inte...{{dropped:10}}
Dear Shreyartha,
logConc is a summary measure of average concentration for each tag
over
all treatment conditions. It is the same value regardless of what
comparison is being made. In the version of edgeR about to be
released,
it is computed by mglmOneGroup() and is close to
log(mean(count/lib.size)).
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
NHMRC Senior Research Fellow,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
smyth at wehi.edu.au
http://www.wehi.edu.auhttp://www.statsci.org/smyth
On Wed, 6 Apr 2011, Shreyartha Mukherjee wrote:
> Hi Gordon,
>
> Thanks for your help. I had another question about the lrt table
(obtained
> from glmlrt).
>
> How do we interpret the logConc values for 3 genotypes, I understand
for a
> pairwise comparison it should be (logA + logB)/2.
> Is it (logA + logB+ logC)/3 ?
>
> Thanks,
> Shreyartha
>
>
>
>
>
> On Tue, Apr 5, 2011 at 9:11 PM, Gordon K Smyth <smyth at="" wehi.edu.au=""> wrote:
>
>> Dear Shreyartha,
>>
>> You code is actually testing for genes that are DE between f1 and
p1. By
>> default, glmLRT tests for the last coefficient in the linear model
and in
>> your model this happens to represent f1-p1. See help("glmLRT").
>>
>> What you probably want is
>>
>> lrt <- glmLRT(d,fit,coef=c(2,3))
>>
>> This will test whether f1=p1 and p2=p1 simultaneously, i.e.,
whether all
>> three genotypes have the same expression. This will find genes
that are
>> differentially expressed between any of the genotypes. This is
analogous to
>> doing genewise oneway ANOVA tests for differences between the three
>> genotypes.
>>
>> Note that the above test does not guarantee that all three
genotypes are
>> different. There is actually no statistical test that can do that.
Rather
>> it detects genes for which the genotypes are not all equal.
>>
>> Unfortunately, the current release version of topTags() won't work
properly
>> when you test for several coefficients as above. We'll fix this
for the
>> next Bioconductor release in a couple of weeks time. In the
meantime, you
>> could use the following code:
>>
>> o <- order(lrt$table$p.value)
>> ntags <- 10
>> lrt$table[o[1:ntags],]
>>
>> to see the most significant tags.
>>
>> Best wishes
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> NHMRC Senior Research Fellow,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> Tel: (03) 9345 2326, Fax (03) 9347 0852,
>> smyth at wehi.edu.au
>> http://www.wehi.edu.au
>> http://www.statsci.org/smyth
>>
>>
>> On Tue, 5 Apr 2011, Shreyartha Mukherjee wrote:
>>
>> Hi Gondon,Bioconductor experts,
>>>
>>> I have RNA-seq data for 3 genotypes(each genotype having 3
biological
>>> replicates) and 30,000 genes. I was trying to find out
differentially
>>> expressed genes across all genotypes. Here is the code I am using.
>>>
>>> library(edgeR)
>>> y<-read.table('test_12339.txt'
>>> )
>>> d <- DGEList(y, group = rep(1:3, each = 3), lib.size = lib.sizes)
>>> d <- calcNormFactors(d)
>>> times <- rep(c("p1","p2","f1"),c(3,3,3))
>>> times <- factor(times,levels=c("p1","p2","f1"))
>>> design <- model.matrix(~factor(times))
>>> disp <- d2$tagwise.dispersion
>>> fit <- glmFit(d,design,dispersion=disp)
>>> lrt <- glmLRT(d,fit)
>>> topTags(lrt)
>>>
>>> Does this code provide me with tags differentially expressed
across all
>>> conditions? (I am not looking for pairwise differential expression
>>> but across all conditions)
>>>
>>> Thanks,
>>> Shreyartha
>>>
>>
>>
______________________________________________________________________
>> The information in this email is confidential and intended solely
for the
>> addressee.
>> You must not disclose, forward, print or use it without the
permission of
>> the sender.
>>
______________________________________________________________________
>>
>
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}