If you want an ANOVA-style test for differences across all three
groups, I highly recommend learning to use edgeR in its GLM mode. The
edgeR User's Guide has all the information you need for this.
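
As a rough sketch of what that workflow can look like (this is not
your data: the counts are simulated, and the object names `counts`,
`design`, `fit`, `lrt` and `res` are placeholders I picked for
illustration), a three-group test might be set up like this:

```r
library(edgeR)

set.seed(1)
# Simulated counts standing in for real data: 1000 genes x 9 samples.
# Each gene gets its own mean; the dispersion (1/size) is made up.
gene_mu <- runif(1000, min = 10, max = 1000)
counts <- matrix(rnbinom(1000 * 9, mu = gene_mu, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:9)))
group <- factor(rep(c("CONTROL", "LD", "HD"), each = 3),
                levels = c("CONTROL", "LD", "HD"))

d <- DGEList(counts = counts, group = group)
d <- calcNormFactors(d)

# Design matrix with CONTROL as the reference level:
# columns are (Intercept), groupLD, groupHD
design <- model.matrix(~ group)

d <- estimateGLMCommonDisp(d, design)
d <- estimateGLMTagwiseDisp(d, design)

fit <- glmFit(d, design)
# Dropping coefficients 2 and 3 together tests LD = CONTROL and
# HD = CONTROL at once, i.e. any difference among the three groups
lrt <- glmLRT(fit, coef = 2:3)
res <- topTags(lrt, n = Inf)
head(res$table)
```

Passing a single coefficient (for example `coef = 2`) would instead
give the LD vs CONTROL comparison from the same fitted model.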

If you are doing the ANOVA with other software, then it is not doing
the same test as edgeR, so you should expect different results. edgeR
is designed specifically for the kind of data produced by RNA-seq. It
makes certain assumptions based on the fact that you are dealing with
discrete count data rather than a continuous measure of expression. A
full discussion of the differences is too long to include here. The
edgeR User's Guide has some information, and the edgeR publications
have a more complete discussion.
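
To illustrate one of those assumptions: edgeR models counts with a
negative binomial distribution, in which the variance grows with the
mean, whereas a classical ANOVA assumes roughly constant variance.
A minimal base-R sketch, with made-up means and dispersion and
hypothetical names `mu`, `sim`, `v` and `tab`:

```r
set.seed(1)
# Negative binomial counts at three expression levels, all with the
# same dispersion (size = 10, i.e. dispersion 0.1); values are made up
mu <- c(10, 100, 1000)
sim <- sapply(mu, function(m) rnbinom(10000, mu = m, size = 10))

# Sample variance at each expression level
v <- apply(sim, 2, var)

# Compare with the theoretical NB variance: mu + mu^2 / size
tab <- data.frame(mean = mu,
                  observed_var = v,
                  expected_var = mu + mu^2 / 10)
print(tab)
```

The variance differs by orders of magnitude between lowly and highly
expressed genes, which is one reason a test that assumes constant
variance can disagree with a count-based one.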

-Ryan

On Thu 28 Feb 2013 12:04:26 PM PST, Vittoria Roncalli wrote:
> Hi Ryan,
>
> I am using R version 2.15.1 on Mac.
>
> Actually, I ran the ANOVA using other software because I am not
> familiar with the GLM mode.
> The code I used for the exact test and the DGE list is the following:
>
>
> "# Estimate common dispersion
>
> d <- estimateCommonDisp(d)
>
> pseudo_counts <- data.frame(rownames(d$pseudo.alt), d$pseudo.alt)
>
> names(pseudo_counts) = c("A1","A2","A3","B1","B2","B3","C1","C2","C3")
>
> write.table(pseudo_counts, "pseudo_counts.txt", sep="\t", row=F, quote=F)
>
> RPD2_vs_UPD2_de.com <- exactTest(d, pair=c("A","B"))
>
> # Check to see if any NA values are present in results
>
> RPD2_vs_UPD2_de.com <- exactTest(d, pair=c("A","B"))
>
> dim(subset(RPD2_vs_UPD2_de.com$table, is.na(RPD2_vs_UPD2_de.com$table$p.value)))
>
> # Top Tags
>
> RPD2_vs_UPD2_results <- topTags(RPD2_vs_UPD2_de.com, n=26788)
>
> RPD2_vs_UPD2_results.output <-
>   merge(RPD2_vs_UPD2_results$table, d$pseudo.counts, by.x=0, by.y=0)
>
> names(RPD2_vs_UPD2_results.output) <-
>   c("ID","logConc","logFC","p.val","adj.P.val","H1L","H2L","H8L","H9L",
>     "C4L","C5L","C8L","C9L")
>
> sum(p.adjust(RPD2_vs_UPD2_de.com$table$PValue, method="BH") < 0.05)
>
> #1038
>
> top.com <- topTags(RPD2_vs_UPD2_de.com, n=1038)
>
> sum(top.com$table$logFC > 0)
>
> # 638
>
> sum(top.com$table$logFC < 0)
>
> # 400 "
>
>
> Any ideas on how I can get the list of DGE genes even if I can't find
> significance with ANOVA?
>
>
>
>
>
>
> On Thu, Feb 28, 2013 at 9:44 AM, Ryan C. Thompson
> <rct at thompsonclan.org> wrote:
>
> Oh, I just realized you are using the non-GLM-based mode of
> operation for edgeR. I am much more familiar with the GLM
> workflow, and I believe that the GLM-based workflow is now
> preferred over the exactTest-based one. In fact, I'm not even sure
> how to do an ANOVA-style comparison of 3 or more groups using
> exactTest.
>
> In any case, the best way to describe what you are trying to do is
> to show the code you are using. The answers could depend on
> what options you are using, how you are calculating dispersions,
> and many other small factors. Also, please tell us which versions
> of R and edgeR you are using.
>
>
> On Thu 28 Feb 2013 11:38:04 AM PST, Ryan C. Thompson wrote:
>
> Hi Vittoria,
>
> It would be best if you could show code examples of what gave you
> an empty list and what gave you a list of differentially expressed
> genes. Whether you are doing a pairwise comparison or a multi-way
> "ANOVA-style" comparison, edgeR is actually performing the same
> test. In general, if all three pairwise comparisons are yielding
> significant hits, I would expect some significant hits in the
> three-way comparison as well.
>
> -Ryan
>
> On Thu 28 Feb 2013 11:26:17 AM PST, Vittoria Roncalli wrote:
>
> Hi Ryan,
>
> Thanks again for your explanation, you saved my day!
> Considering your expertise, I would like to ask you another question.
> I ran a simple one-way ANOVA on the raw count data (I have 3
> treatments with 3 reps each) and found that there is no
> significant difference between them. Then, with edgeR, I was able to
> extract a list of DGE for each pairwise comparison. Is this because
> the ANOVA is calculated on the overall library (total # genes) while
> the DGE comes from a t-test for each individual gene? I found this
> explanation in Bullard et al. 2010, but I am not sure if I have
> misunderstood something.
>
> Does it make sense to you?
>
> Have a good day, and thanks again for your help.
>
> Vittoria
>
> On Wed, Feb 27, 2013 at 9:48 PM, Ryan C. Thompson
> <rct at thompsonclan.org> wrote:
>
> Hi Vittoria,
>
> Please use "Reply All" so that your reply also goes to the mailing
> list.
>
> The normalization factors are used to adjust the library sizes (I
> forget the details, I believe they are given in the User's Guide),
> and then the pseudo counts are obtained by normalizing the counts
> to the adjusted library sizes. Since you have not used any
> normalization factors (i.e. all norm factors = 1), the pseudo
> counts will simply be some constant factor of counts-per-million,
> if I'm not mistaken. If you want absolutely no normalization, you
> would have to set both the normalization factors and library sizes
> to 1, I think.
>
> In any case, the pseudo counts are only for descriptive purposes.
> The statistical testing in edgeR happens using the raw integer
> counts.
>
>
> On 02/27/2013 10:12 PM, Vittoria Roncalli wrote:
>
> Hi Ryan,
>
> Thanks for your reply.
> I obtain pseudo.counts with the following commands:
>
> "
>
> > raw.data <- read.table("counts 2.txt",sep="\t",header=T)
>
> > d <- raw.data[, 2:10]
>
> > d[is.na(d)] <- 0
>
> > rownames(d) <- raw.data[, 1]
>
> > group <- c("CONTROL","CONTROL","CONTROL","LD","LD","LD","HD","HD","HD")
>
> > d <- DGEList(counts = d, group = group)
>
> Calculating library sizes from column totals.
>
> > keep <- rowSums (cpm(d)>1) >=3
>
> > d <- d[keep,]
>
> > dim(d)
>
> [1] 28755 9
>
> > d <- DGEList(counts = d, group = group)
>
> Calculating library sizes from column totals.
>
> > d <- estimateCommonDisp(d)
>
>
> After the common dispersion step, I get the following in the DGEList:
>
> $counts
>
> $samples
>
> $common.dispersion
>
> $pseudo.counts
>
> $logCPM
>
> $pseudo.lib.size
>
>
>
> Then I write a table of the pseudo.counts and continue with
> those for the DGE.
>
> Considering that I did not normalize the libraries, what are the
> different counts in the pseudo.counts output?
>
>
> Thanks so much
>
>
> Vittoria
>
> On Wed, Feb 27, 2013 at 7:20 PM, Ryan C. Thompson
> <rct at thompsonclan.org> wrote:
>
> To answer your first question, when you first create a
> DGEList object, all the normalization factors are initially
> set to 1 by default. This is equivalent to no normalization.
> Once you use calcNormFactors, the normalization factors will
> be set appropriately.
>
> I'm not sure about the second question. Could you provide an
> example of how you are obtaining pseudocounts with edgeR?
>
>
> On Wed 27 Feb 2013 05:12:27 PM PST, Vittoria Roncalli wrote:
>
> Hi, I am an edgeR user and I am a little bit confused about the
> normalization topic.
> I am using edgeR to get differentially expressed genes among 3
> conditions (RNA-seq) with 3 replicates each.
> I am following the user guide steps:
>
> - update counts file (from mapping against reference
>   transcriptome)
> - filter the low-count reads (1 cpm)
> - reassess library size
> - estimate common dispersion
>
> My first question is related to the normalization. Why,
> after I import my file, is there a norm.factors column next to
> the library size?
>
> $samples
>
>                  group lib.size norm.factors
> X48h_C_r1.sam  CONTROL 10898526            1
> X48h_C_r2.sam  CONTROL  7176817            1
> X48h_C_r3.sam  CONTROL  9511875            1
> X48h_LD_r1.sam      LD 11350347            1
> X48h_LD_r2.sam      LD 14836541            1
> X48h_LD_r3.sam      LD 12635344            1
> X48h_HD_r1.sam      HD 11840963            1
> X48h_HD_r2.sam      HD 17335549            1
> X48h_HD_r3.sam      HD 10274526            1
>
>
>
> Is the normalization automated? What is the difference
> with "calcNormFactors"?
>
> Moreover, if I do not run calcNormFactors, what is in the
> pseudo.counts output?
>
>
> I am very confused about those points.
>
>
> Thanks in advance for your help.
>
>
> Looking forward to hearing from you.
>
>
> Vittoria
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
> --
>
> Vittoria Roncalli
>
> Graduate Research Assistant
> Center Békésy Laboratory of Neurobiology
> Pacific Biosciences Research Center
> University of Hawaii at Manoa
> 1993 East-West Road
> Honolulu, HI 96822 USA
>
> Tel: 808-4695693