Hi Sridhara.
In future when asking questions on the mailing list, please Reply-All
instead of just replying to the responder (me, in this case), so that
others can read the entire thread if they want to.
I've inserted some comments below. See "--->".
> Hello Mark,
> Thank you very much for your reply. I have given an example to as my
questions.
>
> I used Bowtie for mapping the RNA-seq data (raw files) to a
reference
sequence, and the used the outfile to calculate the abundance of the
tags
> that match to the unique gene ID (of the reference sequence).
> The outfile of PhyP18B1.txt looks something like this:
> PITG_01618 | Pi conserved hypothetical protein (773 nt) 73
> PITG_22358 | Pi conserved hypothetical protein (1483 nt) 20
> PITG_05672 | Pi conserved hypothetical protein (1898 nt) 513
> PITG_10862 | Pi cytochrome c oxidase assembly protein COX11 (735 nt)
1346
>
>> setwd("C:/Users/SRIDHARA/Documents/test/bowtie/0_mismatch")
>> targets <- read.delim(file = "targets.txt", stringsAsFactors =
FALSE) d
<- readDGE(targets, skip = 5, comment.char = "!")
>> d <- d[rowSums(d$counts) > 5, ]
>> d <- calcNormFactors(d)
>> d$samples
> files group description lib.size
norm.factors
> 1 PhyP18B1.txt PhyP18 Phytophthora phaseoli 2442435
0.7528028 2
PhyP18B2.txt PhyP18 Phytophthora phaseoli 7355562 1.0412687 3
PPLB6dpiB1.txt PPLB6dpi Phytophthora phaseoli 3280812 1.1965512 4
PPLB6dpiB2.txt PPLB6dpi Phytophthora phaseoli 3906611 1.0661656
>
> 1) My objective is to study the expression levels of group of genes
(for
eg., effector genes) in files PhyP18 (one group) and PPLB6dpi (another
group).
> So, I was wondering if I could multiply each of the tag counts (for
each
geneID) in all the files with their respective norm.factor values. For
example in PhyP18B1 multiplying each geneID tag counts with
0.7528028
> (norm.factors)
> PITG_01618 | Pi conserved hypothetical protein (773 nt) 73 *
0.7528028
> =
> 54.9546044
---> No, do NOT multiply by the normalization factor. Did you notice
the
formula in my last email? You'll see there you divide your counts by
the
product of the library size and normalization factor and then
(optionally?) multiple by a constant to put the normalized values on a
scale of interest.
>
> and similarly each tag counts in file PPLB6dpiB1 with 1.1965512 and
so
on...
>
> later can I use the value 54.954 as the normalized tag count for
that
gene
> ID in that library (PhyP18B1)?
>
> 2) Second question is about the plotSmear().
> red dots (in plot) are the gene ID
> orange dots (in plot) are the genes that are unique to one of
the
> library
> black dots - this where I had trouble in understand. Are these
set
of
> prominent genes that are largely attributable for the overall bias
in
log-fold-changes
>
---> Again, I'm taking a stab because you haven't told us what
commands
you are using, but if you are using something like:
[...]
plotSmear(d, de.tags=de.tags)
... then the red, orange, black dots are the DE, unique-to-sample and
remaining genes, respectively. See also the help docs at ?plotSmear
and
?maPlot.
Hope that helps.
Mark
>
> Sorry for such a long email. Any suggestions will be highly
appreciated.
Thanks for taking a stab on this.
>
> Sridhara
>
>
>
> On Mon, Jan 24, 2011 at 10:38 PM, Mark Robinson
> <mrobinson at="" wehi.edu.au="">wrote:
>
>> Hi Sridhara.
>> I'm having trouble understanding what you have already done
(perhaps show
>> some code?) and what exactly are asking, but I'll take a stab.
>> When you say "the DGE plot", what are you referring to? A plot
produced
>> by
>> plotSmear()? If so, then yes, the log-ratios (and 'logConc') will
take
into
>> account the normalization factors. But, this still doesn't lead to
"normalized counts". For this, you could do something like:
>> d <- DGEList(counts=...)
>> d <- calcNormFactors(d)
>> rpm <- t(t(d$counts) / (d$samples$lib.size*d$samples$norm.factors))
* 1e6
>> Here, 'rpm' means reads per million, having used what you might
call
'effective' library sizes i.e. adjusted for relative composition
differences. Of course, if you are doing RNA-seq, you may want to
include a
>> term in there for gene length, which will make it more like an
RPKM.
It
>> sort of depends what your objective is here, because none of this
is
really
>> necessary for differential expression analysis (as you will see
from edgeR
>> user's guide). Note that the 'rpm' table is not to be used for any
statistical analysis of differential expression, since the data has
been
>> put
>> on an arbitrary scale.
>> Hope that helps.
>> Cheers,
>> Mark
>> On 2011-01-25, at 11:40 AM, Sridhara Gupta Kunjeti wrote:
>> > Hi,
>> >> I am a PhD candidate at University of Delaware and area of
research
>> is
>> >> RNA-seq and functional genomics in Phytophthora phaseoli. I was
>> impressed by
>> >> your paper in Genome Biology on "A scaling normalization method
for
differential expression analysis of RNA-seq data". I did use the
>> edgeR
>> guide
>> >> and user manual and it makes lot of sense when the DGE in a plot
>> graph.
>> >> If I am correct the DGE plot (Visualising DGE results) is
plotted
>> using
>> the
>> >> normalized tag counts for each gene ID, please correct me if I
am
>> wrong?
>> >> If so, I was wondering if I can generate a list or table or gene
ID
>> with
>> >> the normalized tag counts? I would appreciate if you let me know
if
>> this
>> can
>> >> be done.
>> >>
>> >> Thanks,
>> >> Sridhara
>> >>
>> >>
>> >> --
>> >> Sridhara G Kunjeti
>> >> PhD Candidate
>> >> University of Delaware
>> >> Department of Plant and Soil Science
>> >> email- sridhara at udel.edu
>> >> Ph: 832-566-0011
>> >>
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at r-project.org
>> >
https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>>
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
>> Mark Robinson, PhD (Melb)
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: mrobinson at wehi.edu.au
>> e: m.robinson at garvan.org.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>> ------------------------------
>>
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:27}}