Hi Jens,
I don't know what you mean by single nucleotide based normalization;
however, the following comments may be helpful.
edgeR automatically adjusts for library sizes, whether you include an
explicit normalization step or not. Normalization is a separate issue,
and is intended to deal with more subtle effects.
Normalization, as edgeR does it, does not require replicates.
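To make the distinction concrete, here is a minimal base-R sketch; it is not edgeR's internal code. The two library sizes are the ones quoted in the original question, while the count matrix and the normalization factors are made-up values for illustration only (in edgeR the factors would come from calcNormFactors()).

```r
# Toy counts for 3 features in 2 libraries (hypothetical values).
counts <- matrix(c(100, 200, 300,
                    63, 126, 189),
                 ncol = 2,
                 dimnames = list(c("f1", "f2", "f3"),
                                 c("untreated", "RNaseE")))

# Library sizes quoted in the thread.
lib.size <- c(untreated = 1918953, RNaseE = 1208586)

# Step 1: plain library-size adjustment (counts per million reads).
per.million <- t(t(counts) / lib.size) * 1e6

# Step 2: normalization factors rescale the library sizes to
# "effective" library sizes. These factors are invented for the example.
norm.factors <- c(untreated = 1.05, RNaseE = 1 / 1.05)
eff.lib.size <- lib.size * norm.factors
per.million.norm <- t(t(counts) / eff.lib.size) * 1e6

print(round(per.million, 2))
print(round(per.million.norm, 2))
```

The first step only removes the difference in sequencing depth; the second step is where a normalization factor would adjust for subtler effects.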
Best wishes
Gordon
> Date: Fri, 04 Feb 2011 11:28:15 +0100
> From: Jens Georg <jens.georg at biologie.uni-freiburg.de>
> To: bioconductor at r-project.org
> Subject: [BioC] Single nucleotide based RNAseq normalization with edgeR?
> Message-ID: <4D4BD4BF.4010009 at biologie.uni-freiburg.de>
> Content-Type: text/plain; charset=ISO-8859-15; format=flowed
>
> Dear edgeR users and developers,
>
> we used Solexa sequencing in order to detect RNase E processing sites.
> Therefore we split an RNA sample and treated one half with RNase E
> prior to cDNA synthesis and sequencing. The libraries differ in size
> (1,918,953 and 1,208,586 reads respectively), which clearly necessitates
> a normalization step. Furthermore, we expect site-specific differences
> rather than differences in the accumulation of the full-length RNAs.
>
> So I want to ask whether it is appropriate to do a single nucleotide
> based normalization with edgeR, and whether you think a reliable basic
> normalization is possible without replicates?
>
> Thank you for your comments.
>
> Best regards
>
> Jens
Hi Gordon,
thank you for your reply. The resolution of our ~100 nt Solexa reads is
too coarse to detect individual processing sites, so we want to
investigate every single nucleotide individually ("single nucleotide
based normalization"). That means that we count how often an individual
nucleotide is covered by sequence reads. Of course, this approach will
effectively inflate the lib.size by a factor which depends on the length
of the Solexa reads. As the lib.size is critical for the normalization, I
am not sure whether I should use the original read numbers for each
library or the read numbers multiplied by the read length to adjust for
the single nucleotide investigation.
I have two more questions regarding the normalization:
1. Are the norm factors calculated by the calcNormFactors() function
automatically used for further steps like the estimateCommonDisp()
function?
2. Are the pseudocounts calculated by estimateCommonDisp() the
normalized read counts?
Many thanks
Jens
Hi Gordon,
First, I would like to thank Jens for asking the questions that I had
asked a few days ago.
In addition to Jens's questions, I have one more question about my
RNA-seq data:
1. I would like to know whether I can multiply the counts for each gene
by the norm.factor (calculated by the calcNormFactors() function).
Thanks
Sridhara
--
Sridhara G Kunjeti
PhD Candidate
University of Delaware
Department of Plant and Soil Science
email- sridhara@udel.edu
Ph: 832-566-0011
Hi Jens/Sridhara.
A few thoughts below.
On 2011-02-07, at 11:22 PM, Sridhara Gupta Kunjeti wrote:
> Hi Gordon,
> First, I would like to thank Jens for asking the questions that I had
> asked a few days ago.
> In addition to Jens's questions, I have one more question about my
> RNA-seq data:
> 1. I would like to know whether I can multiply the counts for each gene
> by the norm.factor (calculated by the calcNormFactors() function).
Sridhara, you've asked this exact question before and I answered
(short answer is: NO to multiplying ... instead, divide by [library
size]*[normalization factor]):
https://stat.ethz.ch/pipermail/bioconductor/2011-January/037564.html
https://stat.ethz.ch/pipermail/bioconductor/2011-January/037469.html
Perhaps you can clarify what you don't understand.
> On Mon, Feb 7, 2011 at 5:46 AM, Jens Georg
> <jens.georg at biologie.uni-freiburg.de> wrote:
>
>> Hi Gordon,
>> thank you for your reply. The resolution of our ~100 nt Solexa reads is
>> too coarse to detect individual processing sites, so we want to
>> investigate every single nucleotide individually ("single nucleotide
>> based normalization"). That means that we count how often an individual
>> nucleotide is covered by sequence reads. Of course, this approach will
>> effectively inflate the lib.size by a factor which depends on the length
>> of the Solexa reads. As the lib.size is critical for the normalization,
>> I am not sure whether I should use the original read numbers for each
>> library or the read numbers multiplied by the read length to adjust for
>> the single nucleotide investigation.
So basically, by counting this way, your library size is ~100x the
number of reads you've actually mapped. While I think this will work
out OK (the normalization calculation should be fine), this coverage
calculation does impose a (strong?) dependence between adjacent
nucleotides. One alternative would be to count the reads that *begin*
at a given nucleotide and only consider these. Then your library sizes
are as normal.
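The difference between the two counting schemes can be sketched in base R. The read start positions, the fixed read length of 100 and the region length are all invented for illustration; real data would come from an alignment file.

```r
read.length   <- 100   # assumed fixed read length
region.length <- 500   # hypothetical transcript region
starts <- c(1, 1, 50, 120, 120, 120, 300, 401)  # made-up read starts

# Scheme A: per-nucleotide coverage (each base under a read counts once).
coverage <- integer(region.length)
for (s in starts) {
  idx <- s:(s + read.length - 1)
  coverage[idx] <- coverage[idx] + 1L
}

# Scheme B: count only the reads that begin at each nucleotide.
start.counts <- tabulate(starts, nbins = region.length)

sum(coverage)      # read.length * number of reads
sum(start.counts)  # number of reads

# Neighbouring positions share most of their reads under scheme A,
# so adjacent coverage values are strongly dependent.
cor(coverage[-1], coverage[-region.length])
```

Under scheme A the column total is inflated by the read length, whereas under scheme B it equals the number of mapped reads, so the usual library size applies directly.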
>> I have two more questions regarding the normalization:
>> 1. Are the norm factors calculated by the calcNormFactors() function
>> automatically used for further steps like the estimateCommonDisp()
>> function?
Yes.
>> 2. Are the pseudocounts calculated by estimateCommonDisp() the
>> normalized read counts?
Yes, but this only accounts for overall depth and potential
composition biases, not for length biases (or any others). The
intention is to make inferences about a given gene across conditions.
The inferences for differential expression are still done on the raw
counts.
Hope that helps.
Mark
------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobinson at wehi.edu.au
e: m.robinson at garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------
Hello Mark,
This continues the discussion on normalizing the counts. Did you mean
(count / library size) * Norm.factor
Can I take the library size and Norm.factor values from edgeR?
Thanks,
Sridhara
Hi Sridhara.
On 2011-02-10, at 4:34 AM, Sridhara Gupta Kunjeti wrote:
> Hello Mark,
> This continues the discussion on normalizing the counts. Did you mean
>
> (count / library size) * Norm.factor
>
> Can I take the library size and Norm.factor values from edgeR?
No. Actually, I mean what I wrote in both previous posts. I'll
repeat it again, and hopefully it's third time lucky:
  rpm <- t(t(d$counts) / (d$samples$lib.size * d$samples$norm.factors)) * 1e6
So, this translates to:
  count / (lib.size * norm.factor)
... and you may multiply by a factor to put it on a different scale
(e.g. multiply by 1M, as I've done above). And you should remember
all the previous caveats that I've mentioned: there is no need to
do this for a differential expression analysis, as edgeR already builds
this in, and this doesn't account for other biases such as gene length.
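As a small check of the formula, the sketch below builds a plain list `d` that only mimics the `counts` and `samples` fields used above. It is not a real edgeR object, and all numbers are invented. It then compares the correct scaling with the variant asked about.

```r
# Stand-in for the edgeR object: only the fields used in the formula.
d <- list(
  counts = matrix(c(10, 20, 30, 40), ncol = 2,
                  dimnames = list(c("g1", "g2"), c("s1", "s2"))),
  samples = data.frame(lib.size = c(1e6, 2e6),
                       norm.factors = c(0.8, 1.25),
                       row.names = c("s1", "s2"))
)

# Correct: divide by the effective library size, then scale to per-million.
rpm <- t(t(d$counts) / (d$samples$lib.size * d$samples$norm.factors)) * 1e6

# Not equivalent: (count / lib.size) * norm.factor puts the factor
# on the wrong side of the division.
wrong <- t(t(d$counts) / d$samples$lib.size * d$samples$norm.factors) * 1e6

rpm
wrong
```

For sample s1 the correct version gives 12.5 and 25, while the other variant gives 8 and 16, so the two are clearly not interchangeable. Current edgeR releases also provide a cpm() function which, as far as I know, applies the same effective library sizes when given a DGEList.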
Hope that helps.
Mark
Hello Mark,
Yes, now it is clear to me.
Thank you very much for being patient in responding to my questions.
Many thanks!
Sridhara
On Wed, Feb 9, 2011 at 5:16 PM, Mark Robinson <mrobinson@wehi.edu.au>
wrote:
> Hi Sridhara.
>
> On 2011-02-10, at 4:34 AM, Sridhara Gupta Kunjeti wrote:
>
> > Hello Mark,
> > This is in continuation with the normalization of the counts:
> > did you mean
> >
> > (count / library size) * Norm.factor
> > Can I use the numbers for the library size and Norm.factor can be
used
> from the edgeR?
>
>
> No. Actually, I mean what I wrote in both previous posts. I'll
repeat
> again and hopefully third time lucky:
>
> rpm <- t(t(d$counts) / (d$samples$lib.size*d$samples$norm.factors))
* 1e6
>
> So, this translates to:
>
> count / (lib.size*Norm.factor)
>
> ... and you may multiply by a factor to put it on a different scale
(e.g.
> multiply by 1M as I've done above). And, you should remember all
the
> previous caveats that I've mentioned (i.e. there is no need to do
this for a
> differential expression analysis as edgeR already builds this in +
this
> doesn't account for other biases such as gene length).
>
> Hope that helps.
> Mark
>
>
>
>
> > Thanks,
> > Sridhara
> >
> >
> > On Mon, Feb 7, 2011 at 5:11 PM, Mark Robinson
<mrobinson@wehi.edu.au>
> wrote:
> > Hi Jens/Sridhara.
> >
> > A few thoughts below.
> >
> > On 2011-02-07, at 11:22 PM, Sridhara Gupta Kunjeti wrote:
> >
> > > Hi Gordon,
> > > First I would like to thank Jens for asking the questions that I
had
> asked
> > > few days ago.
> > > In additions to the Jens question, I have one more question on
my
> RNA-seq
> > > data
> > > 1. I would like to know if I can multiply the counts for each
gene with
> the
> > > norm.factor (calculated by "calcNormFactors( )" function)
> >
> >
> > Sridhara, you've asked this exact question before and I answered
(short
> answer is: NO to multiplying ... instead, divide by [library
> size]*[normalization factor]):
> >
> >
https://stat.ethz.ch/pipermail/bioconductor/2011-January/037564.html
> >
https://stat.ethz.ch/pipermail/bioconductor/2011-January/037469.html
> >
> > Perhaps you can clarify what you don't understand.
> >
> >
> > > On Mon, Feb 7, 2011 at 5:46 AM, Jens Georg <
> > > jens.georg@biologie.uni-freiburg.de> wrote:
> > >
> > >> Hi Gordon,
> > >> thank you for your reply. The resolution of our ~100nt solexa
reads is
> to
> > >> small to detect individual processing sites, so we want to
investigate
> every
> > >> single nucleotide individually ("single nucleotide based
> normalization").
> > >> That means that we count, how often an individual nucleotide is
> covered by
> > >> sequence reads. Of course, this approach will virtually
increase the
> > >> lib.size by a factor which depends on length of the solexa
reads. As
> the
> > >> lib.size is critical for the normalization, I am not sure if I
should
> use
> > >> the original read numbers for each library or the read numbers
> multiplicated
> > >> with the read length to adjust for the single nucleotide
> investigation.
> >
> >
> > So basically, by counting this way, your library size is ~100x the
number
> of reads you've actually mapped. While I think this will work out
ok
> (normalization calculation be fine), this coverage calculation does
impose a
> (strong?) dependence between adjacent nucleotides. One alternative
would be
> to count the reads that *begin* at a given nucleotide and only
consider
> these. Then your library sizes are as normal.
> >
> >
> > >> I have two more question regarding to the normalization:
> > >> 1. Are the norm factors calculated by the calcNormFactors( )
function
> > >> automatically used for further steps like the
estimateCommonDisp( )
> > >> function?
> >
> > Yes.
> >
> >
> > >> 2. Are the pseudocounts calculated by estimateCommonDisp( ) the
> normalized
> > >> readcounts?
> >
> > Yes, but this is only accounting for overall depth and potential
> composition biases, not for length biases (or any others). It is
with the
> intention of making inferences of a given gene across conditions.
The
> inferences for differential expression are still done on the raw
counts.
> >
> > Hope that helps.
> > Mark
> >
> >
> >
> >
> > >>
> > >> Many thanks
> > >>
> > >> Jens
> > >>
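A minimal sketch of the two counting schemes discussed above: per-nucleotide coverage versus counting only the reads that begin at each position. It uses base R and made-up data. The reference length, read length and start positions are assumptions for illustration only and are not taken from the experiment described in this thread.

```r
# Toy data (all values invented): 7 reads of length 4 on a 12-nt reference,
# each read described by its 1-based start position.
ref_len  <- 12L
read_len <- 4L
starts   <- c(1L, 1L, 3L, 5L, 5L, 5L, 8L)

# Scheme 1: per-nucleotide coverage. Every read adds 1 to each position it
# covers, so the column total is read_len times the number of reads.
coverage <- integer(ref_len)
for (s in starts) {
  idx <- s:min(s + read_len - 1L, ref_len)
  coverage[idx] <- coverage[idx] + 1L
}

# Scheme 2: count only the reads that begin at each nucleotide. The column
# total is then simply the number of reads.
start_counts <- tabulate(starts, nbins = ref_len)

sum(coverage)      # inflated total: read_len * length(starts)
sum(start_counts)  # plain number of reads: length(starts)
```

With scheme 2 the column totals are ordinary read counts, and neighbouring positions no longer share the same reads.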
Hello Mark,
How can I add a gene length term to the code quoted below, so that the
result is on an RPKM-like scale? I would appreciate your help.
Many thanks in advance!
Sridhara
On Wed, Feb 9, 2011 at 6:19 PM, Sridhara Gupta Kunjeti <sridhara@udel.edu> wrote:
> Hello Mark,
> Yes, Now it is clear to me.
> Thank you very much for being patient in responding to my questions.
>
> Many thanks!
> Sridhara
>
>
> On Wed, Feb 9, 2011 at 5:16 PM, Mark Robinson <mrobinson@wehi.edu.au> wrote:
>
>> Hi Sridhara.
>>
>> On 2011-02-10, at 4:34 AM, Sridhara Gupta Kunjeti wrote:
>>
>> > Hello Mark,
>> > This is in continuation with the normalization of the counts:
>> > did you mean
>> >
>> > (count / library size) * Norm.factor
>> > Can I use the library size and Norm.factor values from edgeR?
>>
>>
>> No. Actually, I mean what I wrote in both previous posts. I'll repeat
>> again and hopefully third time lucky:
>>
>> rpm <- t(t(d$counts) / (d$samples$lib.size*d$samples$norm.factors)) * 1e6
>>
>> So, this translates to:
>>
>> count / (lib.size*Norm.factor)
>>
>> ... and you may multiply by a factor to put it on a different scale (e.g.
>> multiply by 1M as I've done above). And, you should remember all the
>> previous caveats that I've mentioned (i.e. there is no need to do this for a
>> differential expression analysis as edgeR already builds this in + this
>> doesn't account for other biases such as gene length).
>>
>> Hope that helps.
>> Mark
>>
>>
>>
>>
>> > Thanks,
>> > Sridhara
>> >
>> >
>> > On Mon, Feb 7, 2011 at 5:11 PM, Mark Robinson <mrobinson@wehi.edu.au> wrote:
>> > Hi Jens/Sridhara.
>> >
>> > A few thoughts below.
>> >
>> > On 2011-02-07, at 11:22 PM, Sridhara Gupta Kunjeti wrote:
>> >
>> > > Hi Gordon,
>> > > First I would like to thank Jens for asking the questions that I had
>> > > asked a few days ago.
>> > > In addition to Jens's question, I have one more question on my RNA-seq
>> > > data:
>> > > 1. I would like to know if I can multiply the counts for each gene with
>> > > the norm.factor (calculated by the "calcNormFactors( )" function)
>> >
>> >
>> > Sridhara, you've asked this exact question before and I answered (short
>> > answer is: NO to multiplying ... instead, divide by [library
>> > size]*[normalization factor]):
>> >
>> > https://stat.ethz.ch/pipermail/bioconductor/2011-January/037564.html
>> > https://stat.ethz.ch/pipermail/bioconductor/2011-January/037469.html
>> >
>> > Perhaps you can clarify what you don't understand.
--
Sridhara G Kunjeti
PhD Candidate
University of Delaware
Department of Plant and Soil Science
email- sridhara@udel.edu
Ph: 832-566-0011
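On the RPKM question: RPKM is reads per kilobase per million, so one way to extend the rpm line quoted above is to divide by gene length in kilobases as well. Below is a minimal sketch in base R with made-up numbers. `gene.length` is a hypothetical vector of gene lengths in bp (it is not part of the edgeR object used in this thread), and the counts and norm factors are invented; in a real analysis they would come from `d$counts`, `d$samples$lib.size` and `d$samples$norm.factors`.

```r
# Toy count matrix: 3 genes (rows) x 2 libraries (columns), values invented.
counts <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("A", "B")))

lib.size     <- colSums(counts)            # stands in for d$samples$lib.size
norm.factors <- c(A = 1.1, B = 1 / 1.1)    # made-up; stands in for calcNormFactors() output
gene.length  <- c(g1 = 500, g2 = 1000, g3 = 2000)  # hypothetical lengths in bp

# Same expression as in the quoted mail: divide each column by its
# effective library size (lib.size * norm.factor), then scale to per million.
rpm <- t(t(counts) / (lib.size * norm.factors)) * 1e6

# RPKM-like scale: additionally divide each row by its gene length in kb.
# A vector of length nrow() recycles down the columns, i.e. per gene.
rpkm <- rpm / (gene.length / 1000)
```

As noted in the quoted reply, values like these are for display or for comparing genes; the edgeR differential expression test already accounts for library size and norm factors and works on the raw counts.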
Hi Jens,
On Feb/7/11 11:46 AM, Jens Georg wrote:
> Hi Gordon,
> thank you for your reply. The resolution of our ~100nt solexa reads is
> too small to detect individual processing sites, so we want to
> investigate every single nucleotide individually ("single nucleotide
> based normalization"). That means that we count how often an individual
> nucleotide is covered by sequence reads. Of course, this approach will
> virtually increase the lib.size by a factor which depends on the length of
> the solexa reads. As the lib.size is critical for the normalization, I
> am not sure if I should use the original read numbers for each library
> or the read numbers multiplied by the read length to adjust for the
> single nucleotide investigation.
Do you have reasons to assume that these options are not essentially
equivalent, i.e. that the read length distributions are different in
different lanes?
If that were the case, more thought is probably required on which
underlying uncontrolled physical/chemical/biological effect causes this,
so that a suitable 'normalisation' approach can be derived from it.
Best wishes
Wolfgang
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
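To illustrate Wolfgang's point with the read numbers from the original post: if every read has the same length, multiplying both library sizes by that length leaves their ratio unchanged, so the two choices of lib.size give the same relative scaling. The sketch below is base R and assumes a constant read length of exactly 100 nt (the thread only says ~100nt); the names `lib1` and `lib2` are placeholders, since the post does not say which library is which.

```r
# Read numbers quoted in the original post; the labels are placeholders.
reads    <- c(lib1 = 1918953, lib2 = 1208586)
read_len <- 100   # assumed constant read length in nt

# Library sizes under the two schemes.
lib_size_reads <- reads              # number of reads
lib_size_nt    <- reads * read_len   # number of covered nucleotides

# The relative library size is what matters for scaling between samples.
ratio_reads <- unname(lib_size_reads["lib1"] / lib_size_reads["lib2"])
ratio_nt    <- unname(lib_size_nt["lib1"] / lib_size_nt["lib2"])

isTRUE(all.equal(ratio_reads, ratio_nt))  # the common factor cancels
```

The two schemes would only differ if the read length distributions differed between the libraries, which is the case Wolfgang asks about.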